   1 This is Info file wget.info, produced by Makeinfo version 1.68 from the
   2 input file ./wget.texi.
   3
   4 INFO-DIR-SECTION Net Utilities
   5 INFO-DIR-SECTION World Wide Web
   6 START-INFO-DIR-ENTRY
   7 * Wget: (wget).         The non-interactive network downloader.
   8 END-INFO-DIR-ENTRY
   9
  10    This file documents the GNU Wget utility for downloading network
  11 data.
  12
  13    Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
  14
  15    Permission is granted to make and distribute verbatim copies of this
  16 manual provided the copyright notice and this permission notice are
  17 preserved on all copies.
  18
  19    Permission is granted to copy and distribute modified versions of
  20 this manual under the conditions for verbatim copying, provided also
  21 that the sections entitled "Copying" and "GNU General Public License"
  22 are included exactly as in the original, and provided that the entire
  23 resulting derived work is distributed under the terms of a permission
  24 notice identical to this one.
  25
  26 \1f
  27 File: wget.info,  Node: Top,  Next: Overview,  Prev: (dir),  Up: (dir)
  28
  29 Wget 1.5.3+dev
  30 **************
  31
  32    This manual documents version 1.5.3+dev of GNU Wget, the freely
  33 available utility for network download.
  34
  35    Copyright (C) 1996, 1997, 1998 Free Software Foundation, Inc.
  36
  37 * Menu:
  38
  39 * Overview::            Features of Wget.
  40 * Invoking::            Wget command-line arguments.
  41 * Recursive Retrieval:: Description of recursive retrieval.
  42 * Following Links::     The available methods of chasing links.
  43 * Time-Stamping::       Mirroring according to time-stamps.
  44 * Startup File::        Wget's initialization file.
  45 * Examples::            Examples of usage.
  46 * Various::             The stuff that doesn't fit anywhere else.
  47 * Appendices::          Some useful references.
  48 * Copying::             You may give out copies of Wget.
  49 * Concept Index::       Topics covered by this manual.
  50
  51 \1f
  52 File: wget.info,  Node: Overview,  Next: Invoking,  Prev: Top,  Up: Top
  53
  54 Overview
  55 ********
  56
  57    GNU Wget is a freely available network utility to retrieve files from
  58 the World Wide Web, using HTTP (Hyper Text Transfer Protocol) and FTP
  59 (File Transfer Protocol), the two most widely used Internet protocols.
  60 It has many useful features to make downloading easier, some of them
  61 being:
  62
  63    * Wget is non-interactive, meaning that it can work in the
  64      background, while the user is not logged on.  This allows you to
  65      start a retrieval and disconnect from the system, letting Wget
  66      finish the work.  By contrast, most Web browsers require the
  67      user's constant presence, which can be a great hindrance when
  68      transferring a lot of data.
  69
  70    * Wget is capable of descending recursively through the structure of
  71      HTML documents and FTP directory trees, making a local copy of the
  72      directory hierarchy similar to the one on the remote server.  This
  73      feature can be used to mirror archives and home pages, or traverse
  74      the web in search of data, like a WWW robot (*Note Robots::).  In
  75      that spirit, Wget understands the `norobots' convention.
  76
  77    * File name wildcard matching and recursive mirroring of directories
  78      are available when retrieving via FTP.  Wget can read the
  79      time-stamp information given by both HTTP and FTP servers, and
  80      store it locally.  Thus Wget can see if the remote file has
  81      changed since last retrieval, and automatically retrieve the new
  82      version if it has.  This makes Wget suitable for mirroring of FTP
  83      sites, as well as home pages.
  84
  85    * Wget works exceedingly well on slow or unstable connections,
  86      retrying the document until it is fully retrieved, or until a
  87      user-specified retry count is surpassed.  It will try to resume the
  88      download from the point of interruption, using `REST' with FTP and
  89      `Range' with HTTP servers that support them.
  90
  91    * By default, Wget supports proxy servers, which can lighten the
  92      network load, speed up retrieval and provide access behind
  93      firewalls.  However, if you are behind a firewall that requires
  94      that you use a socks style gateway, you can get the socks library
  95      and build wget with support for socks.  Wget also supports the
  96      passive FTP downloading as an option.
  97
  98    * Built-in features offer mechanisms to tune which links you wish to
  99      follow (*Note Following Links::).
 100
 101    * The retrieval is conveniently traced with printing dots, each dot
 102      representing a fixed amount of data received (1KB by default).
 103      These representations can be customized to your preferences.
 104
 105    * Most of the features are fully configurable, either through
 106      command line options, or via the initialization file `.wgetrc'
 107      (*Note Startup File::).  Wget allows you to define "global"
 108      startup files (`/usr/local/etc/wgetrc' by default) for site
 109      settings.
 110
 111    * Finally, GNU Wget is free software.  This means that everyone may
 112      use it, redistribute it and/or modify it under the terms of the
 113      GNU General Public License, as published by the Free Software
 114      Foundation (*Note Copying::).
 115
 116 \1f
 117 File: wget.info,  Node: Invoking,  Next: Recursive Retrieval,  Prev: Overview,  Up: Top
 118
 119 Invoking
 120 ********
 121
 122    By default, Wget is very simple to invoke.  The basic syntax is:
 123
 124      wget [OPTION]... [URL]...
 125
 126    Wget will simply download all the URLs specified on the command
 127 line.  URL is a "Uniform Resource Locator", as defined below.
 128
 129    However, you may wish to change some of the default parameters of
 130 Wget.  You can do it two ways: permanently, adding the appropriate
 131 command to `.wgetrc' (*Note Startup File::), or specifying it on the
 132 command line.
 133
 134 * Menu:
 135
 136 * URL Format::
 137 * Option Syntax::
 138 * Basic Startup Options::
 139 * Logging and Input File Options::
 140 * Download Options::
 141 * Directory Options::
 142 * HTTP Options::
 143 * FTP Options::
 144 * Recursive Retrieval Options::
 145 * Recursive Accept/Reject Options::
 146
 147 \1f
 148 File: wget.info,  Node: URL Format,  Next: Option Syntax,  Prev: Invoking,  Up: Invoking
 149
 150 URL Format
 151 ==========
 152
 153    "URL" is an acronym for Uniform Resource Locator.  A uniform
 154 resource locator is a compact string representation for a resource
 155 available via the Internet.  Wget recognizes the URL syntax as per
 156 RFC1738.  This is the most widely used form (square brackets denote
 157 optional parts):
 158
 159      http://host[:port]/directory/file
 160      ftp://host[:port]/directory/file
 161
 162    You can also encode your username and password within a URL:
 163
 164      ftp://user:password@host/path
 165      http://user:password@host/path
 166
 167    Either USER or PASSWORD, or both, may be left out.  If you leave out
 168 either the HTTP username or password, no authentication will be sent.
 169 If you leave out the FTP username, `anonymous' will be used.  If you
 170 leave out the FTP password, your email address will be supplied as a
 171 default password.(1)
 172
 173    You can encode unsafe characters in a URL as `%xy', `xy' being the
 174 hexadecimal representation of the character's ASCII value.  Some common
 175 unsafe characters include `%' (quoted as `%25'), `:' (quoted as `%3A'),
 176 and `@' (quoted as `%40').  Refer to RFC1738 for a comprehensive list
 177 of unsafe characters.
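
   As a rough illustration of the `%xy' quoting described above, the
following POSIX shell sketch percent-encodes a string (for example a
password) before it is placed in a URL.  The function name `urlencode'
is hypothetical and is not part of Wget.  The sketch assumes ASCII
input and encodes every character other than letters, digits, `.',
`_', `~' and `-', which is stricter than RFC1738 requires.

```shell
# urlencode STRING
# Print STRING with every character outside [A-Za-z0-9._~-] replaced
# by %XY, where XY is the character's ASCII code in hexadecimal.
# Hypothetical helper for illustration; assumes ASCII input.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            # A leading quote makes printf use the character's code.
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}
```

   With this helper, `urlencode 'p@ss:w%rd'' prints `p%40ss%3Aw%25rd',
which agrees with the quoted forms of `@', `:' and `%' listed above.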
 178
 179    Wget also supports the `type' feature for FTP URLs.  By default, FTP
 180 documents are retrieved in the binary mode (type `i'), which means that
 181 they are downloaded unchanged.  Another useful mode is the `a'
 182 ("ASCII") mode, which converts the line delimiters between the
 183 different operating systems, and is thus useful for text files.  Here
 184 is an example:
 185
 186      ftp://host/directory/file;type=a
 187
 188    Two alternative variants of URL specification are also supported,
 189 because of historical (hysterical?) reasons and their widespread use.
 190
 191    FTP-only syntax (supported by `NcFTP'):
 192      host:/dir/file
 193
 194    HTTP-only syntax (introduced by `Netscape'):
 195      host[:port]/dir/file
 196
 197    These two alternative forms are deprecated, and may cease being
 198 supported in the future.
 199
 200    If you do not understand the difference between these notations, or
 201 do not know which one to use, just use the plain ordinary format you use
 202 with your favorite browser, like `Lynx' or `Netscape'.
 203
 204    ---------- Footnotes ----------
 205
 206    (1) If you have a `.netrc' file in your home directory, password
 207 will also be searched for there.
 208
 209 \1f
 210 File: wget.info,  Node: Option Syntax,  Next: Basic Startup Options,  Prev: URL Format,  Up: Invoking
 211
 212 Option Syntax
 213 =============
 214
 215    Since Wget uses GNU getopt to process its arguments, every option
 216 has a long form along with the short one.  Long options are easier to
 217 remember, but take time to type.  You may freely mix different option
 218 styles, or specify options after the command-line arguments. Thus you
 219 may write:
 220
 221      wget -r --tries=10 http://fly.cc.fer.hr/ -o log
 222
 223    The space between the option accepting an argument and the argument
 224 may be omitted.  Instead of `-o log' you can write `-olog'.
 225
 226    You may put several options that do not require arguments together,
 227 like:
 228
 229      wget -drc URL
 230
 231    This is a complete equivalent of:
 232
 233      wget -d -r -c URL
 234
 235    Since the options can be specified after the arguments, you may
 236 terminate them with `--'.  So the following will try to download URL
 237 `-x', reporting failure to `log':
 238
 239      wget -o log -- -x
 240
 241    The options that accept comma-separated lists all respect the
 242 convention that specifying an empty list clears its value.  This can be
 243 useful to clear the `.wgetrc' settings.  For instance, if your `.wgetrc'
 244 sets `exclude_directories' to `/cgi-bin', the following example will
 245 first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
 246 You can also clear the lists in `.wgetrc' (*Note Wgetrc Syntax::).
 247
 248      wget -X '' -X /~nobody,/~somebody
 249
 250 \1f
 251 File: wget.info,  Node: Basic Startup Options,  Next: Logging and Input File Options,  Prev: Option Syntax,  Up: Invoking
 252
 253 Basic Startup Options
 254 =====================
 255
 256 `-V'
 257 `--version'
 258      Display the version of Wget.
 259
 260 `-h'
 261 `--help'
 262      Print a help message describing all of Wget's command-line options.
 263
 264 `-b'
 265 `--background'
 266      Go to background immediately after startup.  If no output file is
 267      specified via the `-o', output is redirected to `wget-log'.
 268
 269 `-e COMMAND'
 270 `--execute COMMAND'
 271      Execute COMMAND as if it were a part of `.wgetrc' (*Note Startup
 272      File::).  A command thus invoked will be executed *after* the
 273      commands in `.wgetrc', thus taking precedence over them.
 274
 275 \1f
 276 File: wget.info,  Node: Logging and Input File Options,  Next: Download Options,  Prev: Basic Startup Options,  Up: Invoking
 277
 278 Logging and Input File Options
 279 ==============================
 280
 281 `-o LOGFILE'
 282 `--output-file=LOGFILE'
 283      Log all messages to LOGFILE.  The messages are normally reported
 284      to standard error.
 285
 286 `-a LOGFILE'
 287 `--append-output=LOGFILE'
 288      Append to LOGFILE.  This is the same as `-o', only it appends to
 289      LOGFILE instead of overwriting the old log file.  If LOGFILE does
 290      not exist, a new file is created.
 291
 292 `-d'
 293 `--debug'
 294      Turn on debug output, meaning various information important to the
 295      developers of Wget if it does not work properly.  Your system
 296      administrator may have chosen to compile Wget without debug
 297      support, in which case `-d' will not work.  Please note that
 298      compiling with debug support is always safe--Wget compiled with
 299      the debug support will *not* print any debug info unless requested
 300      with `-d'.  *Note Reporting Bugs:: for more information on how to
 301      use `-d' for sending bug reports.
 302
 303 `-q'
 304 `--quiet'
 305      Turn off Wget's output.
 306
 307 `-v'
 308 `--verbose'
 309      Turn on verbose output, with all the available data.  The default
 310      output is verbose.
 311
 312 `-nv'
 313 `--non-verbose'
 314      Non-verbose output--turn off verbose without being completely quiet
 315      (use `-q' for that), which means that error messages and basic
 316      information still get printed.
 317
 318 `-i FILE'
 319 `--input-file=FILE'
 320      Read URLs from FILE, in which case no URLs need to be on the
 321      command line.  If there are URLs both on the command line and in
 322      an input file, those on the command line will be the first ones to
 323      be retrieved.  The FILE need not be an HTML document (but no harm
 324      if it is)--it is enough if the URLs are just listed sequentially.
 325
 326      However, if you specify `--force-html', the document will be
 327      regarded as `html'.  In that case you may have problems with
 328      relative links, which you can solve either by adding `<base
 329      href="URL">' to the documents or by specifying `--base=URL' on the
 330      command line.
 331
 332 `-F'
 333 `--force-html'
 334      When input is read from a file, force it to be treated as an HTML
 335      file.  This enables you to retrieve relative links from existing
 336      HTML files on your local disk, by adding `<base href="URL">' to
 337      HTML, or using the `--base' command-line option.
 338
 339 \1f
 340 File: wget.info,  Node: Download Options,  Next: Directory Options,  Prev: Logging and Input File Options,  Up: Invoking
 341
 342 Download Options
 343 ================
 344
 345 `-t NUMBER'
 346 `--tries=NUMBER'
 347      Set number of retries to NUMBER.  Specify 0 or `inf' for infinite
 348      retrying.
 349
 350 `-O FILE'
 351 `--output-document=FILE'
 352      The documents will not be written to the appropriate files, but
 353      all will be concatenated together and written to FILE.  If FILE
 354      already exists, it will be overwritten.  If the FILE is `-', the
 355      documents will be written to standard output.  Including this
 356      option automatically sets the number of tries to 1.
 357
 358 `-nc'
 359 `--no-clobber'
 360      Do not clobber existing files when saving to directory hierarchy
 361      within recursive retrieval of several files. This option is
 362      *extremely* useful when you wish to continue where you left off
 363      with retrieval of many files.  If the files have the `.html' or
 364      (yuck) `.htm' suffix, they will be loaded from the local disk, and
 365      parsed as if they have been retrieved from the Web.
 366
 367 `-c'
 368 `--continue'
 369      Continue getting an existing file.  This is useful when you want to
 370      finish up the download started by another program, or a previous
 371      instance of Wget.  Thus you can write:
 372
 373           wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
 374
 375      If there is a file named `ls-lR.Z' in the current directory, Wget
 376      will assume that it is the first portion of the remote file, and
 377      will ask the server to continue the retrieval from an offset
 378      equal to the length of the local file.
 379
 380      Note that you need not specify this option if all you want is Wget
 381      to continue retrieving where it left off when the connection is
 382      lost--Wget does this by default.  You need this option only when
 383      you want to continue retrieval of a file already halfway
 384      retrieved, saved by another FTP client, or left by Wget being
 385      killed.
 386
 387      Without `-c', the previous example would just begin to download the
 388      remote file to `ls-lR.Z.1'.  The `-c' option is also applicable
 389      for HTTP servers that support the `Range' header.
 390
 391 `--dot-style=STYLE'
 392      Set the retrieval style to STYLE.  Wget traces the retrieval of
 393      each document by printing dots on the screen, each dot
 394      representing a fixed amount of retrieved data.  Any number of dots
 395      may be separated in a "cluster", to make counting easier.  This
 396      option allows you to choose one of the pre-defined styles,
 397      determining the number of bytes represented by a dot, the number
 398      of dots in a cluster, and the number of dots on the line.
 399
 400      With the `default' style each dot represents 1K, there are ten dots
 401      in a cluster and 50 dots in a line.  The `binary' style has a more
 402      "computer"-like orientation--8K dots, 16-dots clusters and 48 dots
 403      per line (which makes for 384K lines).  The `mega' style is
 404      suitable for downloading very large files--each dot represents 64K
 405      retrieved, there are eight dots in a cluster, and 48 dots on each
 406      line (so each line contains 3M).  The `micro' style is exactly the
 407      reverse; it is suitable for downloading small files, with 128-byte
 408      dots, 8 dots per cluster, and 48 dots (6K) per line.
 409
 410 `-N'
 411 `--timestamping'
 412      Turn on time-stamping.  *Note Time-Stamping:: for details.
 413
 414 `-S'
 415 `--server-response'
 416      Print the headers sent by HTTP servers and responses sent by FTP
 417      servers.
 418
 419 `--spider'
 420      When invoked with this option, Wget will behave as a Web "spider",
 421      which means that it will not download the pages, just check that
 422      they are there.  You can use it to check your bookmarks, e.g. with:
 423
 424           wget --spider --force-html -i bookmarks.html
 425
 426      This feature needs much more work for Wget to get close to the
 427      functionality of real WWW spiders.
 428
 429 `-T SECONDS'
 430 `--timeout=SECONDS'
 431      Set the read timeout to SECONDS seconds.  Whenever a network read
 432      is issued, the file descriptor is checked for a timeout, which
 433      could otherwise leave a pending connection (uninterrupted read).
 434      The default timeout is 900 seconds (fifteen minutes).  Setting
 435      timeout to 0 will disable checking for timeouts.
 436
 437      Please do not lower the default timeout value with this option
 438      unless you know what you are doing.
 439
 440 `-w SECONDS'
 441 `--wait=SECONDS'
 442      Wait the specified number of seconds between the retrievals.  Use
 443      of this option is recommended, as it lightens the server load by
 444      making the requests less frequent.  Instead of seconds, the time
 445      can be specified in minutes using the `m' suffix, in hours using
 446      the `h' suffix, or in days using the `d' suffix.
 447
 448      Specifying a large value for this option is useful if the network
 449      or the destination host is down, so that Wget can wait long enough
 450      to reasonably expect the network error to be fixed before the
 451      retry.
 452
 453 `--waitretry=SECONDS'
 454      If you don't want Wget to wait between *every* retrieval, but only
 455      between retries of failed downloads, you can use this option.  If
 456      you want to make sure you never "hammer" remote sites with rapid
 457      retries, you can leave it set all the time to some non-zero value
 458      using the waitretry variable in your `.wgetrc' file.
 459
 460 `-Y on/off'
 461 `--proxy=on/off'
 462      Turn proxy support on or off.  The proxy is on by default if the
 463      appropriate environment variable is defined.
 464
 465 `-Q QUOTA'
 466 `--quota=QUOTA'
 467      Specify download quota for automatic retrievals.  The value can be
 468      specified in bytes (default), kilobytes (with `k' suffix), or
 469      megabytes (with `m' suffix).
 470
 471      Note that quota will never affect downloading a single file.  So
 472      if you specify `wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz',
 473      all of the `ls-lR.gz' will be downloaded.  The same goes even when
 474      several URLs are specified on the command-line.  However, quota is
 475      respected when retrieving either recursively, or from an input
 476      file.  Thus you may safely type `wget -Q2m -i sites'--download
 477      will be aborted when the quota is exceeded.
 478
 479      Setting quota to 0 or to `inf' removes the download limit.
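
   The unit suffixes accepted by `-w' and `-Q' in this node can be
sketched as simple conversions.  The helper names `wait_to_seconds'
and `quota_to_bytes' are hypothetical and only illustrate the
arithmetic; they are not how Wget itself parses its arguments.  The
sketch assumes that a kilobyte is 1024 bytes and a megabyte is
1024*1024 bytes, which the text above does not state, and it prints 0
to mean "no quota".

```shell
# wait_to_seconds VALUE
# Convert a -w style value to seconds: plain number = seconds,
# `m' = minutes, `h' = hours, `d' = days.  Hypothetical helper.
wait_to_seconds() {
    case $1 in
        *m) echo $(( ${1%m} * 60 )) ;;
        *h) echo $(( ${1%h} * 3600 )) ;;
        *d) echo $(( ${1%d} * 86400 )) ;;
        *)  echo $(( $1 )) ;;
    esac
}

# quota_to_bytes VALUE
# Convert a -Q style value to bytes: plain number = bytes,
# `k' = kilobytes, `m' = megabytes.  Assumes 1k = 1024 bytes.
# Prints 0 for `0' or `inf', used here to mean "unlimited".
quota_to_bytes() {
    case $1 in
        inf) echo 0 ;;
        *k)  echo $(( ${1%k} * 1024 )) ;;
        *m)  echo $(( ${1%m} * 1024 * 1024 )) ;;
        *)   echo $(( $1 )) ;;
    esac
}
```

   Under these assumptions, `wait_to_seconds 5m' prints `300', and the
`-Q10k' and `-Q2m' values used above correspond to `10240' and
`2097152' bytes respectively.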
 480
 481 \1f
 482 File: wget.info,  Node: Directory Options,  Next: HTTP Options,  Prev: Download Options,  Up: Invoking
 483
 484 Directory Options
 485 =================
 486
 487 `-nd'
 488 `--no-directories'
 489      Do not create a hierarchy of directories when retrieving
 490      recursively. With this option turned on, all files will get saved
 491      to the current directory, without clobbering (if a name shows up
 492      more than once, the filenames will get extensions `.n').
 493
 494 `-x'
 495 `--force-directories'
 496      The opposite of `-nd'--create a hierarchy of directories, even if
 497      one would not have been created otherwise.  E.g. `wget -x
 498      http://fly.cc.fer.hr/robots.txt' will save the downloaded file to
 499      `fly.cc.fer.hr/robots.txt'.
 500
 501 `-nH'
 502 `--no-host-directories'
 503      Disable generation of host-prefixed directories.  By default,
 504      invoking Wget with `-r http://fly.cc.fer.hr/' will create a
 505      structure of directories beginning with `fly.cc.fer.hr/'.  This
 506      option disables such behavior.
 507
 508 `--cut-dirs=NUMBER'
 509      Ignore NUMBER directory components.  This is useful for getting a
 510      fine-grained control over the directory where recursive retrieval
 511      will be saved.
 512
 513      Take, for example, the directory at
 514      `ftp://ftp.xemacs.org/pub/xemacs/'.  If you retrieve it with `-r',
 515      it will be saved locally under `ftp.xemacs.org/pub/xemacs/'.
 516      While the `-nH' option can remove the `ftp.xemacs.org/' part, you
 517      are still stuck with `pub/xemacs'.  This is where `--cut-dirs'
 518      comes in handy; it makes Wget not "see" NUMBER remote directory
 519      components.  Here are several examples of how `--cut-dirs' option
 520      works.
 521
 522           No options        -> ftp.xemacs.org/pub/xemacs/
 523           -nH               -> pub/xemacs/
 524           -nH --cut-dirs=1  -> xemacs/
 525           -nH --cut-dirs=2  -> .
 526
 527           --cut-dirs=1      -> ftp.xemacs.org/xemacs/
 528           ...
 529
 530      If you just want to get rid of the directory structure, this
 531      option is similar to a combination of `-nd' and `-P'.  However,
 532      unlike `-nd', `--cut-dirs' preserves subdirectories--for
 533      instance, with `-nH --cut-dirs=1', a `beta/' subdirectory will be
 534      placed in `xemacs/beta', as one would expect.
 535
 536 `-P PREFIX'
 537 `--directory-prefix=PREFIX'
 538      Set directory prefix to PREFIX.  The "directory prefix" is the
 539      directory where all other files and subdirectories will be saved
 540      to, i.e. the top of the retrieval tree.  The default is `.' (the
 541      current directory).
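
   The effect of `-nH' and `--cut-dirs' on the local directory name can
be modelled with a few lines of POSIX shell.  This is only a sketch
that reproduces the table given under `--cut-dirs'; the function
`local_dir' and its argument order are hypothetical, and it does not
model `-P' or `-nd'.

```shell
# local_dir HOST REMOTE_DIR NO_HOST CUT
#   HOST        remote host name, e.g. ftp.xemacs.org
#   REMOTE_DIR  remote directory without leading/trailing slash
#   NO_HOST     1 to model -nH, 0 otherwise
#   CUT         number of leading components to drop (--cut-dirs)
# Prints the local directory, or `.' if nothing is left.
local_dir() {
    host=$1
    dir=$2
    no_host=$3
    cut=$4
    # Drop one leading directory component per iteration.
    while [ "$cut" -gt 0 ] && [ -n "$dir" ]; do
        case $dir in
            */*) dir=${dir#*/} ;;
            *)   dir= ;;
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" -eq 1 ]; then
        result=$dir
    else
        result=$host${dir:+/$dir}
    fi
    if [ -n "$result" ]; then
        printf '%s/\n' "$result"
    else
        printf '.\n'
    fi
}
```

   For example, `local_dir ftp.xemacs.org pub/xemacs 1 1' prints
`xemacs/', and `local_dir ftp.xemacs.org pub/xemacs 0 1' prints
`ftp.xemacs.org/xemacs/', matching the corresponding rows of the table.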
 542
 543 \1f
 544 File: wget.info,  Node: HTTP Options,  Next: FTP Options,  Prev: Directory Options,  Up: Invoking
 545
 546 HTTP Options
 547 ============
 548
 549 `--http-user=USER'
 550 `--http-passwd=PASSWORD'
 551      Specify the username USER and password PASSWORD on an HTTP server.
 552      According to the type of the challenge, Wget will encode them
 553      using either the `basic' (insecure) or the `digest' authentication
 554      scheme.
 555
 556      Another way to specify username and password is in the URL itself
 557      (*Note URL Format::).  For more information about security issues
 558      with Wget, *Note Security Considerations::.
 559
 560 `-C on/off'
 561 `--cache=on/off'
 562      When set to off, disable server-side cache.  In this case, Wget
 563      will send the remote server an appropriate directive (`Pragma:
 564      no-cache') to get the file from the remote service, rather than
 565      returning the cached version.  This is especially useful for
 566      retrieving and flushing out-of-date documents on proxy servers.
 567
 568      Caching is allowed by default.
 569
 570 `--ignore-length'
 571      Unfortunately, some HTTP servers (CGI programs, to be more
 572      precise) send out bogus `Content-Length' headers, which makes Wget
 573      go wild, as it thinks not all the document was retrieved.  You can
 574      spot this syndrome if Wget retries getting the same document again
 575      and again, each time claiming that the (otherwise normal)
 576      connection has closed on the very same byte.
 577
 578      With this option, Wget will ignore the `Content-Length' header--as
 579      if it never existed.
 580
 581 `--header=ADDITIONAL-HEADER'
 582      Define an ADDITIONAL-HEADER to be passed to the HTTP servers.
 583      Headers must contain a `:' preceded by one or more non-blank
 584      characters, and must not contain newlines.
 585
 586      You may define more than one additional header by specifying
 587      `--header' more than once.
 588
 589           wget --header='Accept-Charset: iso-8859-2' \
 590                --header='Accept-Language: hr'        \
 591                  http://fly.cc.fer.hr/
 592
 593      Specification of an empty string as the header value will clear all
 594      previous user-defined headers.
 595
 596 `--proxy-user=USER'
 597 `--proxy-passwd=PASSWORD'
 598      Specify the username USER and password PASSWORD for authentication
 599      on a proxy server.  Wget will encode them using the `basic'
 600      authentication scheme.
 601
 602 `-s'
 603 `--save-headers'
 604      Save the headers sent by the HTTP server to the file, preceding the
 605      actual contents, with an empty line as the separator.
 606
 607 `-U AGENT-STRING'
 608 `--user-agent=AGENT-STRING'
 609      Identify as AGENT-STRING to the HTTP server.
 610
 611      The HTTP protocol allows the clients to identify themselves using a
 612      `User-Agent' header field.  This enables distinguishing the WWW
 613      software, usually for statistical purposes or for tracing of
 614      protocol violations.  Wget normally identifies as `Wget/VERSION',
 615      VERSION being the current version number of Wget.
 616
 617      However, some sites have been known to impose the policy of
 618      tailoring the output according to the `User-Agent'-supplied
 619      information.  While conceptually this is not such a bad idea, it
 620      has been abused by servers denying information to clients other
 621      than `Mozilla' or Microsoft `Internet Explorer'.  This option
 622      allows you to change the `User-Agent' line issued by Wget.  Use of
 623      this option is discouraged, unless you really know what you are
 624      doing.
 625
 626      *NOTE* that Netscape Communications Corp. has claimed that false
 627      transmissions of `Mozilla' as the `User-Agent' are a copyright
 628      infringement, which will be prosecuted.  *DO NOT* misrepresent
 629      Wget as Mozilla.
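
   The rule given under `--header' (a `:' preceded by one or more
non-blank characters, and no newlines) can be checked before a value
is passed to Wget.  The following POSIX shell sketch is one reading of
that rule; the function name `valid_header' is hypothetical, and the
empty string, which clears previously defined headers, is deliberately
not treated as a valid header here.

```shell
# valid_header STRING
# Return 0 if STRING looks like an acceptable --header value:
# no newline, and a non-empty name without blanks before the
# first `:'.  Hypothetical helper, not part of Wget.
valid_header() {
    nl='
'
    tab=$(printf '\t')
    case $1 in
        *"$nl"*) return 1 ;;          # newlines are not allowed
    esac
    name=${1%%:*}                     # text before the first colon
    [ "$name" != "$1" ] || return 1   # no colon at all
    [ -n "$name" ] || return 1        # nothing before the colon
    case $name in
        *" "*|*"$tab"*) return 1 ;;   # blank inside the name
    esac
    return 0
}
```

   A script could then call, for instance, `valid_header
'Accept-Language: hr'' and pass the value to `--header' only when the
function succeeds.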
 630
 631 \1f
 632 File: wget.info,  Node: FTP Options,  Next: Recursive Retrieval Options,  Prev: HTTP Options,  Up: Invoking
 633
 634 FTP Options
 635 ===========
 636
 637 `--retr-symlinks'
 638      Retrieve symbolic links on FTP sites as if they were plain files,
 639      i.e. don't just create links locally.
 640
 641 `-g on/off'
 642 `--glob=on/off'
 643      Turn FTP globbing on or off.  Globbing means you may use the
 644      shell-like special characters ("wildcards"), like `*', `?', `['
 645      and `]' to retrieve more than one file from the same directory at
 646      once, like:
 647
 648           wget ftp://gnjilux.cc.fer.hr/*.msg
 649
 650      By default, globbing will be turned on if the URL contains a
 651      globbing character.  This option may be used to turn globbing on
 652      or off permanently.
 653
 654      You may have to quote the URL to protect it from being expanded by
 655      your shell.  Globbing makes Wget look for a directory listing,
 656      which is system-specific.  This is why it currently works only
 657      with Unix FTP servers (and the ones emulating Unix `ls' output).
 658
 659 `--passive-ftp'
 660      Use the "passive" FTP retrieval scheme, in which the client
 661      initiates the data connection.  This is sometimes required for FTP
 662      to work behind firewalls.
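
   The default described under `--glob' (globbing is turned on if the
URL contains a globbing character) can be sketched in POSIX shell as
follows.  The function name `uses_globbing' is hypothetical and only
mirrors the list of special characters given above.  Note that the URL
is single-quoted when passed, so that the shell does not try to expand
the wildcard itself.

```shell
# uses_globbing URL
# Return 0 if URL contains one of the wildcard characters
# `*', `?', `[' or `]', and 1 otherwise.  Hypothetical helper.
uses_globbing() {
    case $1 in
        *'*'*|*'?'*|*'['*|*']'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether a given URL would trigger globbing by default.
describe_url() {
    if uses_globbing "$1"; then
        printf 'globbing: %s\n' "$1"
    else
        printf 'plain: %s\n' "$1"
    fi
}
```

   Here `describe_url 'ftp://gnjilux.cc.fer.hr/*.msg'' prints
`globbing: ftp://gnjilux.cc.fer.hr/*.msg'.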
 663
 664 \1f
 665 File: wget.info,  Node: Recursive Retrieval Options,  Next: Recursive Accept/Reject Options,  Prev: FTP Options,  Up: Invoking
 666
 667 Recursive Retrieval Options
 668 ===========================
 669
 670 `-r'
 671 `--recursive'
 672      Turn on recursive retrieving.  *Note Recursive Retrieval:: for more
 673      details.
 674
 675 `-l DEPTH'
 676 `--level=DEPTH'
 677      Specify recursion maximum depth level DEPTH (*Note Recursive
 678      Retrieval::).  The default maximum depth is 5.
 679
 680 `--delete-after'
 681      This option tells Wget to delete every single file it downloads,
 682      *after* having done so.  It is useful for pre-fetching popular
 683      pages through proxy, e.g.:
 684
 685           wget -r -nd --delete-after http://whatever.com/~popular/page/
 686
 687      The `-r' option is to retrieve recursively, and `-nd' not to
 688      create directories.
 689
 690 `-k'
 691 `--convert-links'
 692      Convert the non-relative links to relative ones locally.  Only the
 693      references to the documents actually downloaded will be converted;
 694      the rest will be left unchanged.
 695
 696      Note that only at the end of the download can Wget know which
 697      links have been downloaded.  Because of that, much of the work
 698      done by `-k' will be performed at the end of the downloads.
 699
 700 `-K'
 701 `--backup-converted'
 702      When converting a file, back up the original version with a `.orig'
 703      suffix.  Affects the behavior of `-N' (*Note HTTP Time-Stamping
 704      Internals::).
 705
 706 `-m'
 707 `--mirror'
 708      Turn on options suitable for mirroring.  This option turns on
 709      recursion and time-stamping, sets infinite recursion depth and
 710      keeps FTP directory listings.  It is currently equivalent to `-r
 711      -N -l inf -nr'.
 712
 713 `-nr'
 714 `--dont-remove-listing'
 715      Don't remove the temporary `.listing' files generated by FTP
 716      retrievals.  Normally, these files contain the raw directory
 717      listings received from FTP servers.  Not removing them can be
 718      useful to access the full remote file list when running a mirror,
 719      or for debugging purposes.
 720
 721 \1f
 722 File: wget.info,  Node: Recursive Accept/Reject Options,  Prev: Recursive Retrieval Options,  Up: Invoking
 723
 724 Recursive Accept/Reject Options
 725 ===============================
 726
 727 `-A ACCLIST --accept ACCLIST'
 728 `-R REJLIST --reject REJLIST'
 729      Specify comma-separated lists of file name suffixes or patterns to
 730      accept or reject (*Note Types of Files:: for more details).
 731
 732 `-D DOMAIN-LIST'
 733 `--domains=DOMAIN-LIST'
 734      Set domains to be accepted and DNS looked-up, where DOMAIN-LIST is
 735      a comma-separated list.  Note that it does *not* turn on `-H'.
 736      This option speeds things up, even if only one host is spanned
 737      (*Note Domain Acceptance::).
 738
 739 `--exclude-domains DOMAIN-LIST'
 740      Exclude the domains given in a comma-separated DOMAIN-LIST from
 741      DNS-lookup (*Note Domain Acceptance::).
 742
 743 `--follow-ftp'
 744      Follow FTP links from HTML documents.  Without this option, Wget
 745      will ignore all the FTP links.
 746
 747 `--follow-tags=LIST'
 748      Wget has an internal table of HTML tag / attribute pairs that it
 749      considers when looking for linked documents during a recursive
 750      retrieval.  If a user wants only a subset of those tags to be
 751      considered, however, he or she should specify such tags in a
 752      comma-separated LIST with this option.
 753
 754 `-G LIST'
 755 `--ignore-tags=LIST'
 756      This is the opposite of the `--follow-tags' option.  To skip
 757      certain HTML tags when recursively looking for documents to
 758      download, specify them in a comma-separated LIST.  The author of
 759      this option likes to use the following command to download a
 760      single HTML page and all documents necessary to display it
 761      properly:
 762
 763           wget -Ga,area -H -k -K -nh -r http://SITE/DOCUMENT
 764
 765 `-H'
 766 `--span-hosts'
 767      Enable spanning across hosts when doing recursive retrieving
 768      (*Note All Hosts::).
 769
 770 `-L'
 771 `--relative'
 772      Follow relative links only.  Useful for retrieving a specific home
 773      page without any distractions, not even those from the same hosts
 774      (*Note Relative Links::).
 775
 776 `-I LIST'
 777 `--include-directories=LIST'
 778      Specify a comma-separated list of directories you wish to follow
 779      when downloading (*Note Directory-Based Limits:: for more
 780      details.)  Elements of LIST may contain wildcards.
 781
 782 `-X LIST'
 783 `--exclude-directories=LIST'
 784      Specify a comma-separated list of directories you wish to exclude
 785      from download (*Note Directory-Based Limits:: for more details.)
 786      Elements of LIST may contain wildcards.
 787
 788 `-nh'
 789 `--no-host-lookup'
 790      Disable the time-consuming DNS lookup of almost all hosts (*Note
 791      Host Checking::).
 792
 793 `-np'
 795 `--no-parent'
 796      Do not ever ascend to the parent directory when retrieving
 797      recursively.  This is a useful option, since it guarantees that
 798      only the files *below* a certain hierarchy will be downloaded.
 799      *Note Directory-Based Limits:: for more details.
 800
 801 \1f
 802 File: wget.info,  Node: Recursive Retrieval,  Next: Following Links,  Prev: Invoking,  Up: Top
 803
 804 Recursive Retrieval
 805 *******************
 806
 807    GNU Wget is capable of traversing parts of the Web (or a single HTTP
 808 or FTP server), depth-first following links and directory structure.
 809 This is called "recursive" retrieving, or "recursion".
 810
 811    With HTTP URLs, Wget retrieves and parses the HTML from the given
 812 URL, retrieving the files the HTML document refers to through
 813 markup like `href' or `src'.  If the freshly downloaded
 814 file is also of type `text/html', it will be parsed and followed
 815 further.
 816
 817    The maximum "depth" to which the retrieval may descend is specified
 818 with the `-l' option (the default maximum depth is five layers).  *Note
 819 Recursive Retrieval Options::.
 820
 821    When retrieving an FTP URL recursively, Wget will retrieve all the
 822 data from the given directory tree (including the subdirectories up to
 823 the specified depth) on the remote server, creating its mirror image
 824 locally.  FTP retrieval is also limited by the `depth' parameter.
 825
 826    By default, Wget will create a local directory tree, corresponding to
 827 the one found on the remote server.
 828
 829    Recursive retrieving can find a number of applications, the most
 830 important of which is mirroring.  It is also useful for WWW
 831 presentations, and any other opportunities where slow network
 832 connections should be bypassed by storing the files locally.
 833
 834    You should be warned that invoking recursion may cause grave
 835 overloading on your system, because of the fast exchange of data
 836 through the network; all of this may hamper other users' work.  The
 837 same goes for the foreign server you are mirroring--the more requests
 838 it gets in a row, the greater its load.
 839
 840    Careless retrieving can also fill your file system uncontrollably,
 841 which can grind the machine to a halt.
 842
 843    The load can be minimized by lowering the maximum recursion level
 844 (`-l') and/or by lowering the number of retries (`-t').  You may also
 845 consider using the `-w' option to slow down your requests to the remote
 846 servers, as well as the numerous options to narrow the number of
 847 followed links (*Note Following Links::).
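
   The load-limiting options above can be combined in one invocation.
The wrapper below is only a sketch: `polite_fetch' is a hypothetical
name, and the depth, retry and wait values are assumptions chosen for
illustration, not recommended settings.

```shell
# Hypothetical wrapper (not part of Wget): recursive retrieval with a
# shallow depth, few retries, and a pause between requests.
# The values 2, 3 and 5 are illustrative assumptions.
polite_fetch() {
    # -l 2: descend at most two levels
    # -t 3: try each file at most three times
    # -w 5: wait five seconds between retrievals
    wget -r -l 2 -t 3 -w 5 "$1"
}
```

   It would be called as `polite_fetch http://SITE/', where SITE is a
placeholder for the host you intend to retrieve from.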
 848
 849    Recursive retrieval is a good thing when used properly.  Please take
 850 all precautions not to wreak havoc through carelessness.
 851
 852 \1f
 853 File: wget.info,  Node: Following Links,  Next: Time-Stamping,  Prev: Recursive Retrieval,  Up: Top
 854
 855 Following Links
 856 ***************
 857
 858    When retrieving recursively, one does not wish to retrieve loads of
 859 unnecessary data.  Most of the time users know exactly what
 860 they want to download, and want Wget to follow only specific links.
 861
 862    For example, if you wish to download the music archive from
 863 `fly.cc.fer.hr', you will not want to download all the home pages that
 864    Wget possesses several mechanisms that allow you to fine-tune which
 865
 866    Wget possesses several mechanisms that allows you to fine-tune which
 867 links it will follow.
 868
 869 * Menu:
 870
 871 * Relative Links::         Follow relative links only.
 872 * Host Checking::          Follow links on the same host.
 873 * Domain Acceptance::      Check on a list of domains.
 874 * All Hosts::              No host restrictions.
 875 * Types of Files::         Getting only certain files.
 876 * Directory-Based Limits:: Getting only certain directories.
 877 * FTP Links::              Following FTP links.
 878
 879 \1f
 880 File: wget.info,  Node: Relative Links,  Next: Host Checking,  Prev: Following Links,  Up: Following Links
 881
 882 Relative Links
 883 ==============
 884
 885    When only relative links are followed (option `-L'), recursive
 886 retrieving will never span hosts.  No time-expensive DNS-lookups will
 887 be performed, and the process will be very fast, with minimum
 888 strain on the network.  This will suit your needs often, especially when
 889 mirroring the output of various `x2html' converters, since they
 890 generally output relative links.
 891
 892 \1f
 893 File: wget.info,  Node: Host Checking,  Next: Domain Acceptance,  Prev: Relative Links,  Up: Following Links
 894
 895 Host Checking
 896 =============
 897
 898    The drawback of following only relative links is that humans
 899 often tend to mix them with absolute links to the very same host, and
 900 the very same page.  In this mode (which is the default mode for
 901 following links) all URLs that refer to the same host will be retrieved.
 902
 903    The problem with this mode is the aliases of hosts and
 904 domains.  Wget cannot tell from the names alone that `regoc.srce.hr' and
 905 `www.srce.hr' are the same host, or that `fly.cc.fer.hr' is the same as
 906 `fly.cc.etf.hr'.  Whenever an absolute link is encountered, the host is
 907 DNS-looked-up with `gethostbyname' to check whether we are maybe
 908 dealing with the same hosts.  Although the results of `gethostbyname'
 909 are cached, it is still a great slowdown, e.g. when dealing with large
 910 indices of home pages on different hosts (because each of the hosts
 911 must be DNS-resolved to see whether it just *might* be an alias of the
 912 starting host).
 913
 914    To avoid the overhead you may use `-nh', which will turn off
 915 DNS-resolving and make Wget compare hosts literally.  This will make
 916 things run much faster, but also much less reliably (e.g. `www.srce.hr'
 917 and `regoc.srce.hr' will be flagged as different hosts).
 918
 919    Note that modern HTTP servers allow one IP address to host several
 920 "virtual servers", each having its own directory hierarchy.  Such
 921 "servers" are distinguished by their hostnames (all of which point to
 922 the same IP address); for this to work, a client must send a `Host'
 923 header, which is what Wget does.  However, in that case Wget *must not*
 924 try to divine a host's "real" address, nor try to use the same hostname
 925 for each access, i.e. `-nh' must be turned on.
 926
 927    In other words, the `-nh' option must be used to enable the
 928 retrieval from virtual servers distinguished by their hostnames.  As the
 929 number of such server setups grows, the behavior of `-nh' may become the
 930 default in the future.
 931
 932 \1f
 933 File: wget.info,  Node: Domain Acceptance,  Next: All Hosts,  Prev: Host Checking,  Up: Following Links
 934
 935 Domain Acceptance
 936 =================
 937
 938    With the `-D' option you may specify the domains that will be
 939 followed.  Hosts whose domain is not in this list will not be
 940 DNS-resolved.  Thus you can specify `-Dmit.edu' just to make sure that
 941 *nothing outside of MIT gets looked up*.  This is very important and
 942 useful.  It also means that `-D' does *not* imply `-H' (span all
 943 hosts), which must be specified explicitly.  Feel free to use this
 944 option since it will speed things up, with almost all the reliability
 945 of checking for all hosts.  Thus you could invoke
 946
 947      wget -r -D.hr http://fly.cc.fer.hr/
 948
 949    to make sure that only the hosts in `.hr' domain get DNS-looked-up
 950 for being equal to `fly.cc.fer.hr'.  So `fly.cc.etf.hr' will be checked
 951 (only once!) and found equal, but `www.gnu.ai.mit.edu' will not even be
 952 checked.
 953
 954    Of course, domain acceptance can be used to limit the retrieval to
 955 particular domains with spanning of hosts in them, but then you must
 956 specify `-H' explicitly.  E.g.:
 957
 958      wget -r -H -Dmit.edu,stanford.edu http://www.mit.edu/
 959
 960    will start with `http://www.mit.edu/', following links across MIT
 961 and Stanford.
 962
 963    If there are domains you want to exclude specifically, you can do it
 964 with `--exclude-domains', which accepts the same type of arguments as
 965 `-D', but will *exclude* all the listed domains.  For example, if you
 966 want to download all the hosts from `foo.edu' domain, with the
 967 exception of `sunsite.foo.edu', you can do it like this:
 968
 969      wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu http://www.foo.edu/
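
   The acceptance rule described in this node can be modelled with a
short shell sketch.  This is not Wget's own code: the function names
`domain_in_list' and `host_allowed' are hypothetical, and the sketch
assumes that a host belongs to a domain when its name simply ends with
the listed string.

```shell
# Hypothetical model of the -D / --exclude-domains check.
# Succeeds when HOST ($1) ends with one element of the comma-separated
# list in $2.  The function body is a subshell, so IFS stays local.
domain_in_list() (
    host=$1
    set -f                  # no pathname expansion while splitting
    IFS=,
    for dom in $2; do
        case $host in
            *"$dom") exit 0 ;;
        esac
    done
    exit 1
)

# $1 = host, $2 = -D list (may be empty), $3 = --exclude-domains list
host_allowed() {
    if [ -n "$2" ] && ! domain_in_list "$1" "$2"; then return 1; fi
    if [ -n "$3" ] && domain_in_list "$1" "$3"; then return 1; fi
    return 0
}
```

   With these assumptions, `host_allowed www.foo.edu foo.edu
sunsite.foo.edu' succeeds, while the same call for `sunsite.foo.edu'
fails.  Note that a plain suffix test would also accept a host such as
`notfoo.edu'; a stricter sketch would require a dot before the domain.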
 970
 971 \1f
 972 File: wget.info,  Node: All Hosts,  Next: Types of Files,  Prev: Domain Acceptance,  Up: Following Links
 973
 974 All Hosts
 975 =========
 976
 977    When `-H' is specified without `-D', all hosts are freely spanned.
 978 There are no restrictions whatsoever as to what part of the net Wget
 979 will go to fetch documents, other than maximum retrieval depth.  If a
 980 page references `www.yahoo.com', so be it.  Such an option is rarely
 981 useful by itself.
 982
 983 \1f
 984 File: wget.info,  Node: Types of Files,  Next: Directory-Based Limits,  Prev: All Hosts,  Up: Following Links
 985
 986 Types of Files
 987 ==============
 988
 989    When downloading material from the web, you will often want to
 990 restrict the retrieval to only certain file types.  For example, if you
 991 are interested in downloading GIFs, you will not be overjoyed to get
 992 loads of PostScript documents, and vice versa.
 993
 994    Wget offers two options to deal with this problem.  Each option
 995 description lists a short name, a long name, and the equivalent command
 996 in `.wgetrc'.
 997
 998 `-A ACCLIST'
 999 `--accept ACCLIST'
1000 `accept = ACCLIST'
1001      The argument to the `--accept' option is a list of file suffixes or
1002      patterns that Wget will download during recursive retrieval.  A
1003      suffix is the ending part of a file, and consists of "normal"
1004      letters, e.g. `gif' or `.jpg'.  A matching pattern contains
1005      shell-like wildcards, e.g. `books*' or `zelazny*196[0-9]*'.
1006
1007      So, specifying `wget -A gif,jpg' will make Wget download only the
1008      files ending with `gif' or `jpg', i.e. GIFs and JPEGs.  On the
1009      other hand, `wget -A "zelazny*196[0-9]*"' will download only files
1010      beginning with `zelazny' and containing numbers from 1960 to 1969
1011      anywhere within.  Look up the manual of your shell for a
1012      description of how pattern matching works.
1013
1014      Of course, any number of suffixes and patterns can be combined
1015      into a comma-separated list, and given as an argument to `-A'.
1016
1017 `-R REJLIST'
1018 `--reject REJLIST'
1019 `reject = REJLIST'
1020      The `--reject' option works the same way as `--accept', only its
1021      logic is the reverse; Wget will download all files *except* the
1022      ones matching the suffixes (or patterns) in the list.
1023
1024      So, if you want to download a whole page except for the cumbersome
1025      MPEGs and .AU files, you can use `wget -R mpg,mpeg,au'.
1026      Analogously, to download all files except the ones beginning with
1027      `bjork', use `wget -R "bjork*"'.  The quotes are to prevent
1028      expansion by the shell.
1029
1030    The `-A' and `-R' options may be combined to achieve even better
1031 fine-tuning of which files to retrieve.  E.g. `wget -A "*zelazny*" -R
1032 .ps' will download all the files having `zelazny' as a part of their
1033 name, but *not* the PostScript files.
1034
1035    Note that these two options do not affect the downloading of HTML
1036 files; Wget must load all the HTMLs to know where to go at
1037 all--recursive retrieval would make no sense otherwise.
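
   The way suffixes and patterns combine can be sketched in shell.
The functions below are a hypothetical model, not Wget's
implementation: an element containing `*', `?' or `[' is assumed to be
a pattern matched against the whole file name, and any other element
is assumed to be a suffix.  The special treatment of HTML files
mentioned above is left out.

```shell
# Hypothetical helper: succeeds when NAME ($1) matches one element of
# the comma-separated list in $2.  Runs in a subshell to keep IFS and
# `set -f' local to the function.
matches_list() (
    name=$1
    set -f                  # do not expand patterns against local files
    IFS=,
    for elem in $2; do
        case $elem in
            *\**|*\?*|*\[*) pat=$elem ;;     # wildcard: use as a pattern
            *)              pat="*$elem" ;;  # otherwise: match as suffix
        esac
        case $name in
            $pat) exit 0 ;;
        esac
    done
    exit 1
)

# $1 = file name, $2 = accept list (may be empty), $3 = reject list
should_download() {
    if [ -n "$2" ] && ! matches_list "$1" "$2"; then return 1; fi
    if [ -n "$3" ] && matches_list "$1" "$3"; then return 1; fi
    return 0
}
```

   For instance, `should_download zelazny.ps "*zelazny*" .ps' fails
under this model because the reject list wins, which mirrors the
`-A "*zelazny*" -R .ps' example.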
1038
1039 \1f
1040 File: wget.info,  Node: Directory-Based Limits,  Next: FTP Links,  Prev: Types of Files,  Up: Following Links
1041
1042 Directory-Based Limits
1043 ======================
1044
1045    Regardless of other link-following facilities, it is often useful to
1046 restrict which files are retrieved based on the directories
1047 those files are placed in.  There can be many reasons for this--the
1048 home pages may be organized in a reasonable directory structure; or some
1049 directories may contain useless information, e.g. `/cgi-bin' or `/dev'
1050 directories.
1051
1052    Wget offers three different options to deal with this requirement.
1053 Each option description lists a short name, a long name, and the
1054 equivalent command in `.wgetrc'.
1055
1056 `-I LIST'
1057 `--include LIST'
1058 `include_directories = LIST'
1059      The `-I' option accepts a comma-separated list of directories included
1060      in the retrieval.  Any other directories will simply be ignored.
1061      The directories are absolute paths.
1062
1063      So, if you wish to download from `http://host/people/bozo/'
1064      following only links to bozo's colleagues in the `/people'
1065      directory and the bogus scripts in `/cgi-bin', you can specify:
1066
1067           wget -I /people,/cgi-bin http://host/people/bozo/
1068
1069 `-X LIST'
1070 `--exclude LIST'
1071 `exclude_directories = LIST'
1072      The `-X' option is exactly the reverse of `-I'--this is a list of
1073      directories *excluded* from the download.  E.g. if you do not want
1074      Wget to download things from the `/cgi-bin' directory, specify `-X
1075      /cgi-bin' on the command line.
1076
1077      The same as with `-A'/`-R', these two options can be combined to
1078      get a better fine-tuning of downloading subdirectories.  E.g. if
1079      you want to load all the files from `/pub' hierarchy except for
1080      `/pub/worthless', specify `-I/pub -X/pub/worthless'.
1081
1082 `-np'
1083 `--no-parent'
1084 `no_parent = on'
1085      The simplest, and often very useful, way of limiting directories is
1086      disallowing retrieval of the links that refer to the hierarchy
1087      "above" the beginning directory, i.e. disallowing ascent to
1088      the parent directory/directories.
1089
1090      The `--no-parent' option (short `-np') is useful in this case.
1091      Using it guarantees that you will never leave the existing
1092      hierarchy.  Supposing you issue Wget with:
1093
1094           wget -r --no-parent http://somehost/~luzer/my-archive/
1095
1096      You may rest assured that none of the references to
1097      `/~his-girls-homepage/' or `/~luzer/all-my-mpegs/' will be
1098      followed.  Only the archive you are interested in will be
1099      downloaded.  Essentially, `--no-parent' is similar to
1100      `-I/~luzer/my-archive', only it handles redirections in a more
1101      intelligent fashion.
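
   The three directory limits can be modelled with a few lines of
shell.  This is a sketch under stated assumptions, not Wget's code:
the function names are hypothetical, wildcards in LIST elements are
not handled, and a directory is taken to be "inside" a listed
directory when it equals that directory or lies below it.

```shell
# Hypothetical helper: succeeds when DIR ($1) is one of the directories
# in the comma-separated list $2, or lies below one of them.
dir_in_list() (
    dir=$1
    set -f
    IFS=,
    for d in $2; do
        d=${d%/}                # ignore a trailing slash in the list
        case $dir/ in
            "$d"/*) exit 0 ;;   # same directory or a subdirectory
        esac
    done
    exit 1
)

# $1 = directory, $2 = -I list (may be empty), $3 = -X list
dir_allowed() {
    if [ -n "$2" ] && ! dir_in_list "$1" "$2"; then return 1; fi
    if [ -n "$3" ] && dir_in_list "$1" "$3"; then return 1; fi
    return 0
}

# $1 = directory of a link, $2 = directory of the starting URL
no_parent_ok() {
    dir_in_list "$1" "$2"
}
```

   Under this model `dir_allowed /pub/gnu /pub /pub/worthless'
succeeds and `dir_allowed /pub/worthless/junk /pub /pub/worthless'
fails, matching the `-I/pub -X/pub/worthless' example.  The sketch
does not attempt to model how `--no-parent' treats redirections.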
1102
1103 \1f
1104 File: wget.info,  Node: FTP Links,  Prev: Directory-Based Limits,  Up: Following Links
1105
1106 Following FTP Links
1107 ===================
1108
1109    The rules for FTP are somewhat specific, as it is necessary for them
1110 to be.  FTP links in HTML documents are often included for purposes of
1111 reference, and it is often inconvenient to download them by default.
1112
1113    To have FTP links followed from HTML documents, you need to specify
1114 the `--follow-ftp' option.  Having done that, FTP links will span hosts
1115 regardless of `-H' setting.  This is logical, as FTP links rarely point
1116 to the same host where the HTTP server resides.  For similar reasons,
1117 the `-L' option has no effect on such downloads.  On the other hand,
1118 domain acceptance (`-D') and suffix rules (`-A' and `-R') apply
1119 normally.
1120
1121    Also note that followed links to FTP directories will not be
1122 retrieved recursively further.
1123
1124 \1f
1125 File: wget.info,  Node: Time-Stamping,  Next: Startup File,  Prev: Following Links,  Up: Top
1126
1127 Time-Stamping
1128 *************
1129
1130    One of the most important aspects of mirroring information from the
1131 Internet is updating your archives.
1132
1133    Downloading the whole archive again and again, just to replace a few
1134 changed files is expensive, both in terms of wasted bandwidth and money,
1135 and the time to do the update.  This is why all the mirroring tools
1136 offer the option of incremental updating.
1137
1138    Such an updating mechanism means that the remote server is scanned in
1139 search of "new" files.  Only those new files will be downloaded in the
1140 place of the old ones.
1141
1142    A file is considered new if one of these two conditions is met:
1143
1144   1. A file of that name does not already exist locally.
1145
1146   2. A file of that name does exist, but the remote file was modified
1147      more recently than the local file.
1148
1149    To implement this, the program needs to be aware of the time of last
1150 modification of both remote and local files.  Such information is
1151 called "time-stamps".
1152
1153    Time-stamping in GNU Wget is turned on using the `--timestamping'
1154 (`-N') option, or through the `timestamping = on' directive in `.wgetrc'.
1155 With this option, for each file it intends to download, Wget will check
1156 whether a local file of the same name exists.  If it does, and the
1157 remote file is older, Wget will not download it.
1158
1159    If the local file does not exist, or the sizes of the files do not
1160 match, Wget will download the remote file no matter what the time-stamps
1161 say.
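
   The decision described above can be written as a small shell
function.  It is only a sketch: `needs_download' is a hypothetical
name, and the sketch assumes that modification times are given as
seconds since the epoch and sizes as byte counts.

```shell
# Hypothetical model of the -N decision.
# $1 = local mtime (empty when no local file exists)
# $2 = local size   $3 = remote mtime   $4 = remote size
# Succeeds (status 0) when the remote file should be downloaded.
needs_download() {
    [ -n "$1" ] || return 0          # no local copy: always download
    [ "$2" -eq "$4" ] || return 0    # sizes differ: download regardless
    [ "$3" -gt "$1" ]                # otherwise only if remote is newer
}
```

   For example, `needs_download 1000 10 900 10' fails (the remote copy
is older and the sizes match), while `needs_download 1000 10 900 12'
succeeds because the sizes differ.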
1162
1163 * Menu:
1164
1165 * Time-Stamping Usage::
1166 * HTTP Time-Stamping Internals::
1167 * FTP Time-Stamping Internals::
1168
1169 \1f
1170 File: wget.info,  Node: Time-Stamping Usage,  Next: HTTP Time-Stamping Internals,  Prev: Time-Stamping,  Up: Time-Stamping
1171
1172 Time-Stamping Usage
1173 ===================
1174
1175    The usage of time-stamping is simple.  Say you would like to
1176 download a file so that it keeps its date of modification.
1177
1178      wget -S http://www.gnu.ai.mit.edu/
1179
1180    A simple `ls -l' shows that the time stamp on the local file equals
1181 the state of the `Last-Modified' header, as returned by the server.  As
1182 you can see, the time-stamping info is preserved locally, even without
1183 `-N'.
1184
1185    Several days later, you would like Wget to check if the remote file
1186 has changed, and download it if it has.
1187
1188      wget -N http://www.gnu.ai.mit.edu/
1189
1190    Wget will ask the server for the last-modified date.  If the local
1191 file is newer, the remote file will not be re-fetched.  However, if the
1192 remote file is more recent, Wget will proceed fetching it normally.
1193
1194    The same goes for FTP.  For example:
1195
1196      wget ftp://ftp.ifi.uio.no/pub/emacs/gnus/*
1197
1198    `ls' will show that the timestamps are set according to the state on
1199 the remote server.  Reissuing the command with `-N' will make Wget
1200 re-fetch *only* the files that have been modified.
1201
1202    In both HTTP and FTP retrieval Wget will time-stamp the local file
1203 correctly (with or without `-N') if it gets the stamps, i.e. gets the
1204 directory listing for FTP or the `Last-Modified' header for HTTP.
1205
1206    If you wished to mirror the GNU archive every week, you would use the
1207 following command every week:
1208
1209      wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/
1210
1211 \1f
1212 File: wget.info,  Node: HTTP Time-Stamping Internals,  Next: FTP Time-Stamping Internals,  Prev: Time-Stamping Usage,  Up: Time-Stamping
1213
1214 HTTP Time-Stamping Internals
1215 ============================
1216
1217    Time-stamping in HTTP is implemented by checking the
1218 `Last-Modified' header.  If you wish to retrieve the file `foo.html'
1219 through HTTP, Wget will check whether `foo.html' exists locally.  If it
1220 doesn't, `foo.html' will be retrieved unconditionally.
1221
1222    If the file does exist locally, Wget will first check its local
1223 time-stamp (similar to the way `ls -l' checks it), and then send a
1224 `HEAD' request to the remote server, demanding the information on the
1225 remote file.
1226
1227    The `Last-Modified' header is examined to find which file was
1228 modified more recently (which makes it "newer").  If the remote file is
1229 newer, it will be downloaded; if it is older, Wget will give up.(1)
1230
1231    When `--backup-converted' (`-K') is specified in conjunction with
1232 `-N', server file `X' is compared to local file `X.orig', if extant,
1233 rather than being compared to local file `X', which will always differ
1234 if it's been converted by `--convert-links' (`-k').
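
   The choice of comparison file can be sketched as follows.  The
function name `timestamp_reference' is hypothetical, and the sketch
assumes that both `-K' and `-N' are in effect.

```shell
# Hypothetical helper: print the local file that server file X ($1)
# would be compared against when -K and -N are both given.
timestamp_reference() {
    if [ -e "$1.orig" ]; then
        printf '%s\n' "$1.orig"    # unconverted backup exists: use it
    else
        printf '%s\n' "$1"         # no backup: compare with X itself
    fi
}
```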
1235
1236    Arguably, HTTP time-stamping should be implemented using the
1237 `If-Modified-Since' request.
1238
1239    ---------- Footnotes ----------
1240
1241    (1) As an additional check, Wget will look at the `Content-Length'
1242 header, and compare the sizes; if they are not the same, the remote
1243 file will be downloaded no matter what the time-stamp says.
1244
1245 \1f
1246 File: wget.info,  Node: FTP Time-Stamping Internals,  Prev: HTTP Time-Stamping Internals,  Up: Time-Stamping
1247
1248 FTP Time-Stamping Internals
1249 ===========================
1250
1251    In theory, FTP time-stamping works much the same as HTTP, only FTP
1252 has no headers--time-stamps must be received from the directory
1253 listings.
1254
1255    For each directory files must be retrieved from, Wget will use the
1256 `LIST' command to get the listing.  It will try to analyze the listing,
1257 assuming that it is a Unix `ls -l' listing, and extract the
1258 time-stamps.  The rest is exactly the same as for HTTP.
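
   To illustrate what analyzing such a listing involves, the sketch
below pulls the size, date fields and name out of `ls -l'-style lines
with `awk'.  It is not Wget's parser: `listing_fields' is a
hypothetical name, and the sketch assumes the common nine-column
layout (permissions, links, owner, group, size, month, day, time or
year, name).

```shell
# Hypothetical helper: read Unix-style listing lines on standard input
# and print "SIZE MONTH DAY TIME-OR-YEAR NAME" for each entry.
# Lines with fewer than nine fields (e.g. "total 12") are skipped.
listing_fields() {
    awk 'NF >= 9 {
        name = $9
        # rejoin names that contain spaces (runs of spaces collapse)
        for (i = 10; i <= NF; i++) name = name " " $i
        print $5, $6, $7, $8, name
    }'
}
```

   A listing line can be piped into `listing_fields' to see the
extracted fields.  Entries for symbolic links would keep their
`-> target' part in the name under this simple model.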
1259
1260    The assumption that every directory listing is a Unix-style listing may
1261 sound extremely constraining, but in practice it is not, as many
1262 non-Unix FTP servers use the Unixoid listing format because most (all?)
1263 of the clients understand it.  Bear in mind that RFC959 defines no
1264 standard way to get a file list, let alone the time-stamps.  We can
1265 only hope that a future standard will define this.
1266
1267    Another non-standard solution is the use of the `MDTM' command
1268 that is supported by some FTP servers (including the popular
1269 `wu-ftpd'), which returns the exact time of the specified file.  Wget
1270 may support this command in the future.
1271