resulting derived work is distributed under the terms of a permission
notice identical to this one.
+\1f
+File: wget.info, Node: Time-Stamping, Next: Startup File, Prev: Following Links, Up: Top
+
+Time-Stamping
+*************
+
+ One of the most important aspects of mirroring information from the
+Internet is updating your archives.
+
+ Downloading the whole archive again and again, just to replace a few
+changed files is expensive, both in terms of wasted bandwidth and money,
+and the time to do the update. This is why all the mirroring tools
+offer the option of incremental updating.
+
+ Such an updating mechanism means that the remote server is scanned in
+search of "new" files. Only those new files will be downloaded in the
+place of the old ones.
+
+ A file is considered new if one of these two conditions is met:
+
+ 1. A file of that name does not already exist locally.
+
+ 2. A file of that name does exist, but the remote file was modified
+ more recently than the local file.
+
+ To implement this, the program needs to be aware of the time of last
+modification of both remote and local files. These times are called
+"time-stamps".
+
+ Time-stamping in GNU Wget is turned on using the `--timestamping'
+(`-N') option, or through the `timestamping = on' directive in `.wgetrc'.
+With this option, for each file it intends to download, Wget will check
+whether a local file of the same name exists. If it does, and the
+remote file is older, Wget will not download it.
+
+ If the local file does not exist, or the sizes of the files do not
+match, Wget will download the remote file no matter what the time-stamps
+say.
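The decision rule described above can be sketched as follows. This is only an illustration of the stated conditions, not Wget's actual code; the function name `should_download' and its parameters are hypothetical.

```python
from typing import Optional


def should_download(local_mtime: Optional[float],
                    local_size: Optional[int],
                    remote_mtime: float,
                    remote_size: Optional[int]) -> bool:
    """Return True if the remote file should be fetched.

    local_mtime is None when no local file of that name exists.
    remote_size is None when the server did not report a size.
    """
    # Condition 1: no local file of that name.
    if local_mtime is None:
        return True
    # Size mismatch: download no matter what the time-stamps say.
    if remote_size is not None and local_size != remote_size:
        return True
    # Condition 2: the remote file was modified more recently.
    return remote_mtime > local_mtime
```

With equal sizes, a local file stamped 200 and a remote file stamped 100 is left alone; change either size and the file is fetched again.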
+
+* Menu:
+
+* Time-Stamping Usage::
+* HTTP Time-Stamping Internals::
+* FTP Time-Stamping Internals::
+
+\1f
+File: wget.info, Node: Time-Stamping Usage, Next: HTTP Time-Stamping Internals, Prev: Time-Stamping, Up: Time-Stamping
+
+Time-Stamping Usage
+===================
+
+ The usage of time-stamping is simple. Say you would like to
+download a file so that it keeps its date of modification.
+
+ wget -S http://www.gnu.ai.mit.edu/
+
+ A simple `ls -l' shows that the time-stamp on the local file matches
+the value of the `Last-Modified' header returned by the server. As you
+can see, the time-stamping info is preserved locally, even without
+`-N'.
+
+ Several days later, you would like Wget to check if the remote file
+has changed, and download it if it has.
+
+ wget -N http://www.gnu.ai.mit.edu/
+
+ Wget will ask the server for the last-modified date. If the local
+file is newer, the remote file will not be re-fetched. However, if the
+remote file is more recent, Wget will proceed to fetch it normally.
+
+ The same goes for FTP. For example:
+
+ wget ftp://ftp.ifi.uio.no/pub/emacs/gnus/*
+
+ `ls' will show that the timestamps are set according to the state on
+the remote server. Reissuing the command with `-N' will make Wget
+re-fetch *only* the files that have been modified.
+
+ In both HTTP and FTP retrieval Wget will time-stamp the local file
+correctly (with or without `-N') if it gets the stamps, i.e. gets the
+directory listing for FTP or the `Last-Modified' header for HTTP.
+
+ If you wished to mirror the GNU archive every week, you would use the
+following command every week:
+
+ wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/
+
+\1f
+File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping
+
+HTTP Time-Stamping Internals
+============================
+
+ Time-stamping in HTTP is implemented by checking the
+`Last-Modified' header. If you wish to retrieve the file `foo.html'
+through HTTP, Wget will check whether `foo.html' exists locally. If it
+doesn't, `foo.html' will be retrieved unconditionally.
+
+ If the file does exist locally, Wget will first check its local
+time-stamp (similar to the way `ls -l' checks it), and then send a
+`HEAD' request to the remote server, requesting information about the
+remote file.
+
+ The `Last-Modified' header is examined to find which file was
+modified more recently (which makes it "newer"). If the remote file is
+newer, it will be downloaded; if it is older, Wget will give up.(1)
+
+ When `--backup-converted' (`-K') is specified in conjunction with
+`-N', server file `X' is compared to local file `X.orig', if extant,
+rather than being compared to local file `X', which will always differ
+if it's been converted by `--convert-links' (`-k').
+
+ Arguably, HTTP time-stamping should be implemented using the
+`If-Modified-Since' request.
+
+ ---------- Footnotes ----------
+
+ (1) As an additional check, Wget will look at the `Content-Length'
+header, and compare the sizes; if they are not the same, the remote
+file will be downloaded no matter what the time-stamp says.
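As a rough sketch of the comparison step, the value of a `Last-Modified' header can be converted to seconds since the epoch and compared with the local time-stamp. The helper names below are hypothetical and the code is not taken from Wget; it assumes the header carries an explicit zone such as `GMT'.

```python
from email.utils import parsedate_to_datetime


def last_modified_to_epoch(header_value: str) -> float:
    """Convert a Last-Modified header value to seconds since the epoch."""
    return parsedate_to_datetime(header_value).timestamp()


def remote_is_newer(header_value: str, local_mtime: float) -> bool:
    """True if the remote file was modified more recently than the local one."""
    return last_modified_to_epoch(header_value) > local_mtime
```

In real use, `local_mtime' would come from something like `os.path.getmtime("foo.html")', which reports the same time-stamp that `ls -l' displays.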
+
+\1f
+File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping
+
+FTP Time-Stamping Internals
+===========================
+
+ In theory, FTP time-stamping works much the same as HTTP, only FTP
+has no headers--time-stamps must be received from the directory
+listings.
+
+ For each directory files must be retrieved from, Wget will use the
+`LIST' command to get the listing. It will try to analyze the listing,
+assuming that it is a Unix `ls -l' listing, and extract the
+time-stamps. The rest is exactly the same as for HTTP.
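To illustrate what extracting a time-stamp from a Unix `ls -l' listing involves, here is a minimal sketch. It is not Wget's parser: the function name, the nine-field layout, and the rule for guessing the year are all assumptions made for the example. Recent files are listed with a time but no year, so the caller supplies a reference time `now'.

```python
from datetime import datetime
from typing import Optional, Tuple

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_ls_line(line: str, now: datetime) -> Optional[Tuple[str, int, datetime]]:
    """Return (name, size, time-stamp) for one listing line, or None."""
    # Assumed layout: mode links owner group size month day year-or-time name
    fields = line.split(None, 8)
    if len(fields) < 9 or not fields[4].isdigit() or not fields[6].isdigit():
        return None
    month = MONTHS.get(fields[5])
    if month is None:
        return None
    size, day = int(fields[4]), int(fields[6])
    if ":" in fields[7]:
        # "HH:MM" form: the year is omitted, so assume the current year
        # and step back one year if that would place the file in the future.
        # (A Feb 29 entry would need extra care here.)
        hour, minute = (int(part) for part in fields[7].split(":"))
        stamp = datetime(now.year, month, day, hour, minute)
        if stamp > now:
            stamp = stamp.replace(year=now.year - 1)
    else:
        # "YYYY" form: only the date is known.
        stamp = datetime(int(fields[7]), month, day)
    return fields[8], size, stamp
```

Lines that do not match the assumed layout, such as the leading `total' line, yield `None' and can simply be skipped.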
+
+ The assumption that every directory listing is a Unix-style listing may
+sound extremely constraining, but in practice it is not, as many
+non-Unix FTP servers use the Unixoid listing format because most (all?)
+of the clients understand it. Bear in mind that RFC959 defines no
+standard way to get a file list, let alone the time-stamps. We can
+only hope that a future standard will define this.
+
+ Another non-standard solution is to use the `MDTM' command, which
+is supported by some FTP servers (including the popular `wu-ftpd') and
+returns the exact modification time of the specified file. Wget may
+support this command in the future.
+
+\1f
+File: wget.info, Node: Startup File, Next: Examples, Prev: Time-Stamping, Up: Top
+
+Startup File
+************
+
+ Once you know how to change default settings of Wget through command
+line arguments, you may wish to make some of those settings permanent.
+You can do that in a convenient way by creating the Wget startup
+file--`.wgetrc'.
+
+ Although `.wgetrc' is the "main" initialization file, it is
+convenient to have a special facility for storing passwords. Thus Wget
+reads and interprets the contents of `$HOME/.netrc', if it finds it.
+You can find the `.netrc' format in your system manuals.
+
+ Wget reads `.wgetrc' upon startup, recognizing a limited set of
+commands.
+
+* Menu:
+
+* Wgetrc Location:: Location of various wgetrc files.
+* Wgetrc Syntax:: Syntax of wgetrc.
+* Wgetrc Commands:: List of available commands.
+* Sample Wgetrc:: A wgetrc example.
+
\1f
File: wget.info, Node: Wgetrc Location, Next: Wgetrc Syntax, Prev: Startup File, Up: Startup File
Enable/disable host-prefixed file names. `-nH' disables it.
continue = on/off
- Enable/disable continuation of the retrieval, the same as `-c'
+ Enable/disable continuation of the retrieval - the same as `-c'
(which enables it).
background = on/off
- Enable/disable going to background, the same as `-b' (which enables
- it).
+ Enable/disable going to background - the same as `-b' (which
+ enables it).
backup_converted = on/off
Enable/disable saving pre-converted files with the suffix `.orig'
- the same as `-K' (which enables it).
base = STRING
- Set base for relative URLs, the same as `-B'.
+ Consider relative URLs in URL input files forced to be interpreted
+ as HTML as being relative to STRING - the same as `-B'.
cache = on/off
When set to off, disallow server-caching. See the `-C' option.
Debug mode, same as `-d'.
delete_after = on/off
- Delete after download, the same as `--delete-after'.
+ Delete after download - the same as `--delete-after'.
dir_prefix = STRING
- Top of directory tree, the same as `-P'.
+ Top of directory tree - the same as `-P'.
dirstruct = on/off
- Turning dirstruct on or off, the same as `-x' or `-nd',
+ Turning dirstruct on or off - the same as `-x' or `-nd',
respectively.
domains = STRING
exclude_directories = STRING
Specify a comma-separated list of directories you wish to exclude
- from download, the same as `-X' (*Note Directory-Based Limits::).
+ from download - the same as `-X' (*Note Directory-Based Limits::).
exclude_domains = STRING
Same as `--exclude-domains' (*Note Domain Acceptance::).
follow_ftp = on/off
- Follow FTP links from HTML documents, the same as `-f'.
+ Follow FTP links from HTML documents - the same as `-f'.
+
+follow_tags = STRING
+ Only follow certain HTML tags when doing a recursive retrieval,
+ just like `--follow-tags'.
force_html = on/off
If set to on, force the input filename to be regarded as an HTML
- document, the same as `-F'.
+ document - the same as `-F'.
ftp_proxy = STRING
Use STRING as FTP proxy, instead of the one specified in
environment.
glob = on/off
- Turn globbing on/off, the same as `-g'.
+ Turn globbing on/off - the same as `-g'.
header = STRING
Define an additional header, like `--header'.
When set to on, ignore `Content-Length' header; the same as
`--ignore-length'.
+ignore_tags = STRING
+ Ignore certain HTML tags when doing a recursive retrieval, just
+ like `-G' / `--ignore-tags'.
+
include_directories = STRING
Specify a comma-separated list of directories you wish to follow
- when downloading, the same as `-I'.
+ when downloading - the same as `-I'.
input = STRING
Read the URLs from STRING, like `-i'.
the value in `Content-Length'.
logfile = STRING
- Set logfile, the same as `-o'.
+ Set logfile - the same as `-o'.
login = STRING
Your user name on the remote machine, for FTP. Defaults to
proxy loading, instead of the one specified in environment.
output_document = STRING
- Set the output filename, the same as `-O'.
+ Set the output filename - the same as `-O'.
+
+page_requisites = on/off
+ Download all ancillary documents necessary for a single HTML page
+ to display properly - the same as `-p'.
passive_ftp = on/off
- Set passive FTP, the same as `--passive-ftp'.
+ Set passive FTP - the same as `--passive-ftp'.
passwd = STRING
Set your FTP password to PASSWORD. Without this setting, the
Set proxy authentication password to STRING, like `--proxy-passwd'.
quiet = on/off
- Quiet mode, the same as `-q'.
+ Quiet mode - the same as `-q'.
quota = QUOTA
- Specify the download quota, which is useful to put in global
- wgetrc. When download quota is specified, Wget will stop retrieving
- after the download sum has become greater than quota. The quota
- can be specified in bytes (default), kbytes `k' appended) or mbytes
- (`m' appended). Thus `quota = 5m' will set the quota to 5 mbytes.
- Note that the user's startup file overrides system settings.
+ Specify the download quota, which is useful to put in the global
+ `wgetrc'. When download quota is specified, Wget will stop
+ retrieving after the download sum has become greater than quota.
+ The quota can be specified in bytes (default), kbytes (`k'
+ appended) or mbytes (`m' appended). Thus `quota = 5m' will set
+ the quota to 5 mbytes. Note that the user's startup file overrides
+ system settings.
reclevel = N
- Recursion level, the same as `-l'.
+ Recursion level - the same as `-l'.
recursive = on/off
- Recursive on/off, the same as `-r'.
+ Recursive on/off - the same as `-r'.
relative_only = on/off
- Follow only relative links, the same as `-L' (*Note Relative
+ Follow only relative links - the same as `-L' (*Note Relative
Links::).
remove_listing = on/off
what you are doing before changing the default (which is `on').
server_response = on/off
- Choose whether or not to print the HTTP and FTP server responses,
+ Choose whether or not to print the HTTP and FTP server responses -
the same as `-S'.
simple_host_check = on/off
Same as `-H'.
timeout = N
- Set timeout value, the same as `-T'.
+ Set timeout value - the same as `-T'.
timestamping = on/off
Turn timestamping on/off. The same as `-N' (*Note Time-Stamping::).
tries = N
- Set number of retries per URL, the same as `-t'.
+ Set number of retries per URL - the same as `-t'.
use_proxy = on/off
Turn proxy support on/off. The same as `-Y'.
verbose = on/off
- Turn verbose on/off, the same as `-v'/`-nv'.
+ Turn verbose on/off - the same as `-v'/`-nv'.
wait = N
- Wait N seconds between retrievals, the same as `-w'.
+ Wait N seconds between retrievals - the same as `-w'.
+
+waitretry = N
+ Wait up to N seconds between retries of failed retrievals only -
+ the same as `--waitretry'. Note that this is turned on by default
+ in the global `wgetrc'.
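The sample `wgetrc' describes `waitretry' as "linear backoff": 1 second after the first failure on a file, 2 seconds after the second, and so on up to the configured maximum. A minimal sketch of that schedule, with a hypothetical function name and not taken from Wget's source:

```python
from typing import List


def retry_waits(failures: int, waitretry: int) -> List[int]:
    """Seconds to wait after each of the first `failures` failed attempts."""
    # Wait n seconds after the n-th failure, capped at waitretry.
    return [min(n, waitretry) for n in range(1, failures + 1)]
```

With `waitretry = 10', the waits grow by one second per failure and then stay at 10 seconds.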
\1f
File: wget.info, Node: Sample Wgetrc, Prev: Wgetrc Commands, Up: Startup File
startup file), and one for local usage (suitable for `$HOME/.wgetrc').
Be careful about the things you change.
- Note that all the lines are commented out. For any line to have
-effect, you must remove the `#' prefix at the beginning of line.
+ Note that almost all the lines are commented out. For a command to
+have any effect, you must remove the `#' character at the beginning of
+its line.
###
### Sample Wget initialization file .wgetrc
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
- ## To use any of the settings in this file, you will have to uncomment
- ## them (and probably change them).
+ ## To use the settings in this file, you will have to uncomment them,
+ ## as well as change them, in most cases, as the values on the
+ ## commented-out lines are the default values (e.g. "off").
##
# can turn this on to make Wget use passive FTP by default.
#passive_ftp = off
+ # The "wait" command below makes Wget wait between every connection.
+ # If, instead, you want Wget to wait only between retries of failed
+ # downloads, set waitretry to the maximum number of seconds to wait (Wget
+ # will use "linear backoff", waiting 1 second after the first failure
+ # on a file, 2 seconds after the second failure, etc. up to this max).
+ waitretry = 10
+
##
## Local settings (for a user to set in his $HOME/.wgetrc). It is
# you are not sure you know what it means) by setting this to on.
#recursive = off
+ # To always back up file X as X.orig before converting its links (due
+ # to -k / --convert-links / convert_links = on having been specified),
+ # set this variable to on:
+ #backup_converted = off
+
# To have Wget follow FTP links from HTML files by default, set this
# to on:
#follow_ftp = off
Like all GNU utilities, the latest version of Wget can be found at
the master GNU archive site prep.ai.mit.edu, and its mirrors. For
example, Wget 1.5.3+dev can be found at
-`ftp://prep.ai.mit.edu/pub/gnu/wget-1.5.3+dev.tar.gz'
+`ftp://prep.ai.mit.edu/gnu/wget/wget-1.5.3+dev.tar.gz'
\1f
File: wget.info, Node: Mailing List, Next: Reporting Bugs, Prev: Distribution, Up: Various
The description of the norobots standard was written, and is
maintained by Martijn Koster <m.koster@webcrawler.com>. With his
-permission, I contribute a (slightly modified) texified version of the
+permission, I contribute a (slightly modified) TeXified version of the
RES.
* Menu:
The field name is case insensitive.
- Comments can be included in file using UNIX bourne shell conventions:
+ Comments can be included in file using UNIX Bourne shell conventions:
the `#' character is used to indicate that preceding space (if any) and
the remainder of the line up to the line termination is discarded.
Lines containing only a comment are discarded completely, and therefore
* Darko Budor--initial port to Windows.
- * Antonio Rosella--help and suggestions, plust the Italian
+ * Antonio Rosella--help and suggestions, plus the Italian
translation.
* Tomislav Petrovic, Mario Mikocevic--many bug reports and
that make maintenance so much fun:
Tim Adam, Martin Baehr, Dieter Baron, Roger Beeman and the Gurus at
-Cisco, Mark Boyns, John Burden, Wanderlei Cavassin, Gilles Cedoc, Tim
-Charron, Noel Cragg, Kristijan Conkas, Damir Dzeko, Andrew Davison,
-Ulrich Drepper, Marc Duponcheel, Aleksandar Erkalovic, Andy Eskilsson,
-Masashi Fujita, Howard Gayle, Marcel Gerrits, Hans Grobler, Mathieu
-Guillaume, Karl Heuer, Gregor Hoffleit, Erik Magnus Hulthen, Richard
-Huveneers, Simon Josefsson, Mario Juric, Goran Kezunovic, Robert Kleine,
-Fila Kolodny, Alexander Kourakos, Martin Kraemer, Simos KSenitellis,
-Tage Stabell-Kulo, Hrvoje Lacko, Dave Love, Jordan Mendelson, Lin Zhe
-Min, Charlie Negyesi, Andrew Pollock, Steve Pothier, Marin Purgar, Jan
-Prikryl, Keith Refson, Tobias Ringstrom, Juan Jose Rodrigues, Heinz
-Salzmann, Robert Schmidt, Toomas Soome, Sven Sternberger, Markus
-Strasser, Szakacsits Szabolcs, Mike Thomas, Russell Vincent, Douglas E.
-Wegscheid, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer.
+Cisco, Dan Berger, Mark Boyns, John Burden, Wanderlei Cavassin, Gilles
+Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, Andrew Deryabin,
+Damir Dzeko, Andrew Davison, Ulrich Drepper, Marc Duponcheel,
+Aleksandar Erkalovic, Andy Eskilsson, Masashi Fujita, Howard Gayle,
+Marcel Gerrits, Hans Grobler, Mathieu Guillaume, Dan Harkless, Heiko
+Herold, Karl Heuer, HIROSE Masaaki, Gregor Hoffleit, Erik Magnus
+Hulthen, Richard Huveneers, Simon Josefsson, Mario Juric, Goran
+Kezunovic, Robert Kleine, Fila Kolodny, Alexander Kourakos, Martin
+Kraemer, Simos KSenitellis, Hrvoje Lacko, Daniel S. Lewart, Dave Love,
+Jordan Mendelson, Lin Zhe Min, Charlie Negyesi, Andrew Pollock, Steve
+Pothier, Jan Prikryl, Marin Purgar, Keith Refson, Tobias Ringstrom,
+Juan Jose Rodrigues, Edward J. Sabol, Heinz Salzmann, Robert Schmidt,
+Toomas Soome, Tage Stabell-Kulo, Sven Sternberger, Markus Strasser,
+Szakacsits Szabolcs, Mike Thomas, Russell Vincent, Charles G Waldman,
+Douglas E. Wegscheid, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer.
Apologies to all who I accidentally left out, and many thanks to all
the subscribers of the Wget mailing list.