-This is Info file wget.info, produced by Makeinfo version 1.67 from the
+This is Info file wget.info, produced by Makeinfo version 1.68 from the
input file ./wget.texi.
INFO-DIR-SECTION Net Utilities
This file documents the GNU Wget utility for downloading network
data.
- Copyright (C) 1996, 1997, 1998 Free Software Foundation, Inc.
+ Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
\1f
File: wget.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)
-Wget 1.5.3
-**********
+Wget 1.5.3+dev
+**************
- This manual documents version 1.5.3 of GNU Wget, the freely
+ This manual documents version 1.5.3+dev of GNU Wget, the freely
available utility for network download.
Copyright (C) 1996, 1997, 1998 Free Software Foundation, Inc.
ftp://host/directory/file;type=a
Two alternative variants of URL specification are also supported,
-because of historical (hysterical?) reasons and their wide-spreadedness.
+because of historical (hysterical?) reasons and their widespread use.
FTP-only syntax (supported by `NcFTP'):
host:/dir/file
---------- Footnotes ----------
- (1) If you have a `.netrc' file in your home directory, password
+ (1) If you have a `.netrc' file in your home directory, password
will also be searched for there.
\1f
links have been downloaded. Because of that, much of the work
done by `-k' will be performed at the end of the downloads.
+`-K'
+`--backup-converted'
+ When converting a file, back up the original version with a `.orig'
+ suffix. Affects the behavior of `-N' (*Note HTTP Time-Stamping
+ Internals::).
+
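As a rough sketch (an assumption for illustration, not Wget's actual implementation), the backup step of `-K' behaves like this shell fragment, where the file name is any converted download:

```shell
# Sketch only: a simplified analogue of what -K does before -k
# rewrites links in a downloaded file.
backup_before_convert() {
  file=$1
  cp -p -- "$file" "$file.orig"   # preserve the pristine server copy
  # ... -k's link conversion would then rewrite "$file" in place ...
}
```

With the `.orig' copy preserved, a later timestamped run can compare against the unconverted original rather than the rewritten file.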
`-m'
`--mirror'
Turn on options suitable for mirroring. This option turns on
Exclude the domains given in a comma-separated DOMAIN-LIST from
DNS-lookup (*Note Domain Acceptance::).
-`-L'
-`--relative'
- Follow relative links only. Useful for retrieving a specific home
- page without any distractions, not even those from the same hosts
- (*Note Relative Links::).
-
`--follow-ftp'
Follow FTP links from HTML documents. Without this option, Wget
will ignore all the FTP links.
+`--follow-tags=LIST'
+ Wget has an internal table of HTML tag / attribute pairs that it
+ considers when looking for linked documents during a recursive
+ retrieval. If a user wants only a subset of those tags to be
+     considered, however, he or she should specify such tags in a
+ comma-separated LIST with this option.
+
+`-G LIST'
+`--ignore-tags=LIST'
+ This is the opposite of the `--follow-tags' option. To skip
+ certain HTML tags when recursively looking for documents to
+ download, specify them in a comma-separated LIST. The author of
+ this option likes to use the following command to download a
+ single HTML page and all documents necessary to display it
+ properly:
+
+ wget -Ga,area -H -k -K -nh -r http://SITE/DOCUMENT
+
`-H'
`--span-hosts'
Enable spanning across hosts when doing recursive retrieving
(*Note All Hosts::).
+`-L'
+`--relative'
+ Follow relative links only. Useful for retrieving a specific home
+ page without any distractions, not even those from the same hosts
+ (*Note Relative Links::).
+
`-I LIST'
`--include-directories=LIST'
Specify a comma-separated list of directories you wish to follow
Host Checking::).
`-np'
`--no-parent'
Do not ever ascend to the parent directory when retrieving
recursively. This is a useful option, since it guarantees that
same stands for the foreign server you are mirroring--the more requests
it gets in a row, the greater its load.
- Careless retrieving can also fill your file system unctrollably,
+ Careless retrieving can also fill your file system uncontrollably,
which can grind the machine to a halt.
The load can be minimized by lowering the maximum recursion level
Following Links
***************
- When retrieving recursively, one does not wish to retrieve the loads
-of unnecessary data. Most of the time the users bear in mind exactly
-what they want to download, and want Wget to follow only specific links.
+ When retrieving recursively, one does not wish to retrieve loads of
+unnecessary data. Most of the time the users bear in mind exactly what
+they want to download, and want Wget to follow only specific links.
For example, if you wish to download the music archive from
`fly.cc.fer.hr', you will not want to download all the home pages that
The drawback of following the relative links solely is that humans
often tend to mix them with absolute links to the very same host, and
the very same page. In this mode (which is the default mode for
-following links) all URLs the that refer to the same host will be
-retrieved.
+following links) all URLs that refer to the same host will be retrieved.
   The problem with this option is the aliasing of hosts and
domains. Thus there is no way for Wget to know that `regoc.srce.hr' and
dealing with the same hosts. Although the results of `gethostbyname'
are cached, it is still a great slowdown, e.g. when dealing with large
indices of home pages on different hosts (because each of the hosts
-must be and DNS-resolved to see whether it just *might* an alias of the
+must be DNS-resolved to see whether it just *might* be an alias of the
starting host).
To avoid the overhead you may use `-nh', which will turn off
things run much faster, but also much less reliable (e.g. `www.srce.hr'
and `regoc.srce.hr' will be flagged as different hosts).
- Note that modern HTTP servers allows one IP address to host several
-"virtual servers", each having its own directory hieratchy. Such
+ Note that modern HTTP servers allow one IP address to host several
+"virtual servers", each having its own directory hierarchy. Such
"servers" are distinguished by their hostnames (all of which point to
the same IP address); for this to work, a client must send a `Host'
header, which is what Wget does. However, in that case Wget *must not*
try to divine a host's "real" address, nor try to use the same hostname
for each access, i.e. `-nh' must be turned on.
- In other words, the `-nh' option must be used to enabling the
+ In other words, the `-nh' option must be used to enable the
retrieval from virtual servers distinguished by their hostnames. As the
number of such server setups grows, the behavior of `-nh' may become the
default in the future.
When downloading material from the web, you will often want to
restrict the retrieval to only certain file types. For example, if you
-are interested in downloading GIFS, you will not be overjoyed to get
-loads of Postscript documents, and vice versa.
+are interested in downloading GIFs, you will not be overjoyed to get
+loads of PostScript documents, and vice versa.
Wget offers two options to deal with this problem. Each option
description lists a short name, a long name, and the equivalent command
The `-A' and `-R' options may be combined to achieve even better
fine-tuning of which files to retrieve. E.g. `wget -A "*zelazny*" -R
.ps' will download all the files having `zelazny' as a part of their
-name, but *not* the postscript files.
+name, but *not* the PostScript files.
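As an illustrative sketch (an assumption, not Wget's matching code), the combined rule from the example above can be mimicked with shell glob tests, with rejection taking precedence as the prose describes:

```shell
# Sketch only: emulate `-A "*zelazny*" -R .ps` for a single filename.
# Returns 0 (accept) or 1 (reject).
wanted() {
  case $1 in
    *.ps)      return 1 ;;   # rejected by -R .ps, even if accepted by -A
    *zelazny*) return 0 ;;   # accepted by -A "*zelazny*"
    *)         return 1 ;;   # matches no accept pattern
  esac
}
```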
Note that these two options do not affect the downloading of HTML
files; Wget must load all the HTMLs to know where to go at
`no_parent = on'
The simplest, and often very useful way of limiting directories is
disallowing retrieval of the links that refer to the hierarchy
- "upper" than the beginning directory, i.e. disallowing ascent to
+     "above" the beginning directory, i.e. disallowing ascent to
the parent directory/directories.
The `--no-parent' option (short `-np') is useful in this case.
modified more recently (which makes it "newer"). If the remote file is
newer, it will be downloaded; if it is older, Wget will give up.(1)
+ When `--backup-converted' (`-K') is specified in conjunction with
+`-N', server file `X' is compared to local file `X.orig', if extant,
+rather than being compared to local file `X', which will always differ
+if it's been converted by `--convert-links' (`-k').
+
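A minimal sketch (assumed, simplified shell logic rather than Wget's source) of which local file the timestamp comparison targets when `-K' is in effect:

```shell
# Sketch only: choose the local file that -N compares against the
# server copy when -K backups are in use.
comparison_target() {
  file=$1
  if [ -e "$file.orig" ]; then
    printf '%s\n' "$file.orig"   # compare the unconverted backup
  else
    printf '%s\n' "$file"        # no backup: compare the file itself
  fi
}
```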
Arguably, HTTP time-stamping should be implemented using the
`If-Modified-Since' request.
---------- Footnotes ----------
- (1) As an additional check, Wget will look at the `Content-Length'
+ (1) As an additional check, Wget will look at the `Content-Length'
header, and compare the sizes; if they are not the same, the remote
file will be downloaded no matter what the time-stamp says.
`wu-ftpd'), which returns the exact time of the specified file. Wget
may support this command in the future.
-\1f
-File: wget.info, Node: Startup File, Next: Examples, Prev: Time-Stamping, Up: Top
-
-Startup File
-************
-
- Once you know how to change default settings of Wget through command
-line arguments, you may wish to make some of those settings permanent.
-You can do that in a convenient way by creating the Wget startup
-file--`.wgetrc'.
-
- Besides `.wgetrc' is the "main" initialization file, it is
-convenient to have a special facility for storing passwords. Thus Wget
-reads and interprets the contents of `$HOME/.netrc', if it finds it.
-You can find `.netrc' format in your system manuals.
-
- Wget reads `.wgetrc' upon startup, recognizing a limited set of
-commands.
-
-* Menu:
-
-* Wgetrc Location:: Location of various wgetrc files.
-* Wgetrc Syntax:: Syntax of wgetrc.
-* Wgetrc Commands:: List of available commands.
-* Sample Wgetrc:: A wgetrc example.
-
-\1f
-File: wget.info, Node: Wgetrc Location, Next: Wgetrc Syntax, Prev: Startup File, Up: Startup File
-
-Wgetrc Location
-===============
-
- When initializing, Wget will look for a "global" startup file,
-`/usr/local/etc/wgetrc' by default (or some prefix other than
-`/usr/local', if Wget was not installed there) and read commands from
-there, if it exists.
-
- Then it will look for the user's file. If the environmental variable
-`WGETRC' is set, Wget will try to load that file. Failing that, no
-further attempts will be made.
-
- If `WGETRC' is not set, Wget will try to load `$HOME/.wgetrc'.
-
- The fact that user's settings are loaded after the system-wide ones
-means that in case of collision user's wgetrc *overrides* the
-system-wide wgetrc (in `/usr/local/etc/wgetrc' by default). Fascist
-admins, away!
-