@sp 1
@item
-Wget works exceedingly well on slow or unstable connections,
+Wget works exceedingly well on slow or unstable connections,
retrying the document until it is fully retrieved, or until a
user-specified retry count is surpassed. It will try to resume the
download from the point of interruption, using @code{REST} with @sc{ftp}
@end example
Two alternative variants of @sc{url} specification are also supported,
-because of historical (hysterical?) reasons and their wide-spreadedness.
+because of historical (hysterical?) reasons and their widespread use.
@sc{ftp}-only syntax (supported by @code{NcFTP}):
@example
create directories.
@cindex conversion of links
-@cindex links conversion
+@cindex link conversion
@item -k
@itemx --convert-links
Convert the non-relative links to relative ones locally. Only the
@cindex backing up converted files
@item -K
@itemx --backup-converted
-When converting a file, back up the original version with a @samp{.orig} suffix.
+When converting a file, back up the original version with a @samp{.orig}
+suffix. Affects the behavior of @samp{-N} (@xref{HTTP Time-Stamping
+Internals}).
@item -m
@itemx --mirror
Exclude the domains given in a comma-separated @var{domain-list} from
@sc{dns}-lookup (@xref{Domain Acceptance}).
-@item -L
-@itemx --relative
-Follow relative links only. Useful for retrieving a specific home page
-without any distractions, not even those from the same hosts
-(@xref{Relative Links}).
-
@cindex follow FTP links
@item --follow-ftp
Follow @sc{ftp} links from @sc{html} documents. Without this option,
Wget will ignore all the @sc{ftp} links.
+@cindex tag-based recursive pruning
+@item --follow-tags=@var{list}
+Wget has an internal table of HTML tag / attribute pairs that it
+considers when looking for linked documents during a recursive
+retrieval. If a user wants only a subset of those tags to be
+considered, however, he or she should specify such tags in a
+comma-separated @var{list} with this option.
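+
+For instance, assuming one wanted only @samp{<a>} and @samp{<img>} links
+to be followed during a recursive retrieval, one might use:
+
+@example
+wget --follow-tags=a,img -r http://@var{site}/
+@end example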
+
+@item -G @var{list}
+@itemx --ignore-tags=@var{list}
+This is the opposite of the @samp{--follow-tags} option. To skip
+certain HTML tags when recursively looking for documents to download,
+specify them in a comma-separated @var{list}. The author of this option
+likes to use the following command to download a single HTML page and
+all documents necessary to display it properly:
+
+@example
+wget -Ga,area -H -k -K -nh -r http://@var{site}/@var{document}
+@end example
+
@item -H
@itemx --span-hosts
Enable spanning across hosts when doing recursive retrieving (@xref{All
Hosts}).
+@item -L
+@itemx --relative
+Follow relative links only. Useful for retrieving a specific home page
+without any distractions, not even those from the same hosts
+(@xref{Relative Links}).
+
@item -I @var{list}
@itemx --include-directories=@var{list}
Specify a comma-separated list of directories you wish to follow when
-the foreign server you are mirroring---the more requests it gets in a
-rows, the greater is its load.
+the foreign server you are mirroring---the more requests it gets in a
+row, the greater its load.
-Careless retrieving can also fill your file system unctrollably, which
+Careless retrieving can also fill your file system uncontrollably, which
can grind the machine to a halt.
The load can be minimized by lowering the maximum recursion level
@cindex links
@cindex following links
-When retrieving recursively, one does not wish to retrieve the loads of
+When retrieving recursively, one does not wish to retrieve loads of
unnecessary data. Most of the time the users bear in mind exactly what
they want to download, and want Wget to follow only specific links.
-The drawback of following the relative links solely is that humans often
+The drawback of following only relative links is that humans often
tend to mix them with absolute links to the very same host, and the very
same page. In this mode (which is the default mode for following links)
-all @sc{url}s the that refer to the same host will be retrieved.
+all @sc{url}s that refer to the same host will be retrieved.
-The problem with this option are the aliases of the hosts and domains.
+The problem with this option is the aliasing of hosts and domains.
Thus there is no way for Wget to know that @samp{regoc.srce.hr} and
check whether we are maybe dealing with the same hosts. Although the
results of @code{gethostbyname} are cached, it is still a great
slowdown, e.g. when dealing with large indices of home pages on different
-hosts (because each of the hosts must be and @sc{dns}-resolved to see
-whether it just @emph{might} an alias of the starting host).
+hosts (because each of the hosts must be @sc{dns}-resolved to see
+whether it just @emph{might} be an alias of the starting host).
To avoid the overhead you may use @samp{-nh}, which will turn off
@sc{dns}-resolving and make Wget compare hosts literally. This will
(e.g. @samp{www.srce.hr} and @samp{regoc.srce.hr} will be flagged as
different hosts).
-Note that modern @sc{http} servers allows one IP address to host several
-@dfn{virtual servers}, each having its own directory hieratchy. Such
+Note that modern @sc{http} servers allow one IP address to host several
+@dfn{virtual servers}, each having its own directory hierarchy. Such
``servers'' are distinguished by their hostnames (all of which point to
the same IP address); for this to work, a client must send a @code{Host}
header, which is what Wget does. However, in that case Wget @emph{must
not} try to divine a host's ``real'' address, nor try to use the same
hostname for each access, i.e. @samp{-nh} must be turned on.
-In other words, the @samp{-nh} option must be used to enabling the
+In other words, the @samp{-nh} option must be used to enable the
retrieval from virtual servers distinguished by their hostnames. As the
-number of such server setups grow, the behavior of @samp{-nh} may become
+number of such server setups grows, the behavior of @samp{-nh} may become
the default in the future.
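+
+For example, to mirror such a name-based virtual server (hostname
+illustrative), one might combine @samp{-nh} with recursion:
+
+@example
+wget -nh -r http://@var{www.virtual.server}/
+@end example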
When downloading material from the web, you will often want to restrict
the retrieval to only certain file types. For example, if you are
-interested in downloading @sc{gifs}, you will not be overjoyed to get
-loads of Postscript documents, and vice versa.
+interested in downloading @sc{gif}s, you will not be overjoyed to get
+loads of PostScript documents, and vice versa.
Wget offers two options to deal with this problem. Each option
description lists a short name, a long name, and the equivalent command
The @samp{-A} and @samp{-R} options may be combined to achieve even
better fine-tuning of which files to retrieve. E.g. @samp{wget -A
"*zelazny*" -R .ps} will download all the files having @samp{zelazny} as
-a part of their name, but @emph{not} the postscript files.
+a part of their name, but @emph{not} the PostScript files.
Note that these two options do not affect the downloading of @sc{html}
files; Wget must load all the @sc{html}s to know where to go at
@itemx no_parent = on
The simplest, and often very useful way of limiting directories is
disallowing retrieval of the links that refer to the hierarchy
-@dfn{upper} than the beginning directory, i.e. disallowing ascent to the
+@dfn{above} the beginning directory, i.e. disallowing ascent to the
parent directory/directories.
The @samp{--no-parent} option (short @samp{-np}) is useful in this case.
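+
+For example, to retrieve everything below a given directory without
+ever ascending to its parent directories, one might run:
+
+@example
+wget -r -np http://@var{site}/people/
+@end example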
(which enables it).
@item background = on/off
-Enable/disable going to background, the same as @samp{-b} (which enables
+Enable/disable going to background, the same as @samp{-b} (which enables
it).
@item backup_converted = on/off
@item follow_ftp = on/off
Follow @sc{ftp} links from @sc{html} documents, the same as @samp{-f}.
+@item follow_tags = @var{string}
+Only follow certain HTML tags when doing a recursive retrieval, just like
+@samp{--follow-tags}.
+
@item force_html = on/off
If set to on, force the input filename to be regarded as an @sc{html}
document, the same as @samp{-F}.
When set to on, ignore @code{Content-Length} header; the same as
@samp{--ignore-length}.
+@item ignore_tags = @var{string}
+Ignore certain HTML tags when doing a recursive retrieval, just like
+@samp{-G} / @samp{--ignore-tags}.
+
@item include_directories = @var{string}
Specify a comma-separated list of directories you wish to follow when
downloading, the same as @samp{-I}.
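+
+For example (the directory names are illustrative):
+
+@example
+include_directories = /gnu,/pub/gnu
+@end example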
## avoid having to type many many command-line options. This file does
## not contain a comprehensive list of commands -- look at the manual
## to find out what you can put into this file.
-##
+##
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
@item ftp_proxy
-This variable should contain the @sc{url} of the proxy for @sc{http}
+This variable should contain the @sc{url} of the proxy for @sc{ftp}
-connections. It is quite common that @sc{http_proxy} and @sc{ftp_proxy}
+connections. It is quite common that @sc{http_proxy} and @sc{ftp_proxy}
are set to the same @sc{url}.
@item no_proxy
-This variable should contain a comma-separated list of domain extensions
+This variable should contain a comma-separated list of domain extensions
proxy should @emph{not} be used for. For instance, if the value of
@code{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
documents from MIT.
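+
+For example, the following line in @file{.wgetrc} would disable the use
+of a proxy for the @samp{.mit.edu} domain:
+
+@example
+no_proxy = .mit.edu
+@end example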
Like all GNU utilities, the latest version of Wget can be found at the
master GNU archive site prep.ai.mit.edu, and its mirrors. For example,
Wget @value{VERSION} can be found at
-@url{ftp://prep.ai.mit.edu/pub/gnu/wget-@value{VERSION}.tar.gz}
+@url{ftp://prep.ai.mit.edu/gnu/wget/wget-@value{VERSION}.tar.gz}
@node Mailing List, Reporting Bugs, Distribution, Various
@section Mailing List
The description of the norobots standard was written, and is maintained
by Martijn Koster @email{m.koster@@webcrawler.com}. With his
-permission, I contribute a (slightly modified) texified version of the
+permission, I contribute a (slightly modified) TeXified version of the
@sc{res}.
@menu
@end example
The field name is case insensitive.
-
-Comments can be included in file using UNIX bourne shell conventions:
+
+Comments can be included in the file using UNIX Bourne shell conventions:
the @samp{#} character is used to indicate that preceding space (if any)
and the remainder of the line up to the line termination is discarded.
Lines containing only a comment are discarded completely, and therefore
Darko Budor---initial port to Windows.
@item
-Antonio Rosella---help and suggestions, plust the Italian translation.
+Antonio Rosella---help and suggestions, plus the Italian translation.
@item
@iftex
Martin Baehr,
Dieter Baron,
Roger Beeman and the Gurus at Cisco,
+Dan Berger,
Mark Boyns,
John Burden,
Wanderlei Cavassin,
Hans Grobler,
Mathieu Guillaume,
Dan Harkless,
+Heiko Herold,
Karl Heuer,
+HIROSE Masaaki,
Gregor Hoffleit,
Erik Magnus Hulthen,
Richard Huveneers,
@ifinfo
Simos KSenitellis,
@end ifinfo
-Tage Stabell-Kulo,
Hrvoje Lacko,
+Daniel S. Lewart,
Dave Love,
Jordan Mendelson,
Lin Zhe Min,
Charlie Negyesi,
Andrew Pollock,
Steve Pothier,
-Marin Purgar,
Jan Prikryl,
+Marin Purgar,
Keith Refson,
Tobias Ringstrom,
@c Texinfo doesn't grok @'{@i}, so we have to use TeX itself.
Heinz Salzmann,
Robert Schmidt,
Toomas Soome,
+Tage Stabell-Kulo,
Sven Sternberger,
Markus Strasser,
Szakacsits Szabolcs,
Mike Thomas,
Russell Vincent,
+Charles G Waldman,
Douglas E. Wegscheid,
Jasmin Zainul,
@iftex
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
type `show w'. This is free software, and you are welcome
-to redistribute it under certain conditions; type `show c'
+to redistribute it under certain conditions; type `show c'
for details.
@end smallexample
@group
Yoyodyne, Inc., hereby disclaims all copyright
interest in the program `Gnomovision'
-(which makes passes at compilers) written
+(which makes passes at compilers) written
by James Hacker.
@var{signature of Ty Coon}, 1 April 1989