@c %**start of header
@setfilename wget.info
@include version.texi
-@set UPDATED Jun 2008
@settitle GNU Wget @value{VERSION} Manual
@c Disable the monstrous rectangles beside overfull hbox-es.
@finalout
@contents
@ifnottex
-@node Top
+@node Top, Overview, (dir), (dir)
@top Wget @value{VERSION}
@insertcopying
* Concept Index:: Topics covered by this manual.
@end menu
-@node Overview
+@node Overview, Invoking, Top, Top
@chapter Overview
@cindex overview
@cindex features
@c man end
@end ignore
@c man begin DESCRIPTION
-Wget can follow links in @sc{html} and @sc{xhtml} pages and create local
-versions of remote web sites, fully recreating the directory structure of
-the original site. This is sometimes referred to as ``recursive
-downloading.'' While doing that, Wget respects the Robot Exclusion
-Standard (@file{/robots.txt}). Wget can be instructed to convert the
-links in downloaded @sc{html} files to the local files for offline
-viewing.
+Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to
+create local versions of remote web sites, fully recreating the
+directory structure of the original site. This is sometimes referred to
+as ``recursive downloading.'' While doing that, Wget respects the Robot
+Exclusion Standard (@file{/robots.txt}). Wget can be instructed to
+convert the links in downloaded files to point at the local files, for
+offline viewing.
@c man end
@item
file @file{COPYING} that came with GNU Wget, for details).
@end itemize
-@node Invoking
+@node Invoking, Recursive Download, Overview, Top
@chapter Invoking
@cindex invoking
@cindex command line
* Recursive Accept/Reject Options::
@end menu
-@node URL Format
+@node URL Format, Option Syntax, Invoking, Invoking
@section URL Format
@cindex URL
@cindex URL syntax
@c man begin OPTIONS
-@node Option Syntax
+@node Option Syntax, Basic Startup Options, URL Format, Invoking
@section Option Syntax
@cindex option syntax
@cindex syntax of options
using @samp{--no-follow-ftp} is the only way to restore the factory
default from the command line.
-@node Basic Startup Options
+@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking
@section Basic Startup Options
@table @samp
@end table
-@node Logging and Input File Options
+@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@section Logging and Input File Options
@table @samp
@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
-Read @sc{url}s from @var{file}. If @samp{-} is specified as
-@var{file}, @sc{url}s are read from the standard input. (Use
-@samp{./-} to read from a file literally named @samp{-}.)
+Read @sc{url}s from a local or external @var{file}. If @samp{-} is
+specified as @var{file}, @sc{url}s are read from the standard input.
+(Use @samp{./-} to read from a file literally named @samp{-}.)
If this function is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.
+If the @var{file} is an external one, the document will be automatically
+treated as @samp{html} if the Content-Type matches @samp{text/html}.
+Furthermore, the @var{file}'s location will be implicitly used as base
+href if none was specified.
+
@cindex force html
@item -F
@itemx --force-html
the @samp{-i} option.
@end table
-@node Download Options
+@node Download Options, Directory Options, Logging and Input File Options, Invoking
@section Download Options
@table @samp
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
-seconds per file.
+seconds per file.
-Note that this option is turned on by default in the global
-@file{wgetrc} file.
+By default, Wget will assume a value of 10 seconds.
@cindex wait, random
@cindex random wait
when @samp{--password} is being used, because they are mutually exclusive.
@end table
-@node Directory Options
+@node Directory Options, HTTP Options, Download Options, Invoking
@section Directory Options
@table @samp
current directory).
@end table
-@node HTTP Options
+@node HTTP Options, HTTPS (SSL/TLS) Options, Directory Options, Invoking
@section HTTP Options
@table @samp
+@cindex default page name
+@cindex index.html
+@item --default-page=@var{name}
+Use @var{name} as the default file name when it isn't known (i.e., for
+URLs that end in a slash), instead of @file{index.html}.
+
@cindex .html extension
@item -E
@itemx --html-extension
version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive
Retrieval Options}).
+As of version 1.12, Wget will also ensure that any downloaded files of
+type @samp{text/css} end in the suffix @samp{.css}. Obviously, this
+makes the name @samp{--html-extension} misleading; a better name is
+expected to be offered as an alternative in the near future.
+
@cindex http user
@cindex http password
@cindex authentication
Considerations}.
@end iftex
+@cindex Keep-Alive, turning off
+@cindex Persistent Connections, disabling
+@item --no-http-keep-alive
+Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget
+asks the server to keep the connection open so that, when you download
+more than one document from the same server, they get transferred over
+the same TCP connection. This saves time and at the same time reduces
+the load on the server.
+
+This option is useful when, for some reason, persistent (keep-alive)
+connections don't work for you, for example due to a server bug or due
+to the inability of server-side scripts to cope with the connections.
+
@cindex proxy
@cindex cache
@item --no-cache
@end table
-@node HTTPS (SSL/TLS) Options
+@node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking
@section HTTPS (SSL/TLS) Options
@cindex SSL
systems that support @file{/dev/random}.
@end table
-@node FTP Options
+@node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking
@section FTP Options
@table @samp
specified on the command-line, rather than because it was recursed to,
this option has no effect. Symbolic links are always traversed in this
case.
-
-@cindex Keep-Alive, turning off
-@cindex Persistent Connections, disabling
-@item --no-http-keep-alive
-Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget
-asks the server to keep the connection open so that, when you download
-more than one document from the same server, they get transferred over
-the same TCP connection. This saves time and at the same time reduces
-the load on the server.
-
-This option is useful when, for some reason, persistent (keep-alive)
-connections don't work for you, for example due to a server bug or due
-to the inability of server-side scripts to cope with the connections.
@end table
-@node Recursive Retrieval Options
+@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking
@section Recursive Retrieval Options
@table @samp
option to turn it on.
@end table
-@node Recursive Accept/Reject Options
+@node Recursive Accept/Reject Options, , Recursive Retrieval Options, Invoking
@section Recursive Accept/Reject Options
@table @samp
@c man end
-@node Recursive Download
+@node Recursive Download, Following Links, Invoking, Top
@chapter Recursive Download
@cindex recursion
@cindex retrieving
@sc{http} or @sc{ftp} server), following links and directory structure.
We refer to this as to @dfn{recursive retrieval}, or @dfn{recursion}.
-With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} from
-the given @sc{url}, documents, retrieving the files the @sc{html}
-document was referring to, through markup like @code{href}, or
-@code{src}. If the freshly downloaded file is also of type
-@code{text/html} or @code{application/xhtml+xml}, it will be parsed and
-followed further.
+With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} or
+@sc{css} from the given @sc{url}, retrieving the files the document
+refers to, through markup like @code{href} or @code{src}, or @sc{css}
+@sc{uri} values specified using the @samp{url()} functional notation.
+If the freshly downloaded file is also of type @code{text/html},
+@code{application/xhtml+xml}, or @code{text/css}, it will be parsed
+and followed further.
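+
+As a minimal sketch of the @sc{css} case, assuming a stylesheet that
+refers to the hypothetical files @file{print.css} and
+@file{images/bg.png}, both @samp{url()} values below would be treated
+as links to follow during recursive retrieval:
+
+@example
+@@import url(print.css);
+body @{ background: url(images/bg.png); @}
+@end example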
-Recursive retrieval of @sc{http} and @sc{html} content is
+Recursive retrieval of @sc{http} and @sc{html}/@sc{css} content is
@dfn{breadth-first}. This means that Wget first downloads the requested
-@sc{html} document, then the documents linked from that document, then the
+document, then the documents linked from that document, then the
documents linked by them, and so on. In other words, Wget first
downloads the documents at depth 1, then those at depth 2, and so on
until the specified maximum depth.
Recursive retrieval should be used with care. Don't say you were not
warned.
-@node Following Links
+@node Following Links, Time-Stamping, Recursive Download, Top
@chapter Following Links
@cindex links
@cindex following links
* FTP Links:: Following FTP links.
@end menu
-@node Spanning Hosts
+@node Spanning Hosts, Types of Files, Following Links, Following Links
@section Spanning Hosts
@cindex spanning hosts
@cindex hosts, spanning
@end table
-@node Types of Files
+@node Types of Files, Directory-Based Limits, Spanning Hosts, Following Links
@section Types of Files
@cindex types of files
This behavior, too, is considered less-than-desirable, and may change
in a future version of Wget.
-@node Directory-Based Limits
+@node Directory-Based Limits, Relative Links, Types of Files, Following Links
@section Directory-Based Limits
@cindex directories
@cindex directory limits
meaningless, as its parent is @samp{/}).
@end table
-@node Relative Links
+@node Relative Links, FTP Links, Directory-Based Limits, Following Links
@section Relative Links
@cindex relative links
This option is probably not very useful and might be removed in a future
release.
-@node FTP Links
+@node FTP Links, , Relative Links, Following Links
@section Following FTP Links
@cindex following ftp links
Also note that followed links to @sc{ftp} directories will not be
retrieved recursively further.
-@node Time-Stamping
+@node Time-Stamping, Startup File, Following Links, Top
@chapter Time-Stamping
@cindex time-stamping
@cindex timestamping
* FTP Time-Stamping Internals::
@end menu
-@node Time-Stamping Usage
+@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping
@section Time-Stamping Usage
@cindex time-stamping usage
@cindex usage, time-stamping
directory listing with dates in a format that Wget can parse
(@pxref{FTP Time-Stamping Internals}).
-@node HTTP Time-Stamping Internals
+@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping
@section HTTP Time-Stamping Internals
@cindex http time-stamping
Arguably, @sc{http} time-stamping should be implemented using the
@code{If-Modified-Since} request.
-@node FTP Time-Stamping Internals
+@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping
@section FTP Time-Stamping Internals
@cindex ftp time-stamping
@code{wu-ftpd}), which returns the exact time of the specified file.
Wget may support this command in the future.
-@node Startup File
+@node Startup File, Examples, Time-Stamping, Top
@chapter Startup File
@cindex startup file
@cindex wgetrc
* Sample Wgetrc:: A wgetrc example.
@end menu
-@node Wgetrc Location
+@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File
@section Wgetrc Location
@cindex wgetrc location
@cindex location of wgetrc
system-wide wgetrc (in @file{/usr/local/etc/wgetrc} by default).
Fascist admins, away!
-@node Wgetrc Syntax
+@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File
@section Wgetrc Syntax
@cindex wgetrc syntax
@cindex syntax of wgetrc
reject =
@end example
-@node Wgetrc Commands
+@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File
@section Wgetrc Commands
@cindex wgetrc commands
@item debug = on/off
Debug mode, same as @samp{-d}.
+@item default_page = @var{string}
+Default page name---the same as @samp{--default-page=@var{string}}.
+
@item delete_after = on/off
Delete after download---the same as @samp{--delete-after}.
@item html_extension = on/off
Add a @samp{.html} extension to @samp{text/html} or
-@samp{application/xhtml+xml} files without it, like @samp{-E}.
+@samp{application/xhtml+xml} files without it, or a @samp{.css}
+extension to @samp{text/css} files without it, like @samp{-E}.
@item http_keep_alive = on/off
Turn the keep-alive feature on or off (defaults to on). Turning it
Save cookies to @var{file}. The same as @samp{--save-cookies
@var{file}}.
+@item save_headers = on/off
+Same as @samp{--save-headers}.
+
@item secure_protocol = @var{string}
Choose the secure protocol to be used. Legal values are @samp{auto}
(the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same
@item span_hosts = on/off
Same as @samp{-H}.
+@item spider = on/off
+Same as @samp{--spider}.
+
@item strict_comments = on/off
Same as @samp{--strict-comments}.
This command can be overridden using the @samp{ftp_user} and
@samp{http_user} command for @sc{ftp} and @sc{http} respectively.
+@item user_agent = @var{string}
+User agent identification sent to the HTTP server---the same as
+@samp{--user-agent=@var{string}}.
+
@item verbose = on/off
Turn verbose on/off---the same as @samp{-v}/@samp{-nv}.
turned on by default in the global @file{wgetrc}.
@end table
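+
+As an illustrative sketch only (the values shown are assumptions, not
+recommended settings), several of the commands above could be combined
+in a @file{.wgetrc} like this:
+
+@example
+# Save URLs that end in a slash under a name other than index.html.
+default_page = default.htm
+# Identify with a custom User-Agent string (hypothetical value).
+user_agent = ExampleFetcher/1.0
+# Add .html or .css suffixes where the content type calls for them.
+html_extension = on
+# Do not ask the server for persistent connections.
+http_keep_alive = off
+@end example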
-@node Sample Wgetrc
+@node Sample Wgetrc, , Wgetrc Commands, Startup File
@section Sample Wgetrc
@cindex sample wgetrc
@include sample.wgetrc.munged_for_texi_inclusion
@end example
-@node Examples
+@node Examples, Various, Startup File, Top
@chapter Examples
@cindex examples
* Very Advanced Usage:: The hairy stuff.
@end menu
-@node Simple Usage
+@node Simple Usage, Advanced Usage, Examples, Examples
@section Simple Usage
@itemize @bullet
@end example
@end itemize
-@node Advanced Usage
+@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples
@section Advanced Usage
@itemize @bullet
@end example
@item
-The same as the above, but convert the links in the @sc{html} files to
+The same as the above, but convert the links in the downloaded files to
point to local files, so you can view the documents off-line:
@example
@end example
@end itemize
-@node Very Advanced Usage
+@node Very Advanced Usage, , Advanced Usage, Examples
@section Very Advanced Usage
@cindex mirroring
@end itemize
@c man end
-@node Various
+@node Various, Appendices, Examples, Top
@chapter Various
@cindex various
* Proxies:: Support for proxy servers.
* Distribution:: Getting the latest version.
* Web Site:: GNU Wget's presence on the World Wide Web.
-* Mailing List:: Wget mailing list for announcements and discussion.
+* Mailing Lists:: Wget mailing lists for announcements and discussion.
* Internet Relay Chat:: Wget's presence on IRC.
* Reporting Bugs:: How and where to report bugs.
* Portability:: The systems Wget works on.
* Signals:: Signal-handling performed by Wget.
@end menu
-@node Proxies
+@node Proxies, Distribution, Various, Various
@section Proxies
@cindex proxies
settings @code{proxy_user} and @code{proxy_password} to set the proxy
username and password.
-@node Distribution
+@node Distribution, Web Site, Proxies, Various
@section Distribution
@cindex latest version
Wget @value{VERSION} can be found at
@url{ftp://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz}
-@node Web Site
+@node Web Site, Mailing Lists, Distribution, Various
@section Web Site
@cindex web site
information resides at ``The Wget Wgiki'',
@url{http://wget.addictivecode.org/}.
-@node Mailing List
-@section Mailing List
+@node Mailing Lists, Internet Relay Chat, Web Site, Various
+@section Mailing Lists
@cindex mailing list
@cindex list
-There are several Wget-related mailing lists. The general discussion
-list is at @email{wget@@sunsite.dk}. It is the preferred place for
-support requests and suggestions, as well as for discussion of
-development. You are invited to subscribe.
-
-To subscribe, simply send mail to @email{wget-subscribe@@sunsite.dk}
-and follow the instructions. Unsubscribe by mailing to
-@email{wget-unsubscribe@@sunsite.dk}. The mailing list is archived at
+@unnumberedsubsec Primary List
+
+The primary mailing list for discussion, bug reports, or questions
+about GNU Wget is at @email{bug-wget@@gnu.org}. To subscribe, send an
+email to @email{bug-wget-join@@gnu.org}, or visit
+@url{http://lists.gnu.org/mailman/listinfo/bug-wget}.
+
+You do not need to subscribe to send a message to the list; however,
+please note that messages from non-subscribers are moderated, and may take a
+while before they hit the list---@strong{usually around a day}. If
+you want your message to show up immediately, please subscribe to the
+list before posting. Archives for the list may be found at
+@url{http://lists.gnu.org/pipermail/bug-wget/}.
+
+An NNTP/Usenettish gateway is also available via
+@uref{http://gmane.org/about.php,Gmane}. You can see the Gmane
+archives at
+@url{http://news.gmane.org/gmane.comp.web.wget.general}. Note that the
+Gmane archives conveniently include messages from both the current
+list, and the previous one. Messages also show up in the Gmane
+archives sooner than they do at @url{lists.gnu.org}.
+
+@unnumberedsubsec Bug Notices List
+
+Additionally, there is the @email{wget-notify@@addictivecode.org} mailing
+list. This is a non-discussion list that receives bug report
+notifications from the bug-tracker. To subscribe to this list,
+send an email to @email{wget-notify-join@@addictivecode.org},
+or visit @url{http://addictivecode.org/mailman/listinfo/wget-notify}.
+
+@unnumberedsubsec Obsolete Lists
+
+Previously, the mailing list @email{wget@@sunsite.dk} was used as the
+main discussion list, and another list,
+@email{wget-patches@@sunsite.dk} was used for submitting and
+discussing patches to GNU Wget.
+
+Messages from @email{wget@@sunsite.dk} are archived at
+@itemize @tie{}
+@item
@url{http://www.mail-archive.com/wget%40sunsite.dk/} and at
-@url{http://news.gmane.org/gmane.comp.web.wget.general}.
-
-Another mailing list is at @email{wget-patches@@sunsite.dk}, and is
-used to submit patches for review by Wget developers. A ``patch'' is
-a textual representation of change to source code, readable by both
-humans and programs. The
-@url{http://wget.addictivecode.org/PatchGuidelines} page
-covers the creation and submitting of patches in detail. Please don't
-send general suggestions or bug reports to @samp{wget-patches}; use it
-only for patch submissions.
-
-Subscription is the same as above for @email{wget@@sunsite.dk}, except
-that you send to @email{wget-patches-subscribe@@sunsite.dk}, instead.
-The mailing list is archived at
-@url{http://news.gmane.org/gmane.comp.web.wget.patches}.
+@item
+@url{http://news.gmane.org/gmane.comp.web.wget.general} (which also
+continues to archive the current list, @email{bug-wget@@gnu.org}).
+@end itemize
-Finally, there is the @email{wget-notify@@addictivecode.org} mailing
-list. This is a non-discussion list that receives bug report-change
-notifications from the bug-tracker. Unlike for the other mailing lists,
-subscription is through the @code{mailman} interface at
-@url{http://addictivecode.org/mailman/listinfo/wget-notify}.
+Messages from @email{wget-patches@@sunsite.dk} are archived at
+@itemize @tie{}
+@item
+@url{http://news.gmane.org/gmane.comp.web.wget.patches}.
+@end itemize
-@node Internet Relay Chat
+@node Internet Relay Chat, Reporting Bugs, Mailing Lists, Various
@section Internet Relay Chat
@cindex Internet Relay Chat
@cindex IRC
In addition to the mailinglists, we also have a support channel set up
via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out!
-@node Reporting Bugs
+@node Reporting Bugs, Portability, Internet Relay Chat, Various
@section Reporting Bugs
@cindex bugs
@cindex reporting bugs
it's a bug. If things work strange, but you are not sure about the way
they are supposed to work, it might well be a bug, but you might want to
double-check the documentation and the mailing lists (@pxref{Mailing
-List}).
+Lists}).
@item
Try to repeat the bug in as simple circumstances as possible. E.g. if
@end enumerate
@c man end
-@node Portability
+@node Portability, Signals, Reporting Bugs, Various
@section Portability
@cindex portability
@cindex operating systems
Vanem; a port to VMS is maintained by Steven Schweda, and is available
at @url{http://antinode.org/}.
-@node Signals
+@node Signals, , Portability, Various
@section Signals
@cindex signal handling
@cindex hangup
Other than that, Wget will not try to interfere with signals in any way.
@kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike.
-@node Appendices
+@node Appendices, Copying this manual, Various, Top
@chapter Appendices
This chapter contains some references I consider useful.
* Contributors:: People who helped.
@end menu
-@node Robot Exclusion
+@node Robot Exclusion, Security Considerations, Appendices, Appendices
@section Robot Exclusion
@cindex robot exclusion
@cindex robots.txt
download and parse.
Although Wget is not a web robot in the strictest sense of the word, it
-can downloads large parts of the site without the user's intervention to
+can download large parts of the site without the user's intervention to
download an individual page. Because of that, Wget honors RES when
downloading recursively. For instance, when you issue:
@file{.wgetrc}. You can achieve the same effect from the command line
using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}.
-@node Security Considerations
+@node Security Considerations, Contributors, Robot Exclusion, Appendices
@section Security Considerations
@cindex security
me).
@end enumerate
-@node Contributors
+@node Contributors, , Security Considerations, Appendices
@section Contributors
@cindex contributors
authentication.
@item
-Mauro Tortonesi---Improved IPv6 support, adding support for dual
+Mauro Tortonesi---improved IPv6 support, adding support for dual
family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU
Wget from 2004--2007.
@item
-Christopher G.@: Lewis---Maintenance of the Windows version of GNU WGet.
+Christopher G.@: Lewis---maintenance of the Windows version of GNU Wget.
@item
-Gisle Vanem---Many helpful patches and improvements, especially for
+Gisle Vanem---many helpful patches and improvements, especially for
Windows and MS-DOS support.
@item
-Ralf Wildenhues---Contributed patches to convert Wget to use Automake as
+Ralf Wildenhues---contributed patches to convert Wget to use Automake as
part of its build process, and various bugfixes.
@item
modules, and the addition of password prompts at the console, via the
Gnulib getpasswd-gnu module.
+@item
+Ted Mielczarek---donated support for CSS.
+
@item
People who provided donations for development---including Brian Gough.
@end itemize
Aurelien Marchand,
Matthew J.@: Mellon,
Jordan Mendelson,
+Ted Mielczarek,
Lin Zhe Min,
Jan Minar,
Tim Mooney,
Apologies to all who I accidentally left out, and many thanks to all the
subscribers of the Wget mailing list.
-@node Copying this manual
+@node Copying this manual, Concept Index, Appendices, Top
@appendix Copying this manual
@menu
-* GNU Free Documentation License:: Licnse for copying this manual.
+* GNU Free Documentation License:: License for copying this manual.
@end menu
+@node GNU Free Documentation License, , Copying this manual, Copying this manual
+@appendixsec GNU Free Documentation License
+@cindex FDL, GNU Free Documentation License
+
@include fdl.texi
-@node Concept Index
+@node Concept Index, , Copying this manual, Top
@unnumbered Concept Index
@printindex cp