@c %**start of header
@setfilename wget.info
@include version.texi
-@set UPDATED Jun 2008
@settitle GNU Wget @value{VERSION} Manual
@c Disable the monstrous rectangles beside overfull hbox-es.
@finalout
@contents
@ifnottex
-@node Top
+@node Top, Overview, (dir), (dir)
@top Wget @value{VERSION}
@insertcopying
* Concept Index:: Topics covered by this manual.
@end menu
-@node Overview
+@node Overview, Invoking, Top, Top
@chapter Overview
@cindex overview
@cindex features
file @file{COPYING} that came with GNU Wget, for details).
@end itemize
-@node Invoking
+@node Invoking, Recursive Download, Overview, Top
@chapter Invoking
@cindex invoking
@cindex command line
* Recursive Accept/Reject Options::
@end menu
-@node URL Format
+@node URL Format, Option Syntax, Invoking, Invoking
@section URL Format
@cindex URL
@cindex URL syntax
@c man begin OPTIONS
-@node Option Syntax
+@node Option Syntax, Basic Startup Options, URL Format, Invoking
@section Option Syntax
@cindex option syntax
@cindex syntax of options
@samp{--no-} prefix. This might seem superfluous---if the default for
an affirmative option is to not do something, then why provide a way
to explicitly turn it off? But the startup file may in fact change
-the default. For instance, using @code{follow_ftp = off} in
-@file{.wgetrc} makes Wget @emph{not} follow FTP links by default, and
+the default. For instance, using @code{follow_ftp = on} in
+@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and
using @samp{--no-follow-ftp} is the only way to restore the factory
default from the command line.
-@node Basic Startup Options
+@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking
@section Basic Startup Options
@table @samp
@end table
-@node Logging and Input File Options
+@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@section Logging and Input File Options
@table @samp
If this function is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
file, those on the command lines will be the first ones to be
-retrieved. The @var{file} need not be an @sc{html} document (but no
-harm if it is)---it is enough if the @sc{url}s are just listed
-sequentially.
+retrieved. If @samp{--force-html} is not specified, then @var{file}
+should consist of a series of URLs, one per line.
However, if you specify @samp{--force-html}, the document will be
regarded as @samp{html}. In that case you may have problems with
the @samp{-i} option.
@end table
-@node Download Options
+@node Download Options, Directory Options, Logging and Input File Options, Invoking
@section Download Options
@table @samp
cases, the local file will be @dfn{clobbered}, or overwritten, upon
repeated download. In other cases it will be preserved.
-When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or @samp{p},
-downloading the same file in the same directory will result in the
-original copy of @var{file} being preserved and the second copy being
-named @samp{@var{file}.1}. If that file is downloaded yet again, the
-third copy will be named @samp{@var{file}.2}, and so on. When
-@samp{-nc} is specified, this behavior is suppressed, and Wget will
-refuse to download newer copies of @samp{@var{file}}. Therefore,
-``@code{no-clobber}'' is actually a misnomer in this mode---it's not
-clobbering that's prevented (as the numeric suffixes were already
-preventing clobbering), but rather the multiple version saving that's
-prevented.
-
-When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N}
-or @samp{-nc}, re-downloading a file will result in the new copy
-simply overwriting the old. Adding @samp{-nc} will prevent this
-behavior, instead causing the original version to be preserved and any
-newer copies on the server to be ignored.
+When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or
+@samp{-p}, downloading the same file in the same directory will result
+in the original copy of @var{file} being preserved and the second copy
+being named @samp{@var{file}.1}. If that file is downloaded yet
+again, the third copy will be named @samp{@var{file}.2}, and so on.
+(This is also the behavior with @samp{-nd}, even if @samp{-r} or
+@samp{-p} are in effect.) When @samp{-nc} is specified, this behavior
+is suppressed, and Wget will refuse to download newer copies of
+@samp{@var{file}}. Therefore, ``@code{no-clobber}'' is actually a
+misnomer in this mode---it's not clobbering that's prevented (as the
+numeric suffixes were already preventing clobbering), but rather the
+multiple version saving that's prevented.
+
+When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N},
+@samp{-nd}, or @samp{-nc}, re-downloading a file will result in the
+new copy simply overwriting the old. Adding @samp{-nc} will prevent
+this behavior, instead causing the original version to be preserved
+and any newer copies on the server to be ignored.
When running Wget with @samp{-N}, with or without @samp{-r} or
@samp{-p}, the decision as to whether or not to download a newer copy
Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
servers that support the @code{Range} header.
+@cindex iri support
+@cindex idn support
+@item --iri
+
+Turn on internationalized URI (IRI) support. Use @samp{--no-iri} to
+turn it off. IRI support is activated by default.
+
+You can set the default state of IRI support using @code{iri} command in
+@file{.wgetrc}. That setting may be overridden from the command line.
+
+@cindex local encoding
+@cindex locale
+@item --locale=@var{encoding}
+
+Force Wget to use @var{encoding} as the default system encoding. That affects
+how Wget converts URLs specified as arguments from locale to @sc{utf-8} for
+IRI support.
+
+Wget use the function @code{nl_langinfo()} and then the @code{CHARSET}
+environment variable to get the locale. If it fails, @sc{ascii} is used.
+
+You can set the default locale using the @code{locale} command in
+@file{.wgetrc}. That setting may be overridden from the command line.
+
@cindex progress indicator
@cindex dot style
@item --progress=@var{type}
``dot'' progress will be favored over ``bar''. To force the bar output,
use @samp{--progress=bar:force}.
+@cindex remote encoding
+@item --remote-encoding=@var{encoding}
+
+Force Wget to use encoding as the default remote server encoding. That
+affects how Wget converts URIs found in files from remote encoding to
+@sc{utf-8} during a recursive fetch. This options is only useful for
+IRI support, for the interpretation of non-@sc{ascii} characters.
+
+For HTTP, remote encoding can be found in HTTP @code{Content-Type}
+header and in HTML @code{Content-Type http-equiv} meta tag.
+
+You can set the default encoding using the @code{remoteencoding}
+command in @file{.wgetrc}. That setting may be overridden from the
+command line.
+
@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
-seconds per file.
+seconds per file.
-Note that this option is turned on by default in the global
-@file{wgetrc} file.
+By default, Wget will assume a value of 10 seconds.
@cindex wait, random
@cindex random wait
when @samp{--password} is being used, because they are mutually exclusive.
@end table
-@node Directory Options
+@node Directory Options, HTTP Options, Download Options, Invoking
@section Directory Options
@table @samp
current directory).
@end table
-@node HTTP Options
+@node HTTP Options, HTTPS (SSL/TLS) Options, Directory Options, Invoking
@section HTTP Options
@table @samp
+@cindex default page name
+@cindex index.html
+@item --default-page=@var{name}
+Use @var{name} as the default file name when it isn't known (i.e., for
+URLs that end in a slash), instead of @file{index.html}.
+
@cindex .html extension
@item -E
@itemx --html-extension
Considerations}.
@end iftex
+@cindex Keep-Alive, turning off
+@cindex Persistent Connections, disabling
+@item --no-http-keep-alive
+Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget
+asks the server to keep the connection open so that, when you download
+more than one document from the same server, they get transferred over
+the same TCP connection. This saves time and at the same time reduces
+the load on the server.
+
+This option is useful when, for some reason, persistent (keep-alive)
+connections don't work for you, for example due to a server bug or due
+to the inability of server-side scripts to cope with the connections.
+
@cindex proxy
@cindex cache
@item --no-cache
@end table
-@node HTTPS (SSL/TLS) Options
+@node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking
@section HTTPS (SSL/TLS) Options
@cindex SSL
systems that support @file{/dev/random}.
@end table
-@node FTP Options
+@node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking
@section FTP Options
@table @samp
specified on the command-line, rather than because it was recursed to,
this option has no effect. Symbolic links are always traversed in this
case.
-
-@cindex Keep-Alive, turning off
-@cindex Persistent Connections, disabling
-@item --no-http-keep-alive
-Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget
-asks the server to keep the connection open so that, when you download
-more than one document from the same server, they get transferred over
-the same TCP connection. This saves time and at the same time reduces
-the load on the server.
-
-This option is useful when, for some reason, persistent (keep-alive)
-connections don't work for you, for example due to a server bug or due
-to the inability of server-side scripts to cope with the connections.
@end table
-@node Recursive Retrieval Options
+@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking
@section Recursive Retrieval Options
@table @samp
option to turn it on.
@end table
-@node Recursive Accept/Reject Options
+@node Recursive Accept/Reject Options, , Recursive Retrieval Options, Invoking
@section Recursive Accept/Reject Options
@table @samp
@c man end
-@node Recursive Download
+@node Recursive Download, Following Links, Invoking, Top
@chapter Recursive Download
@cindex recursion
@cindex retrieving
Recursive retrieval should be used with care. Don't say you were not
warned.
-@node Following Links
+@node Following Links, Time-Stamping, Recursive Download, Top
@chapter Following Links
@cindex links
@cindex following links
* FTP Links:: Following FTP links.
@end menu
-@node Spanning Hosts
+@node Spanning Hosts, Types of Files, Following Links, Following Links
@section Spanning Hosts
@cindex spanning hosts
@cindex hosts, spanning
@end table
-@node Types of Files
+@node Types of Files, Directory-Based Limits, Spanning Hosts, Following Links
@section Types of Files
@cindex types of files
This behavior, too, is considered less-than-desirable, and may change
in a future version of Wget.
-@node Directory-Based Limits
+@node Directory-Based Limits, Relative Links, Types of Files, Following Links
@section Directory-Based Limits
@cindex directories
@cindex directory limits
meaningless, as its parent is @samp{/}).
@end table
-@node Relative Links
+@node Relative Links, FTP Links, Directory-Based Limits, Following Links
@section Relative Links
@cindex relative links
This option is probably not very useful and might be removed in a future
release.
-@node FTP Links
+@node FTP Links, , Relative Links, Following Links
@section Following FTP Links
@cindex following ftp links
Also note that followed links to @sc{ftp} directories will not be
retrieved recursively further.
-@node Time-Stamping
+@node Time-Stamping, Startup File, Following Links, Top
@chapter Time-Stamping
@cindex time-stamping
@cindex timestamping
* FTP Time-Stamping Internals::
@end menu
-@node Time-Stamping Usage
+@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping
@section Time-Stamping Usage
@cindex time-stamping usage
@cindex usage, time-stamping
directory listing with dates in a format that Wget can parse
(@pxref{FTP Time-Stamping Internals}).
-@node HTTP Time-Stamping Internals
+@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping
@section HTTP Time-Stamping Internals
@cindex http time-stamping
Arguably, @sc{http} time-stamping should be implemented using the
@code{If-Modified-Since} request.
-@node FTP Time-Stamping Internals
+@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping
@section FTP Time-Stamping Internals
@cindex ftp time-stamping
@code{wu-ftpd}), which returns the exact time of the specified file.
Wget may support this command in the future.
-@node Startup File
+@node Startup File, Examples, Time-Stamping, Top
@chapter Startup File
@cindex startup file
@cindex wgetrc
* Sample Wgetrc:: A wgetrc example.
@end menu
-@node Wgetrc Location
+@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File
@section Wgetrc Location
@cindex wgetrc location
@cindex location of wgetrc
system-wide wgetrc (in @file{/usr/local/etc/wgetrc} by default).
Fascist admins, away!
-@node Wgetrc Syntax
+@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File
@section Wgetrc Syntax
@cindex wgetrc syntax
@cindex syntax of wgetrc
reject =
@end example
-@node Wgetrc Commands
+@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File
@section Wgetrc Commands
@cindex wgetrc commands
@item debug = on/off
Debug mode, same as @samp{-d}.
+@item default_page = @var{string}
+Default page name---the same as @samp{--default-page=@var{string}}.
+
@item delete_after = on/off
Delete after download---the same as @samp{--delete-after}.
Save cookies to @var{file}. The same as @samp{--save-cookies
@var{file}}.
+@item save_headers = on/off
+Same as @samp{--save-headers}.
+
@item secure_protocol = @var{string}
Choose the secure protocol to be used. Legal values are @samp{auto}
(the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same
@item span_hosts = on/off
Same as @samp{-H}.
+@item spider = on/off
+Same as @samp{--spider}.
+
@item strict_comments = on/off
Same as @samp{--strict-comments}.
This command can be overridden using the @samp{ftp_user} and
@samp{http_user} command for @sc{ftp} and @sc{http} respectively.
+@item user_agent = @var{string}
+User agent identification sent to the HTTP Server---the same as
+@samp{--user-agent=@var{string}}.
+
@item verbose = on/off
Turn verbose on/off---the same as @samp{-v}/@samp{-nv}.
turned on by default in the global @file{wgetrc}.
@end table
-@node Sample Wgetrc
+@node Sample Wgetrc, , Wgetrc Commands, Startup File
@section Sample Wgetrc
@cindex sample wgetrc
@include sample.wgetrc.munged_for_texi_inclusion
@end example
-@node Examples
+@node Examples, Various, Startup File, Top
@chapter Examples
@cindex examples
* Very Advanced Usage:: The hairy stuff.
@end menu
-@node Simple Usage
+@node Simple Usage, Advanced Usage, Examples, Examples
@section Simple Usage
@itemize @bullet
@end example
@end itemize
-@node Advanced Usage
+@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples
@section Advanced Usage
@itemize @bullet
@end example
@end itemize
-@node Very Advanced Usage
+@node Very Advanced Usage, , Advanced Usage, Examples
@section Very Advanced Usage
@cindex mirroring
@end itemize
@c man end
-@node Various
+@node Various, Appendices, Examples, Top
@chapter Various
@cindex various
* Proxies:: Support for proxy servers.
* Distribution:: Getting the latest version.
* Web Site:: GNU Wget's presence on the World Wide Web.
-* Mailing List:: Wget mailing list for announcements and discussion.
+* Mailing Lists:: Wget mailing list for announcements and discussion.
* Internet Relay Chat:: Wget's presence on IRC.
* Reporting Bugs:: How and where to report bugs.
* Portability:: The systems Wget works on.
* Signals:: Signal-handling performed by Wget.
@end menu
-@node Proxies
+@node Proxies, Distribution, Various, Various
@section Proxies
@cindex proxies
settings @code{proxy_user} and @code{proxy_password} to set the proxy
username and password.
-@node Distribution
+@node Distribution, Web Site, Proxies, Various
@section Distribution
@cindex latest version
Wget @value{VERSION} can be found at
@url{ftp://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz}
-@node Web Site
+@node Web Site, Mailing Lists, Distribution, Various
@section Web Site
@cindex web site
information resides at ``The Wget Wgiki'',
@url{http://wget.addictivecode.org/}.
-@node Mailing List
-@section Mailing List
+@node Mailing Lists, Internet Relay Chat, Web Site, Various
+@section Mailing Lists
@cindex mailing list
@cindex list
-There are several Wget-related mailing lists. The general discussion
-list is at @email{wget@@sunsite.dk}. It is the preferred place for
-support requests and suggestions, as well as for discussion of
-development. You are invited to subscribe.
-
-To subscribe, simply send mail to @email{wget-subscribe@@sunsite.dk}
-and follow the instructions. Unsubscribe by mailing to
-@email{wget-unsubscribe@@sunsite.dk}. The mailing list is archived at
+@unnumberedsubsec Primary List
+
+The primary mailinglist for discussion, bug-reports, or questions
+about GNU Wget is at @email{bug-wget@@gnu.org}. To subscribe, send an
+email to @email{bug-wget-join@@gnu.org}, or visit
+@url{http://lists.gnu.org/mailman/listinfo/bug-wget}.
+
+You do not need to subscribe to send a message to the list; however,
+please note that unsubscribed messages are moderated, and may take a
+while before they hit the list---@strong{usually around a day}. If
+you want your message to show up immediately, please subscribe to the
+list before posting. Archives for the list may be found at
+@url{http://lists.gnu.org/pipermail/bug-wget/}.
+
+An NNTP/Usenettish gateway is also available via
+@uref{http://gmane.org/about.php,Gmane}. You can see the Gmane
+archives at
+@url{http://news.gmane.org/gmane.comp.web.wget.general}. Note that the
+Gmane archives conveniently include messages from both the current
+list, and the previous one. Messages also show up in the Gmane
+archives sooner than they do at @url{lists.gnu.org}.
+
+@unnumberedsubsec Bug Notices List
+
+Additionally, there is the @email{wget-notify@@addictivecode.org} mailing
+list. This is a non-discussion list that receives bug report
+notifications from the bug-tracker. To subscribe to this list,
+send an email to @email{wget-notify-join@@addictivecode.org},
+or visit @url{http://addictivecode.org/mailman/listinfo/wget-notify}.
+
+@unnumberedsubsec Obsolete Lists
+
+Previously, the mailing list @email{wget@@sunsite.dk} was used as the
+main discussion list, and another list,
+@email{wget-patches@@sunsite.dk} was used for submitting and
+discussing patches to GNU Wget.
+
+Messages from @email{wget@@sunsite.dk} are archived at
+@itemize @tie{}
+@item
@url{http://www.mail-archive.com/wget%40sunsite.dk/} and at
-@url{http://news.gmane.org/gmane.comp.web.wget.general}.
-
-Another mailing list is at @email{wget-patches@@sunsite.dk}, and is
-used to submit patches for review by Wget developers. A ``patch'' is
-a textual representation of change to source code, readable by both
-humans and programs. The
-@url{http://wget.addictivecode.org/PatchGuidelines} page
-covers the creation and submitting of patches in detail. Please don't
-send general suggestions or bug reports to @samp{wget-patches}; use it
-only for patch submissions.
-
-Subscription is the same as above for @email{wget@@sunsite.dk}, except
-that you send to @email{wget-patches-subscribe@@sunsite.dk}, instead.
-The mailing list is archived at
-@url{http://news.gmane.org/gmane.comp.web.wget.patches}.
+@item
+@url{http://news.gmane.org/gmane.comp.web.wget.general} (which also
+continues to archive the current list, @email{bug-wget@@gnu.org}).
+@end itemize
-Finally, there is the @email{wget-notify@@addictivecode.org} mailing
-list. This is a non-discussion list that receives bug report-change
-notifications from the bug-tracker. Unlike for the other mailing lists,
-subscription is through the @code{mailman} interface at
-@url{http://addictivecode.org/mailman/listinfo/wget-notify}.
+Messages from @email{wget-patches@@sunsite.dk} are archived at
+@itemize @tie{}
+@item
+@url{http://news.gmane.org/gmane.comp.web.wget.patches}.
+@end itemize
-@node Internet Relay Chat
+@node Internet Relay Chat, Reporting Bugs, Mailing Lists, Various
@section Internet Relay Chat
@cindex Internet Relay Chat
@cindex IRC
In addition to the mailinglists, we also have a support channel set up
via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out!
-@node Reporting Bugs
+@node Reporting Bugs, Portability, Internet Relay Chat, Various
@section Reporting Bugs
@cindex bugs
@cindex reporting bugs
it's a bug. If things work strange, but you are not sure about the way
they are supposed to work, it might well be a bug, but you might want to
double-check the documentation and the mailing lists (@pxref{Mailing
-List}).
+Lists}).
@item
Try to repeat the bug in as simple circumstances as possible. E.g. if
@end enumerate
@c man end
-@node Portability
+@node Portability, Signals, Reporting Bugs, Various
@section Portability
@cindex portability
@cindex operating systems
Vanem; a port to VMS is maintained by Steven Schweda, and is available
at @url{http://antinode.org/}.
-@node Signals
+@node Signals, , Portability, Various
@section Signals
@cindex signal handling
@cindex hangup
Other than that, Wget will not try to interfere with signals in any way.
@kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike.
-@node Appendices
+@node Appendices, Copying this manual, Various, Top
@chapter Appendices
This chapter contains some references I consider useful.
* Contributors:: People who helped.
@end menu
-@node Robot Exclusion
+@node Robot Exclusion, Security Considerations, Appendices, Appendices
@section Robot Exclusion
@cindex robot exclusion
@cindex robots.txt
download and parse.
Although Wget is not a web robot in the strictest sense of the word, it
-can downloads large parts of the site without the user's intervention to
+can download large parts of the site without the user's intervention to
download an individual page. Because of that, Wget honors RES when
downloading recursively. For instance, when you issue:
@file{.wgetrc}. You can achieve the same effect from the command line
using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}.
-@node Security Considerations
+@node Security Considerations, Contributors, Robot Exclusion, Appendices
@section Security Considerations
@cindex security
me).
@end enumerate
-@node Contributors
+@node Contributors, , Security Considerations, Appendices
@section Contributors
@cindex contributors
@item
Ted Mielczarek---donated support for CSS.
+@item
+Saint Xavier---Support for IRIs (RFC 3987).
+
@item
People who provided donations for development---including Brian Gough.
@end itemize
Alexander Kourakos,
Martin Kraemer,
Sami Krank,
+Jay Krell,
@tex
$\Sigma\acute{\iota}\mu o\varsigma\;
\Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$
Matthew J.@: Mellon,
Jordan Mendelson,
Ted Mielczarek,
+Robert Millan,
Lin Zhe Min,
Jan Minar,
Tim Mooney,
Douglas E.@: Wegscheid,
Ralf Wildenhues,
Joshua David Williams,
+Benjamin Wolsey,
+Saint Xavier,
YAMAZAKI Makoto,
Jasmin Zainul,
@iftex
@ifnottex
Bojan Zdrnja,
@end ifnottex
-Kristijan Zimmer.
+Kristijan Zimmer,
+Xin Zou.
Apologies to all who I accidentally left out, and many thanks to all the
subscribers of the Wget mailing list.
-@node Copying this manual
+@node Copying this manual, Concept Index, Appendices, Top
@appendix Copying this manual
@menu
* GNU Free Documentation License:: Licnse for copying this manual.
@end menu
+@node GNU Free Documentation License, , Copying this manual, Copying this manual
+@appendixsec GNU Free Documentation License
+@cindex FDL, GNU Free Documentation License
+
@include fdl.texi
-@node Concept Index
+@node Concept Index, , Copying this manual, Top
@unnumbered Concept Index
@printindex cp