@c %**start of header
@setfilename wget.info
@include version.texi
-@set UPDATED Mar 2008
@settitle GNU Wget @value{VERSION} Manual
@c Disable the monstrous rectangles beside overfull hbox-es.
@finalout
@c man end
@end ignore
@c man begin DESCRIPTION
-Wget can follow links in @sc{html} and @sc{xhtml} pages and create local
-versions of remote web sites, fully recreating the directory structure of
-the original site. This is sometimes referred to as ``recursive
-downloading.'' While doing that, Wget respects the Robot Exclusion
-Standard (@file{/robots.txt}). Wget can be instructed to convert the
-links in downloaded @sc{html} files to the local files for offline
-viewing.
+Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to
+create local versions of remote web sites, fully recreating the
+directory structure of the original site. This is sometimes referred to
+as ``recursive downloading.'' While doing that, Wget respects the Robot
+Exclusion Standard (@file{/robots.txt}). Wget can be instructed to
+convert the links in downloaded files to point at the local files, for
+offline viewing.
@c man end
@item
@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
-Read @sc{url}s from @var{file}. If @samp{-} is specified as
-@var{file}, @sc{url}s are read from the standard input. (Use
-@samp{./-} to read from a file literally named @samp{-}.)
+Read @sc{url}s from a local or external @var{file}. If @samp{-} is
+specified as @var{file}, @sc{url}s are read from the standard input.
+(Use @samp{./-} to read from a file literally named @samp{-}.)
If this function is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.
+If the @var{file} is an external one, the document will be automatically
+treated as @samp{html} if the Content-Type matches @samp{text/html}.
+Furthermore, the @var{file}'s location will be implicitly used as base
+href if none was specified.
+
@cindex force html
@item -F
@itemx --force-html
same time. Neither option is available in Wget compiled without IPv6
support.
-@item --prefer-family=IPv4/IPv6/none
+@item --prefer-family=none/IPv4/IPv6
When given a choice of several addresses, connect to the addresses
-with specified address family first. IPv4 addresses are preferred by
-default.
+with specified address family first. The address order returned by
+DNS is used without change by default.
This avoids spurious errors and connect attempts when accessing hosts
that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For
@section HTTP Options
@table @samp
+@cindex default page name
+@cindex index.html
+@item --default-page=@var{name}
+Use @var{name} as the default file name when it isn't known (i.e., for
+@sc{url}s that end in a slash), instead of @file{index.html}.
+
@cindex .html extension
@item -E
@itemx --html-extension
version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive
Retrieval Options}).
+As of version 1.12, Wget will also ensure that any downloaded files of
+type @samp{text/css} end in the suffix @samp{.css}. Obviously, this
+makes the name @samp{--html-extension} misleading; a better name is
+expected to be offered as an alternative in the near future.
+
@cindex http user
@cindex http password
@cindex authentication
@sc{http} or @sc{ftp} server), following links and directory structure.
We refer to this as to @dfn{recursive retrieval}, or @dfn{recursion}.
-With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} from
-the given @sc{url}, documents, retrieving the files the @sc{html}
-document was referring to, through markup like @code{href}, or
-@code{src}. If the freshly downloaded file is also of type
-@code{text/html} or @code{application/xhtml+xml}, it will be parsed and
-followed further.
+With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} or
+@sc{css} from the given @sc{url}, retrieving the files the document
+refers to, through markup like @code{href} or @code{src}, or @sc{css}
+@sc{uri} values specified using the @samp{url()} functional notation.
+If the freshly downloaded file is also of type @code{text/html},
+@code{application/xhtml+xml}, or @code{text/css}, it will be parsed
+and followed further.
-Recursive retrieval of @sc{http} and @sc{html} content is
+Recursive retrieval of @sc{http} and @sc{html}/@sc{css} content is
@dfn{breadth-first}. This means that Wget first downloads the requested
-@sc{html} document, then the documents linked from that document, then the
+document, then the documents linked from that document, then the
documents linked by them, and so on. In other words, Wget first
downloads the documents at depth 1, then those at depth 2, and so on
until the specified maximum depth.
@item debug = on/off
Debug mode, same as @samp{-d}.
+@item default_page = @var{string}
+Default page name---the same as @samp{--default-page=@var{string}}.
+
@item delete_after = on/off
Delete after download---the same as @samp{--delete-after}.
@item html_extension = on/off
Add a @samp{.html} extension to @samp{text/html} or
-@samp{application/xhtml+xml} files without it, like @samp{-E}.
+@samp{application/xhtml+xml} files without it, or a @samp{.css}
+extension to @samp{text/css} files without it, like @samp{-E}.
@item http_keep_alive = on/off
Turn the keep-alive feature on or off (defaults to on). Turning it
@var{file} in the request body. The same as
@samp{--post-file=@var{file}}.
-@item prefer_family = IPv4/IPv6/none
+@item prefer_family = none/IPv4/IPv6
When given a choice of several addresses, connect to the addresses
-with specified address family first. IPv4 addresses are preferred by
-default. The same as @samp{--prefer-family}, which see for a detailed
-discussion of why this is useful.
+with specified address family first. The address order returned by
+DNS is used without change by default. The same as @samp{--prefer-family},
+which see for a detailed discussion of why this is useful.
@item private_key = @var{file}
Set the private key file to @var{file}. The same as
Save cookies to @var{file}. The same as @samp{--save-cookies
@var{file}}.
+@item save_headers = on/off
+Same as @samp{--save-headers}.
+
@item secure_protocol = @var{string}
Choose the secure protocol to be used. Legal values are @samp{auto}
(the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same
@item span_hosts = on/off
Same as @samp{-H}.
+@item spider = on/off
+Same as @samp{--spider}.
+
@item strict_comments = on/off
Same as @samp{--strict-comments}.
This command can be overridden using the @samp{ftp_user} and
@samp{http_user} command for @sc{ftp} and @sc{http} respectively.
+@item user_agent = @var{string}
+User agent identification sent to the @sc{http} server---the same as
+@samp{--user-agent=@var{string}}.
+
@item verbose = on/off
Turn verbose on/off---the same as @samp{-v}/@samp{-nv}.
@end example
@item
-The same as the above, but convert the links in the @sc{html} files to
+The same as the above, but convert the links in the downloaded files to
point to local files, so you can view the documents off-line:
@example
@url{http://news.gmane.org/gmane.comp.web.wget.patches}.
Finally, there is the @email{wget-notify@@addictivecode.org} mailing
-list. This is a non-discussion list that receives commit notifications
-from the source repository, and also bug report-change notifications.
-This is the highest-traffic list for Wget, and is recommended only for
-people who are seriously interested in ongoing Wget development.
-Subscription is through the @code{mailman} interface at
+list. This is a non-discussion list that receives bug-report change
+notifications from the bug tracker. Unlike for the other mailing lists,
+subscription is through the @code{mailman} interface at
@url{http://addictivecode.org/mailman/listinfo/wget-notify}.
@node Internet Relay Chat
@cindex IRC
@cindex #wget
-While, at the time of this writing, there is very low activity, we do
-have a support channel set up via IRC at @code{irc.freenode.org},
-@code{#wget}. Come check it out!
+In addition to the mailing lists, we also have a support channel set up
+via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out!
@node Reporting Bugs
@section Reporting Bugs
download and parse.
Although Wget is not a web robot in the strictest sense of the word, it
-can downloads large parts of the site without the user's intervention to
+can download large parts of the site without the user's intervention to
download an individual page. Because of that, Wget honors RES when
downloading recursively. For instance, when you issue:
authentication.
@item
-Mauro Tortonesi---Improved IPv6 support, adding support for dual
+Mauro Tortonesi---improved IPv6 support, adding support for dual
family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU
Wget from 2004--2007.
@item
-Christopher G.@: Lewis---Maintenance of the Windows version of GNU WGet.
+Christopher G.@: Lewis---maintenance of the Windows version of GNU Wget.
@item
-Gisle Vanem---Many helpful patches and improvements, especially for
+Gisle Vanem---many helpful patches and improvements, especially for
Windows and MS-DOS support.
@item
-Ralf Wildenhues---Contributed patches to convert Wget to use Automake as
+Ralf Wildenhues---contributed patches to convert Wget to use Automake as
part of its build process, and various bugfixes.
+@item
+Steven Schubiger---many helpful patches, bugfixes and improvements.
+Notably, conversion of Wget to use the Gnulib quotes and quoteargs
+modules, and the addition of password prompts at the console, via the
+Gnulib getpasswd-gnu module.
+
+@item
+Ted Mielczarek---donated support for CSS.
+
@item
People who provided donations for development---including Brian Gough.
@end itemize
Aleksandar Erkalovic,
@end ifnottex
Andy Eskilsson,
+@iftex
+Jo@~{a}o Ferreira,
+@end iftex
+@ifnottex
+Joao Ferreira,
+@end ifnottex
Christian Fraenkel,
David Fritz,
+Mike Frysinger,
Charles C.@: Fu,
FUJISHIMA Satsuki,
Masashi Fujita,
Marcel Gerrits,
Lemble Gregory,
Hans Grobler,
+Alain Guibert,
Mathieu Guillaume,
Aaron Hawley,
Jochen Hein,
Karl Heuer,
+Madhusudan Hosaagrahara,
HIROSE Masaaki,
Ulf Harnhammar,
Gregor Hoffleit,
Aurelien Marchand,
Matthew J.@: Mellon,
Jordan Mendelson,
+Ted Mielczarek,
Lin Zhe Min,
Jan Minar,
Tim Mooney,
Simon Munton,
Charlie Negyesi,
R.@: K.@: Owen,
+Jim Paris,
+Kenny Parnell,
Leonid Petrov,
Simone Piunno,
Andrew Pollock,
Heinz Salzmann,
Robert Schmidt,
Nicolas Schodet,
+Benno Schulenberg,
Andreas Schwab,
Steven M.@: Schweda,
Chris Seawood,
+Pranab Shenoy,
Dennis Smit,
Toomas Soome,
Tage Stabell-Kulo,