X-Git-Url: http://sjero.net/git/?p=wget;a=blobdiff_plain;f=doc%2Fwget.texi;h=ee7a873cee0b493b19d536516a3e189c99efdf92;hp=54e2eb9d192eb24ded375098b657509f4aae7726;hb=9dadbf6fe9577a6a6b7e7bab4e4b782fc1a6f86c;hpb=cbd54b549a49465d4545c136d8061f6ef3889a30 diff --git a/doc/wget.texi b/doc/wget.texi index 54e2eb9d..ee7a873c 100644 --- a/doc/wget.texi +++ b/doc/wget.texi @@ -82,7 +82,7 @@ Info entry for @file{wget}. @contents @ifnottex -@node Top +@node Top, Overview, (dir), (dir) @top Wget @value{VERSION} @insertcopying @@ -102,7 +102,7 @@ Info entry for @file{wget}. * Concept Index:: Topics covered by this manual. @end menu -@node Overview +@node Overview, Invoking, Top, Top @chapter Overview @cindex overview @cindex features @@ -211,7 +211,7 @@ Public License, as published by the Free Software Foundation (see the file @file{COPYING} that came with GNU Wget, for details). @end itemize -@node Invoking +@node Invoking, Recursive Download, Overview, Top @chapter Invoking @cindex invoking @cindex command line @@ -248,7 +248,7 @@ the command line. * Recursive Accept/Reject Options:: @end menu -@node URL Format +@node URL Format, Option Syntax, Invoking, Invoking @section URL Format @cindex URL @cindex URL syntax @@ -326,7 +326,7 @@ with your favorite browser, like @code{Lynx} or @code{Netscape}. @c man begin OPTIONS -@node Option Syntax +@node Option Syntax, Basic Startup Options, URL Format, Invoking @section Option Syntax @cindex option syntax @cindex syntax of options @@ -396,12 +396,12 @@ the option name; negative options can be negated by omitting the @samp{--no-} prefix. This might seem superfluous---if the default for an affirmative option is to not do something, then why provide a way to explicitly turn it off? But the startup file may in fact change -the default. For instance, using @code{follow_ftp = off} in -@file{.wgetrc} makes Wget @emph{not} follow FTP links by default, and +the default. 
For instance, using @code{follow_ftp = on} in +@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and using @samp{--no-follow-ftp} is the only way to restore the factory default from the command line. -@node Basic Startup Options +@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking @section Basic Startup Options @table @samp @@ -429,7 +429,7 @@ instances of @samp{-e}. @end table -@node Logging and Input File Options +@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking @section Logging and Input File Options @table @samp @@ -486,9 +486,8 @@ specified as @var{file}, @sc{url}s are read from the standard input. If this function is used, no @sc{url}s need be present on the command line. If there are @sc{url}s both on the command line and in an input file, those on the command lines will be the first ones to be -retrieved. The @var{file} need not be an @sc{html} document (but no -harm if it is)---it is enough if the @sc{url}s are just listed -sequentially. +retrieved. If @samp{--force-html} is not specified, then @var{file} +should consist of a series of URLs, one per line. However, if you specify @samp{--force-html}, the document will be regarded as @samp{html}. In that case you may have problems with @@ -513,11 +512,20 @@ option. @cindex base for relative links in input file @item -B @var{URL} @itemx --base=@var{URL} -Prepends @var{URL} to relative links read from the file specified with -the @samp{-i} option. +Resolves relative links using @var{URL} as the point of reference, +when reading links from an HTML file specified via the +@samp{-i}/@samp{--input-file} option (together with +@samp{--force-html}, or when the input file was fetched remotely from +a server describing it as @sc{html}). This is equivalent to the +presence of a @code{BASE} tag in the @sc{html} input file, with +@var{URL} as the value for the @code{href} attribute. 
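The -B/--base rule above is ordinary relative-URL resolution against a base URL, as if the input file carried a BASE tag. The sketch below uses Python's standard urllib.parse.urljoin as a stand-in. This is an assumption made for illustration only; Wget uses its own C resolver, not this function. The helper name resolve_with_base is hypothetical.

```python
from urllib.parse import urljoin


def resolve_with_base(base_url: str, link: str) -> str:
    # Hypothetical stand-in for --base handling: resolve a link read from
    # the input file against the -B URL, like <base href="base_url">.
    return urljoin(base_url, link)


if __name__ == "__main__":
    print(resolve_with_base("http://foo/bar/a.html", "../baz/b.html"))
```

Absolute links in the input file come back unchanged, which matches the BASE-tag analogy.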
+ +For instance, if you specify @samp{http://foo/bar/a.html} for +@var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it +would be resolved to @samp{http://foo/baz/b.html}. @end table -@node Download Options +@node Download Options, Directory Options, Logging and Input File Options, Invoking @section Download Options @table @samp @@ -582,23 +590,24 @@ behavior depends on a few options, including @samp{-nc}. In certain cases, the local file will be @dfn{clobbered}, or overwritten, upon repeated download. In other cases it will be preserved. -When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or @samp{p}, -downloading the same file in the same directory will result in the -original copy of @var{file} being preserved and the second copy being -named @samp{@var{file}.1}. If that file is downloaded yet again, the -third copy will be named @samp{@var{file}.2}, and so on. When -@samp{-nc} is specified, this behavior is suppressed, and Wget will -refuse to download newer copies of @samp{@var{file}}. Therefore, -``@code{no-clobber}'' is actually a misnomer in this mode---it's not -clobbering that's prevented (as the numeric suffixes were already -preventing clobbering), but rather the multiple version saving that's -prevented. - -When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N} -or @samp{-nc}, re-downloading a file will result in the new copy -simply overwriting the old. Adding @samp{-nc} will prevent this -behavior, instead causing the original version to be preserved and any -newer copies on the server to be ignored. +When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or +@samp{-p}, downloading the same file in the same directory will result +in the original copy of @var{file} being preserved and the second copy +being named @samp{@var{file}.1}. If that file is downloaded yet +again, the third copy will be named @samp{@var{file}.2}, and so on. 
+(This is also the behavior with @samp{-nd}, even if @samp{-r} or +@samp{-p} are in effect.) When @samp{-nc} is specified, this behavior +is suppressed, and Wget will refuse to download newer copies of +@samp{@var{file}}. Therefore, ``@code{no-clobber}'' is actually a +misnomer in this mode---it's not clobbering that's prevented (as the +numeric suffixes were already preventing clobbering), but rather the +multiple version saving that's prevented. + +When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N}, +@samp{-nd}, or @samp{-nc}, re-downloading a file will result in the +new copy simply overwriting the old. Adding @samp{-nc} will prevent +this behavior, instead causing the original version to be preserved +and any newer copies on the server to be ignored. When running Wget with @samp{-N}, with or without @samp{-r} or @samp{-p}, the decision as to whether or not to download a newer copy @@ -674,30 +683,6 @@ Another instance where you'll get a garbled file if you try to use Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http} servers that support the @code{Range} header. -@cindex iri support -@cindex idn support -@item --iri - -Turn on internationalized URI (IRI) support. Use @samp{--iri=no} to -turn it off. IRI support is activated by default. - -You can set the default state of IRI support using @code{iri} command in -@file{.wgetrc}. That setting may be overridden from the command line. - -@cindex local encoding -@cindex locale -@item --locale=@var{encoding} - -Force Wget to use @var{encoding} as the default system encoding. That affects -how Wget converts URLs specified as arguments from locale to @sc{utf-8} for -IRI support. - -Wget use the function @code{nl_langinfo()} and then the @code{CHARSET} -environment variable to get the locale. If it fails, @sc{ascii} is used. - -You can set the default locale using the @code{locale} command in -@file{.wgetrc}. That setting may be overridden from the command line. 
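Without -N, -nc, -r or -p, the naming scheme for repeated downloads keeps FILE and then saves FILE.1, FILE.2, and so on. The sketch below is an illustrative Python model of that rule, not Wget's C implementation. The helper next_free_name and its injectable exists callback are hypothetical.

```python
import os
from typing import Callable


def next_free_name(path: str,
                   exists: Callable[[str], bool] = os.path.exists) -> str:
    # The first download keeps the plain name.
    if not exists(path):
        return path
    # Later copies are numbered FILE.1, FILE.2, ... with the first unused suffix.
    n = 1
    while exists(f"{path}.{n}"):
        n += 1
    return f"{path}.{n}"
```

For example, if index.html and index.html.1 already exist, the sketch saves the next copy as index.html.2.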
- @cindex progress indicator @cindex dot style @item --progress=@var{type} @@ -729,21 +714,6 @@ command line. The exception is that, when the output is not a TTY, the ``dot'' progress will be favored over ``bar''. To force the bar output, use @samp{--progress=bar:force}. -@cindex remote encoding -@item --remote-encoding=@var{encoding} - -Force Wget to use encoding as the default remote server encoding. That -affects how Wget converts URIs found in files from remote encoding to -@sc{utf-8} during a recursive fetch. This options is only useful for -IRI support, for the interpretation of non-@sc{ascii} characters. - -For HTTP, remote encoding can be found in HTTP @code{Content-Type} -header and in HTML @code{Content-Type http-equiv} meta tag. - -You can set the default encoding using the @code{remoteencoding} -command in @file{.wgetrc}. That setting may be overridden from the -command line. - @item -N @itemx --timestamping Turn on time-stamping. @xref{Time-Stamping}, for details. @@ -861,10 +831,9 @@ use @dfn{linear backoff}, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of @var{seconds} you specify. Therefore, a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55 -seconds per file. +seconds per file. -Note that this option is turned on by default in the global -@file{wgetrc} file. +By default, Wget will assume a value of 10 seconds. @cindex wait, random @cindex random wait @@ -935,24 +904,36 @@ won't need it. @cindex file names, restrict @cindex Windows file names -@item --restrict-file-names=@var{mode} -Change which characters found in remote URLs may show up in local file -names generated from those URLs. Characters that are @dfn{restricted} +@item --restrict-file-names=@var{modes} +Change which characters found in remote URLs must be escaped during +generation of local filenames. Characters that are @dfn{restricted} by this option are escaped, i.e. 
replaced with @samp{%HH}, where @samp{HH} is the hexadecimal number that corresponds to the restricted -character. - -By default, Wget escapes the characters that are not valid as part of -file names on your operating system, as well as control characters that -are typically unprintable. This option is useful for changing these -defaults, either because you are downloading to a non-native partition, -or because you want to disable escaping of the control characters. - -When mode is set to ``unix'', Wget escapes the character @samp{/} and +character. This option may also be used to force all alphabetical +cases to be either lower- or uppercase. + +By default, Wget escapes the characters that are not valid or safe as +part of file names on your operating system, as well as control +characters that are typically unprintable. This option is useful for +changing these defaults, perhaps because you are downloading to a +non-native partition, or because you want to disable escaping of the +control characters, or you want to further restrict characters to only +those in the @sc{ascii} range of values. + +The @var{modes} are a comma-separated set of text values. The +acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol}, +@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. The values +@samp{unix} and @samp{windows} are mutually exclusive (one will +override the other), as are @samp{lowercase} and +@samp{uppercase}. Those last are special cases, as they do not change +the set of characters that would be escaped, but rather force local +file paths to be converted either to lower- or uppercase. + +When ``unix'' is specified, Wget escapes the character @samp{/} and the control characters in the ranges 0--31 and 128--159. This is the -default on Unix-like OS'es. +default on Unix-like operating systems. 
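The sketch below models the "unix" restriction mode described above on the raw bytes of one file-name component. It escapes '/' and the control bytes 0-31 and 128-159 as %HH. Three details are assumptions of this illustration, not statements about Wget's source: the hex digits are uppercase, every other byte (including '%') passes through unchanged, and the function name escape_unix is made up.

```python
def escape_unix(component: bytes) -> bytes:
    out = bytearray()
    for b in component:  # iterating over bytes yields ints 0-255
        if b == 0x2F or b <= 31 or 128 <= b <= 159:
            out += b"%%%02X" % b  # e.g. 0x2F -> b"%2F"
        else:
            out.append(b)
    return bytes(out)
```

The UTF-8 encoding of "€" is E2 82 AC. Under this sketch its 0x82 byte becomes %82, and the nocontrol mode exists to avoid exactly that case.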
-When mode is set to ``windows'', Wget escapes the characters @samp{\}, +When ``windows'' is given, Wget escapes the characters @samp{\}, @samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<}, @samp{>}, and the control characters in the ranges 0--31 and 128--159. In addition to this, Wget in Windows mode uses @samp{+} instead of @@ -963,11 +944,17 @@ name from the rest. Therefore, a URL that would be saved as saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows mode. This mode is the default on Windows. -If you append @samp{,nocontrol} to the mode, as in -@samp{unix,nocontrol}, escaping of the control characters is also -switched off. You can use @samp{--restrict-file-names=nocontrol} to -turn off escaping of control characters without affecting the choice of -the OS to use as file name restriction mode. +If you specify @samp{nocontrol}, then the escaping of the control +characters is also switched off. This option may make sense +when you are downloading URLs whose names contain UTF-8 characters, on +a system which can save and display filenames in UTF-8 (some possible +byte values used in UTF-8 byte sequences fall in the range of values +designated by Wget as ``controls''). + +The @samp{ascii} mode is used to specify that any bytes whose values +are outside the range of @sc{ascii} characters (that is, greater than +127) shall be escaped. This can be useful when saving filenames +whose encoding does not match the one used locally. @cindex IPv6 @itemx -4 @@ -1036,9 +1023,49 @@ options for @sc{http} connections. @item --ask-password Prompt for a password for each connection established. Cannot be specified when @samp{--password} is being used, because they are mutually exclusive. + +@cindex iri support +@cindex idn support +@item --no-iri + +Turn off internationalized URI (IRI) support. Use @samp{--iri} to +turn it on. IRI support is activated by default. 
+ +You can set the default state of IRI support using the @code{iri} +command in @file{.wgetrc}. That setting may be overridden from the +command line. + +@cindex local encoding +@item --local-encoding=@var{encoding} + +Force Wget to use @var{encoding} as the default system encoding. That affects +how Wget converts URLs specified as arguments from locale to @sc{utf-8} for +IRI support. + +Wget uses the function @code{nl_langinfo()} and then the @code{CHARSET} +environment variable to get the locale. If it fails, @sc{ascii} is used. + +You can set the default local encoding using the @code{local_encoding} +command in @file{.wgetrc}. That setting may be overridden from the +command line. + +@cindex remote encoding +@item --remote-encoding=@var{encoding} + +Force Wget to use @var{encoding} as the default remote server encoding. +That affects how Wget converts URIs found in files from remote encoding +to @sc{utf-8} during a recursive fetch. This option is only useful for +IRI support, for the interpretation of non-@sc{ascii} characters. + +For HTTP, remote encoding can be found in the HTTP @code{Content-Type} +header and in the HTML @code{Content-Type http-equiv} meta tag. + +You can set the default encoding using the @code{remoteencoding} +command in @file{.wgetrc}. That setting may be overridden from the +command line. @end table -@node Directory Options +@node Directory Options, HTTP Options, Download Options, Invoking @section Directory Options @table @samp @@ -1110,7 +1137,7 @@ i.e. the top of the retrieval tree. The default is @samp{.} (the current directory). @end table -@node HTTP Options +@node HTTP Options, HTTPS (SSL/TLS) Options, Directory Options, Invoking @section HTTP Options @table @samp @@ -1121,8 +1148,9 @@ Use @var{name} as the default file name when it isn't known (i.e., for URLs that end in a slash), instead of @file{index.html}.
@cindex .html extension +@cindex .css extension @item -E -@itemx --html-extension +@itemx --adjust-extension If a file of type @samp{application/xhtml+xml} or @samp{text/html} is downloaded and the URL does not end with the regexp @samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix @samp{.html} @@ -1143,9 +1171,14 @@ version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive Retrieval Options}). As of version 1.12, Wget will also ensure that any downloaded files of -type @samp{text/css} end in the suffix @samp{.css}. Obviously, this -makes the name @samp{--html-extension} misleading; a better name is -expected to be offered as an alternative in the near future. +type @samp{text/css} end in the suffix @samp{.css}, and the option was +renamed from @samp{--html-extension}, to better reflect its new +behavior. The old option name is still acceptable, but should now be +considered deprecated. + +At some point in the future, this option may well be expanded to +include suffixes for other types of content, including content types +that are not parsed by Wget. @cindex http user @cindex http password @@ -1170,6 +1203,19 @@ For more information about security issues with Wget, @xref{Security Considerations}. @end iftex +@cindex Keep-Alive, turning off +@cindex Persistent Connections, disabling +@item --no-http-keep-alive +Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget +asks the server to keep the connection open so that, when you download +more than one document from the same server, they get transferred over +the same TCP connection. This saves time and at the same time reduces +the load on the server. + +This option is useful when, for some reason, persistent (keep-alive) +connections don't work for you, for example due to a server bug or due +to the inability of server-side scripts to cope with the connections. 
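The sketch below follows the -E/--adjust-extension decision as the text describes it. It appends .html to text/html or application/xhtml+xml content whose name does not match \.[Hh][Tt][Mm][Ll]?, and .css to text/css content that lacks that suffix. Several points are assumptions for illustration: the function name adjusted_name, the case-insensitive .css check, and the use of a bare content type with no parameters such as "; charset=...".

```python
import re

HTML_TYPES = {"text/html", "application/xhtml+xml"}
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")  # the regexp quoted in the text


def adjusted_name(name: str, content_type: str) -> str:
    # HTML/XHTML without an .htm/.html ending gets ".html" appended.
    if content_type in HTML_TYPES and not HTML_SUFFIX.search(name):
        return name + ".html"
    # CSS without a .css ending gets ".css" appended (case-insensitive here).
    if content_type == "text/css" and not name.lower().endswith(".css"):
        return name + ".css"
    return name
```

With this sketch, adjusted_name("index.php", "text/html") yields "index.php.html". That is the same renaming behind the -E -A.php caveat in the Types of Files discussion.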
+ @cindex proxy @cindex cache @item --no-cache @@ -1373,10 +1419,20 @@ not to send the @code{User-Agent} header in @sc{http} requests. @cindex POST @item --post-data=@var{string} @itemx --post-file=@var{file} -Use POST as the method for all HTTP requests and send the specified data -in the request body. @code{--post-data} sends @var{string} as data, -whereas @code{--post-file} sends the contents of @var{file}. Other than -that, they work in exactly the same way. +Use POST as the method for all HTTP requests and send the specified +data in the request body. @samp{--post-data} sends @var{string} as +data, whereas @samp{--post-file} sends the contents of @var{file}. +Other than that, they work in exactly the same way. In particular, +they @emph{both} expect content of the form @code{key1=value1&key2=value2}, +with percent-encoding for special characters; the only difference is +that one expects its content as a command-line parameter and the other +accepts its content from a file. Note that @samp{--post-file} is +@emph{not} for transmitting files as form attachments: those must +appear as @code{key=value} data (with appropriate percent-encoding) just +like everything else. Wget does not currently support +@code{multipart/form-data} for transmitting POST data; only +@code{application/x-www-form-urlencoded}. Only one of +@samp{--post-data} and @samp{--post-file} should be specified. Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to @code{--post-file} must be a regular @@ -1444,7 +1500,7 @@ form-based authentication. @end table -@node HTTPS (SSL/TLS) Options +@node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking @section HTTPS (SSL/TLS) Options @cindex SSL @@ -1569,7 +1625,7 @@ not used), EGD is never contacted. EGD is not needed on modern Unix systems that support @file{/dev/random}.
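Both --post-data and --post-file expect application/x-www-form-urlencoded content of the form key1=value1&key2=value2. One way to prepare such a body is Python's standard urllib.parse.urlencode. The wrapper form_body below is a hypothetical helper for illustration, not part of Wget.

```python
from urllib.parse import urlencode


def form_body(fields: dict) -> str:
    # urlencode() percent-encodes reserved characters such as '&' and '='
    # and encodes spaces as '+', as the urlencoded format allows.
    return urlencode(fields)


if __name__ == "__main__":
    print(form_body({"user": "jo smith", "note": "a&b=c"}))
    # prints: user=jo+smith&note=a%26b%3Dc
```

The resulting string can be passed directly as the --post-data argument, or written to a regular file for --post-file.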
@end table -@node FTP Options +@node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking @section FTP Options @table @samp @@ -1672,22 +1728,9 @@ Note that when retrieving a file (not a directory) because it was specified on the command-line, rather than because it was recursed to, this option has no effect. Symbolic links are always traversed in this case. - -@cindex Keep-Alive, turning off -@cindex Persistent Connections, disabling -@item --no-http-keep-alive -Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget -asks the server to keep the connection open so that, when you download -more than one document from the same server, they get transferred over -the same TCP connection. This saves time and at the same time reduces -the load on the server. - -This option is useful when, for some reason, persistent (keep-alive) -connections don't work for you, for example due to a server bug or due -to the inability of server-side scripts to cope with the connections. @end table -@node Recursive Retrieval Options +@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking @section Recursive Retrieval Options @table @samp @@ -1892,7 +1935,7 @@ If, for whatever reason, you want strict comment parsing, use this option to turn it on. @end table -@node Recursive Accept/Reject Options +@node Recursive Accept/Reject Options, , Recursive Retrieval Options, Invoking @section Recursive Accept/Reject Options @table @samp @@ -1987,7 +2030,7 @@ This is a useful option, since it guarantees that only the files @c man end -@node Recursive Download +@node Recursive Download, Following Links, Invoking, Top @chapter Recursive Download @cindex recursion @cindex retrieving @@ -2055,7 +2098,7 @@ about this. Recursive retrieval should be used with care. Don't say you were not warned. 
-@node Following Links +@node Following Links, Time-Stamping, Recursive Download, Top @chapter Following Links @cindex links @cindex following links @@ -2079,7 +2122,7 @@ links it will follow. * FTP Links:: Following FTP links. @end menu -@node Spanning Hosts +@node Spanning Hosts, Types of Files, Following Links, Following Links @section Spanning Hosts @cindex spanning hosts @cindex hosts, spanning @@ -2136,7 +2179,7 @@ wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \ @end table -@node Types of Files +@node Types of Files, Directory-Based Limits, Spanning Hosts, Following Links @section Types of Files @cindex types of files @@ -2227,7 +2270,7 @@ ways, all of which can change whether an accept/reject rule matches: If the local file already exists and @samp{--no-directories} was specified, a numeric suffix will be appended to the original name. @item -If @samp{--html-extension} was specified, the local filename will have +If @samp{--adjust-extension} was specified, the local filename might have @samp{.html} appended to it. If Wget is invoked with @samp{-E -A.php}, a filename such as @samp{index.php} will match be accepted, but upon download will be named @samp{index.php.html}, which no longer matches, @@ -2241,7 +2284,7 @@ local filenames, and so @emph{do} contribute to filename matching. This behavior, too, is considered less-than-desirable, and may change in a future version of Wget. -@node Directory-Based Limits +@node Directory-Based Limits, Relative Links, Types of Files, Following Links @section Directory-Based Limits @cindex directories @cindex directory limits @@ -2325,7 +2368,7 @@ directory, while in @samp{http://foo/bar} (no trailing slash), meaningless, as its parent is @samp{/}). @end table -@node Relative Links +@node Relative Links, FTP Links, Directory-Based Limits, Following Links @section Relative Links @cindex relative links @@ -2354,7 +2397,7 @@ to ``just work'' without having to convert links. 
This option is probably not very useful and might be removed in a future release. -@node FTP Links +@node FTP Links, , Relative Links, Following Links @section Following FTP Links @cindex following ftp links @@ -2374,7 +2417,7 @@ effect on such downloads. On the other hand, domain acceptance Also note that followed links to @sc{ftp} directories will not be retrieved recursively further. -@node Time-Stamping +@node Time-Stamping, Startup File, Following Links, Top @chapter Time-Stamping @cindex time-stamping @cindex timestamping @@ -2424,7 +2467,7 @@ say. * FTP Time-Stamping Internals:: @end menu -@node Time-Stamping Usage +@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping @section Time-Stamping Usage @cindex time-stamping usage @cindex usage, time-stamping @@ -2480,7 +2523,7 @@ gives a timestamp. For @sc{http}, this depends on getting a directory listing with dates in a format that Wget can parse (@pxref{FTP Time-Stamping Internals}). -@node HTTP Time-Stamping Internals +@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping @section HTTP Time-Stamping Internals @cindex http time-stamping @@ -2512,7 +2555,7 @@ with @samp{-N}, server file @samp{@var{X}} is compared to local file Arguably, @sc{http} time-stamping should be implemented using the @code{If-Modified-Since} request. -@node FTP Time-Stamping Internals +@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping @section FTP Time-Stamping Internals @cindex ftp time-stamping @@ -2541,7 +2584,7 @@ that is supported by some @sc{ftp} servers (including the popular @code{wu-ftpd}), which returns the exact time of the specified file. Wget may support this command in the future. -@node Startup File +@node Startup File, Examples, Time-Stamping, Top @chapter Startup File @cindex startup file @cindex wgetrc @@ -2569,7 +2612,7 @@ commands. * Sample Wgetrc:: A wgetrc example. 
@end menu -@node Wgetrc Location +@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File @section Wgetrc Location @cindex wgetrc location @cindex location of wgetrc @@ -2590,7 +2633,7 @@ means that in case of collision user's wgetrc @emph{overrides} the system-wide wgetrc (in @file{/usr/local/etc/wgetrc} by default). Fascist admins, away! -@node Wgetrc Syntax +@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File @section Wgetrc Syntax @cindex wgetrc syntax @cindex syntax of wgetrc @@ -2617,7 +2660,7 @@ global @file{wgetrc}, you can do it with: reject = @end example -@node Wgetrc Commands +@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File @section Wgetrc Commands @cindex wgetrc commands @@ -2641,6 +2684,16 @@ Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}). @item add_hostdir = on/off Enable/disable host-prefixed file names. @samp{-nH} disables it. +@item ask_password = on/off +Prompt for a password for each connection established. Cannot be specified +when @samp{--password} is being used, because they are mutually +exclusive. Equivalent to @samp{--ask-password}. + +@item auth_no_challenge = on/off +If this option is given, Wget will send Basic HTTP authentication +information (plaintext username and password) for all requests. See +@samp{--auth-no-challenge}. + @item background = on/off Enable/disable going to background---the same as @samp{-b} (which enables it). @@ -2653,9 +2706,10 @@ Enable/disable saving pre-converted files with the suffix @c #### Document me! @c @item base = @var{string} -Consider relative @sc{url}s in @sc{url} input files forced to be -interpreted as @sc{html} as being relative to @var{string}---the same as -@samp{--base=@var{string}}. +Consider relative @sc{url}s in input files (specified via the +@samp{input} command or the @samp{--input-file}/@samp{-i} option, +together with @samp{force_html} or @samp{--force-html}) +as being relative to @var{string}---the same as @samp{--base=@var{string}}. 
@item bind_address = @var{address} Bind to @var{address}, like the @samp{--bind-address=@var{address}}. @@ -2710,6 +2764,9 @@ Ignore @var{n} remote directory components. Equivalent to @item debug = on/off Debug mode, same as @samp{-d}. +@item default_page = @var{string} +Default page name---the same as @samp{--default-page=@var{string}}. + @item delete_after = on/off Delete after download---the same as @samp{--delete-after}. @@ -2794,10 +2851,12 @@ Turn globbing on/off---the same as @samp{--glob} and @samp{--no-glob}. Define a header for HTTP downloads, like using @samp{--header=@var{string}}. -@item html_extension = on/off +@item adjust_extension = on/off Add a @samp{.html} extension to @samp{text/html} or -@samp{application/xhtml+xml} files without it, or a @samp{.css} -extension to @samp{text/css} files without it, like @samp{-E}. +@samp{application/xhtml+xml} files that lack one, or a @samp{.css} +extension to @samp{text/css} files that lack one, like +@samp{-E}. Previously named @samp{html_extension} (still acceptable, +but deprecated). @item http_keep_alive = on/off Turn the keep-alive feature on or off (defaults to on). Turning it @@ -2835,6 +2894,10 @@ Ignore certain @sc{html} tags when doing a recursive retrieval, like Specify a comma-separated list of directories you wish to follow when downloading---the same as @samp{-I @var{string}}. +@item iri = on/off +When set to on, enable internationalized URI (IRI) support; the same as +@samp{--iri}. + @item inet4_only = on/off Force connecting to IPv4 addresses, off by default. You can put this in the global init file to disable Wget's attempts to resolve and @@ -2849,6 +2912,10 @@ or @samp{-6}. @item input = @var{file} Read the @sc{url}s from @var{string}, like @samp{-i @var{file}}. +@item keep_session_cookies = on/off +When specified, causes @samp{save_cookies = on} to also save session +cookies. See @samp{--keep-session-cookies}. 
+ @item limit_rate = @var{rate} Limit the download speed to no more than @var{rate} bytes per second. The same as @samp{--limit-rate=@var{rate}}. @@ -2856,6 +2923,10 @@ The same as @samp{--limit-rate=@var{rate}}. @item load_cookies = @var{file} Load cookies from @var{file}. See @samp{--load-cookies @var{file}}. +@item local_encoding = @var{encoding} +Force Wget to use @var{encoding} as the default system encoding. See +@samp{--local-encoding}. + @item logfile = @var{file} Set logfile to @var{file}, the same as @samp{-o @var{file}}. @@ -2975,6 +3046,10 @@ the @sc{http} spec who got the spelling of ``referrer'' wrong.) Follow only relative links---the same as @samp{-L} (@pxref{Relative Links}). +@item remote_encoding = @var{encoding} +Force Wget to use @var{encoding} as the default remote server encoding. +See @samp{--remote-encoding}. + @item remove_listing = on/off If set to on, remove @sc{ftp} listings downloaded by Wget. Setting it to off is the same as @samp{--no-remove-listing}. @@ -3002,6 +3077,9 @@ this off. Save cookies to @var{file}. The same as @samp{--save-cookies @var{file}}. +@item save_headers = on/off +Same as @samp{--save-headers}. + @item secure_protocol = @var{string} Choose the secure protocol to be used. Legal values are @samp{auto} (the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same @@ -3014,6 +3092,9 @@ responses---the same as @samp{-S}. @item span_hosts = on/off Same as @samp{-H}. +@item spider = on/off +Same as @samp{--spider}. + @item strict_comments = on/off Same as @samp{--strict-comments}. @@ -3037,6 +3118,10 @@ Specify username @var{string} for both @sc{ftp} and @sc{http} file retrieval. This command can be overridden using the @samp{ftp_user} and @samp{http_user} command for @sc{ftp} and @sc{http} respectively. +@item user_agent = @var{string} +User agent identification sent to the HTTP Server---the same as +@samp{--user-agent=@var{string}}. 
+ @item verbose = on/off Turn verbose on/off---the same as @samp{-v}/@samp{-nv}. @@ -3050,7 +3135,7 @@ only---the same as @samp{--waitretry=@var{n}}. Note that this is turned on by default in the global @file{wgetrc}. @end table -@node Sample Wgetrc +@node Sample Wgetrc, , Wgetrc Commands, Startup File @section Sample Wgetrc @cindex sample wgetrc @@ -3067,7 +3152,7 @@ its line. @include sample.wgetrc.munged_for_texi_inclusion @end example -@node Examples +@node Examples, Various, Startup File, Top @chapter Examples @cindex examples @@ -3081,7 +3166,7 @@ complexity. * Very Advanced Usage:: The hairy stuff. @end menu -@node Simple Usage +@node Simple Usage, Advanced Usage, Examples, Examples @section Simple Usage @itemize @bullet @@ -3134,7 +3219,7 @@ links index.html @end example @end itemize -@node Advanced Usage +@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples @section Advanced Usage @itemize @bullet @@ -3270,7 +3355,7 @@ wget -O - http://cool.list.com/ | wget --force-html -i - @end example @end itemize -@node Very Advanced Usage +@node Very Advanced Usage, , Advanced Usage, Examples @section Very Advanced Usage @cindex mirroring @@ -3319,7 +3404,7 @@ wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog @end itemize @c man end -@node Various +@node Various, Appendices, Examples, Top @chapter Various @cindex various @@ -3329,14 +3414,14 @@ This chapter contains all the stuff that could not fit anywhere else. * Proxies:: Support for proxy servers. * Distribution:: Getting the latest version. * Web Site:: GNU Wget's presence on the World Wide Web. -* Mailing List:: Wget mailing list for announcements and discussion. +* Mailing Lists:: Wget mailing list for announcements and discussion. * Internet Relay Chat:: Wget's presence on IRC. * Reporting Bugs:: How and where to report bugs. * Portability:: The systems Wget works on. * Signals:: Signal-handling performed by Wget. 
@end menu -@node Proxies +@node Proxies, Distribution, Various, Various @section Proxies @cindex proxies @@ -3412,7 +3497,7 @@ Alternatively, you may use the @samp{proxy-user} and settings @code{proxy_user} and @code{proxy_password} to set the proxy username and password. -@node Distribution +@node Distribution, Web Site, Proxies, Various @section Distribution @cindex latest version @@ -3421,7 +3506,7 @@ master GNU archive site ftp.gnu.org, and its mirrors. For example, Wget @value{VERSION} can be found at @url{ftp://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz} -@node Web Site +@node Web Site, Mailing Lists, Distribution, Various @section Web Site @cindex web site @@ -3430,43 +3515,64 @@ The official web site for GNU Wget is at information resides at ``The Wget Wgiki'', @url{http://wget.addictivecode.org/}. -@node Mailing List -@section Mailing List +@node Mailing Lists, Internet Relay Chat, Web Site, Various +@section Mailing Lists @cindex mailing list @cindex list -There are several Wget-related mailing lists. The general discussion -list is at @email{wget@@sunsite.dk}. It is the preferred place for -support requests and suggestions, as well as for discussion of -development. You are invited to subscribe. - -To subscribe, simply send mail to @email{wget-subscribe@@sunsite.dk} -and follow the instructions. Unsubscribe by mailing to -@email{wget-unsubscribe@@sunsite.dk}. The mailing list is archived at +@unnumberedsubsec Primary List + +The primary mailing list for discussion, bug reports, or questions +about GNU Wget is at @email{bug-wget@@gnu.org}. To subscribe, send an +email to @email{bug-wget-join@@gnu.org}, or visit +@url{http://lists.gnu.org/mailman/listinfo/bug-wget}. + +You do not need to subscribe to send a message to the list; however, +please note that messages from non-subscribers are moderated and may take a +while to reach the list---@strong{usually around a day}.
If +you want your message to show up immediately, please subscribe to the +list before posting. Archives for the list may be found at +@url{http://lists.gnu.org/pipermail/bug-wget/}. + +An NNTP (Usenet-style) gateway is also available via +@uref{http://gmane.org/about.php,Gmane}. You can see the Gmane +archives at +@url{http://news.gmane.org/gmane.comp.web.wget.general}. Note that the +Gmane archives conveniently include messages from both the current +list and the previous one. Messages also show up in the Gmane +archives sooner than they do at @url{lists.gnu.org}. + +@unnumberedsubsec Bug Notices List + +Additionally, there is the @email{wget-notify@@addictivecode.org} mailing +list. This is a non-discussion list that receives bug report +notifications from the bug-tracker. To subscribe to this list, +send an email to @email{wget-notify-join@@addictivecode.org}, +or visit @url{http://addictivecode.org/mailman/listinfo/wget-notify}. + +@unnumberedsubsec Obsolete Lists + +Previously, the mailing list @email{wget@@sunsite.dk} was used as the +main discussion list, and another list, +@email{wget-patches@@sunsite.dk}, was used for submitting and +discussing patches to GNU Wget. + +Messages from @email{wget@@sunsite.dk} are archived at +@itemize @tie{} +@item +@url{http://www.mail-archive.com/wget%40sunsite.dk/} and at -@url{http://news.gmane.org/gmane.comp.web.wget.general}. - -Another mailing list is at @email{wget-patches@@sunsite.dk}, and is -used to submit patches for review by Wget developers. A ``patch'' is -a textual representation of change to source code, readable by both -humans and programs. The -@url{http://wget.addictivecode.org/PatchGuidelines} page -covers the creation and submitting of patches in detail. Please don't -send general suggestions or bug reports to @samp{wget-patches}; use it -only for patch submissions.
- -Subscription is the same as above for @email{wget@@sunsite.dk}, except -that you send to @email{wget-patches-subscribe@@sunsite.dk}, instead. -The mailing list is archived at -@url{http://news.gmane.org/gmane.comp.web.wget.patches}. +@item +@url{http://news.gmane.org/gmane.comp.web.wget.general} (which also +continues to archive the current list, @email{bug-wget@@gnu.org}). +@end itemize -Finally, there is the @email{wget-notify@@addictivecode.org} mailing -list. This is a non-discussion list that receives bug report-change -notifications from the bug-tracker. Unlike for the other mailing lists, -subscription is through the @code{mailman} interface at -@url{http://addictivecode.org/mailman/listinfo/wget-notify}. +Messages from @email{wget-patches@@sunsite.dk} are archived at +@itemize @tie{} +@item +@url{http://news.gmane.org/gmane.comp.web.wget.patches}. +@end itemize -@node Internet Relay Chat +@node Internet Relay Chat, Reporting Bugs, Mailing Lists, Various @section Internet Relay Chat @cindex Internet Relay Chat @cindex IRC @@ -3475,7 +3581,7 @@ subscription is through the @code{mailman} interface at In addition to the mailinglists, we also have a support channel set up via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out! -@node Reporting Bugs +@node Reporting Bugs, Portability, Internet Relay Chat, Various @section Reporting Bugs @cindex bugs @cindex reporting bugs @@ -3495,7 +3601,7 @@ Wget crashes, it's a bug. If Wget does not behave as documented, it's a bug. If things work strange, but you are not sure about the way they are supposed to work, it might well be a bug, but you might want to double-check the documentation and the mailing lists (@pxref{Mailing -List}). +Lists}). @item Try to repeat the bug in as simple circumstances as possible. E.g. if @@ -3534,7 +3640,7 @@ safe to try. 
@end enumerate @c man end -@node Portability +@node Portability, Signals, Reporting Bugs, Various @section Portability @cindex portability @cindex operating systems @@ -3567,7 +3673,7 @@ Support for building on MS-DOS via DJGPP has been contributed by Gisle Vanem; a port to VMS is maintained by Steven Schweda, and is available at @url{http://antinode.org/}. -@node Signals +@node Signals, , Portability, Various @section Signals @cindex signal handling @cindex hangup @@ -3588,7 +3694,7 @@ SIGHUP received, redirecting output to `wget-log'. Other than that, Wget will not try to interfere with signals in any way. @kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike. -@node Appendices +@node Appendices, Copying this manual, Various, Top @chapter Appendices This chapter contains some references I consider useful. @@ -3599,7 +3705,7 @@ This chapter contains some references I consider useful. * Contributors:: People who helped. @end menu -@node Robot Exclusion +@node Robot Exclusion, Security Considerations, Appendices, Appendices @section Robot Exclusion @cindex robot exclusion @cindex robots.txt @@ -3638,7 +3744,7 @@ avoid. To be found by the robots, the specifications must be placed in download and parse. Although Wget is not a web robot in the strictest sense of the word, it -can downloads large parts of the site without the user's intervention to +can download large parts of the site without the user's intervention to download an individual page. Because of that, Wget honors RES when downloading recursively. For instance, when you issue: @@ -3682,7 +3788,7 @@ robot exclusion, set the @code{robots} variable to @samp{off} in your @file{.wgetrc}. You can achieve the same effect from the command line using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}. 
-@node Security Considerations +@node Security Considerations, Contributors, Robot Exclusion, Appendices @section Security Considerations @cindex security @@ -3713,7 +3819,7 @@ being careful when you send debug logs (yes, even when you send them to me). @end enumerate -@node Contributors +@node Contributors, , Security Considerations, Appendices @section Contributors @cindex contributors @@ -3829,6 +3935,9 @@ Gnulib getpasswd-gnu module. @item Ted Mielczarek---donated support for CSS. +@item +Saint Xavier---Support for IRIs (RFC 3987). + @item People who provided donations for development---including Brian Gough. @end itemize @@ -3940,6 +4049,7 @@ Fila Kolodny, Alexander Kourakos, Martin Kraemer, Sami Krank, +Jay Krell, @tex $\Sigma\acute{\iota}\mu o\varsigma\; \Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$ @@ -3970,6 +4080,7 @@ Aurelien Marchand, Matthew J.@: Mellon, Jordan Mendelson, Ted Mielczarek, +Robert Millan, Lin Zhe Min, Jan Minar, Tim Mooney, @@ -4045,6 +4156,8 @@ Charles G Waldman, Douglas E.@: Wegscheid, Ralf Wildenhues, Joshua David Williams, +Benjamin Wolsey, +Saint Xavier, YAMAZAKI Makoto, Jasmin Zainul, @iftex @@ -4053,22 +4166,27 @@ Bojan @v{Z}drnja, @ifnottex Bojan Zdrnja, @end ifnottex -Kristijan Zimmer. +Kristijan Zimmer, +Xin Zou. Apologies to all who I accidentally left out, and many thanks to all the subscribers of the Wget mailing list. -@node Copying this manual +@node Copying this manual, Concept Index, Appendices, Top @appendix Copying this manual @menu * GNU Free Documentation License:: Licnse for copying this manual. @end menu +@node GNU Free Documentation License, , Copying this manual, Copying this manual +@appendixsec GNU Free Documentation License +@cindex FDL, GNU Free Documentation License + @include fdl.texi -@node Concept Index +@node Concept Index, , Copying this manual, Top @unnumbered Concept Index @printindex cp