X-Git-Url: http://sjero.net/git/?a=blobdiff_plain;f=doc%2Fwget.texi;h=400debed96b814ed7da202c3570ab07402e61152;hb=798f554773baf1adca376500ca120a992e6d7492;hp=7af58050c76f2b66ce04caefb9b7606dc3690483;hpb=cb30bc9a94d6aee4cfe8b2caa958a6b0ae4ae6d2;p=wget

diff --git a/doc/wget.texi b/doc/wget.texi
index 7af58050..400debed 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -20,9 +20,9 @@
 @set Wget Wget
 @c man title Wget The non-interactive network downloader.
 
-@dircategory Network Applications
+@dircategory Network applications
 @direntry
-* Wget: (wget). The non-interactive network downloader.
+* Wget: (wget). Non-interactive network downloader.
 @end direntry
 
 @copying
@@ -479,6 +479,9 @@ Turn off verbose without being completely quiet (use @samp{-q} for
 that), which means that error messages and basic information still get
 printed.
 
+@item --report-speed=@var{type}
+Output bandwidth as @var{type}. The only accepted value is @samp{bits}.
+
 @cindex input-file
 @item -i @var{file}
 @itemx --input-file=@var{file}
@@ -1445,7 +1448,7 @@ data, whereas @samp{--post-file} sends the contents of @var{file}. Other
 than that, they work in exactly the same way. In particular,
 they @emph{both} expect content of the form @code{key1=value1&key2=value2},
 with percent-encoding for special characters; the only difference is
-that one expects its content as a command-line paramter and the other
+that one expects its content as a command-line parameter and the other
 accepts its content from a file. In particular, @samp{--post-file} is
 @emph{not} for transmitting files as form attachments: those must
 appear as @code{key=value} data (with appropriate percent-coding) just
@@ -1506,6 +1509,12 @@ This option is useful for some file-downloading CGI programs that use
 @code{Content-Disposition} headers to describe what the name of a
 downloaded file should be.
 
+@cindex Content On Error
+@item --content-on-error
+
+If this is set to on, Wget will not skip the content when the server responds
+with an HTTP status code that indicates an error.
+
 @cindex Trust server names
 @item --trust-server-names
 
@@ -1652,6 +1661,36 @@ not used), EGD is never contacted. EGD is not needed on modern Unix
 systems that support @file{/dev/random}.
 @end table
 
+@cindex WARC
+@table @samp
+@item --warc-file=@var{file}
+Use @var{file} as the destination WARC file.
+
+@item --warc-header=@var{string}
+Use @var{string} as the warcinfo record.
+
+@item --warc-max-size=@var{size}
+Set the maximum size of the WARC files to @var{size}.
+
+@item --warc-cdx
+Write CDX index files.
+
+@item --warc-dedup=@var{file}
+Do not store records listed in this CDX file.
+
+@item --no-warc-compression
+Do not compress WARC files with GZIP.
+
+@item --no-warc-digests
+Do not calculate SHA1 digests.
+
+@item --no-warc-keep-log
+Do not store the log file in a WARC record.
+
+@item --warc-tempdir=@var{dir}
+Specify the location for temporary files created by the WARC writer.
+@end table
+
 @node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking
 @section FTP Options
 
@@ -1764,12 +1803,12 @@ case.
 @item -r
 @itemx --recursive
 Turn on recursive retrieving. @xref{Recursive Download}, for more
-details.
+details. The default maximum depth is 5.
 
 @item -l @var{depth}
 @itemx --level=@var{depth}
 Specify recursion maximum depth level @var{depth} (@pxref{Recursive
-Download}). The default maximum depth is 5.
+Download}).
 
 @cindex proxy filling
 @cindex delete after retrieval
@@ -2278,6 +2317,8 @@ in @file{.wgetrc}.
 @item -A @var{acclist}
 @itemx --accept @var{acclist}
 @itemx accept = @var{acclist}
+@itemx --accept-regex @var{urlregex}
+@itemx accept-regex = @var{urlregex}
 The argument to @samp{--accept} option is a list of file suffixes or
 patterns that Wget will download during recursive retrieval. A suffix
 is the ending part of a file, and consists of ``normal'' letters,
@@ -2294,6 +2335,9 @@ a description of how pattern matching works.
 Of course, any number of suffixes and patterns can be combined into a
 comma-separated list, and given as an argument to @samp{-A}.
 
+The argument to @samp{--accept-regex} option is a regular expression which
+is matched against the complete URL.
+
 @cindex reject wildcards
 @cindex reject suffixes
 @cindex wildcards, reject
@@ -2301,6 +2345,8 @@ comma-separated list, and given as an argument to @samp{-A}.
 @item -R @var{rejlist}
 @itemx --reject @var{rejlist}
 @itemx reject = @var{rejlist}
+@itemx --reject-regex @var{urlregex}
+@itemx reject-regex = @var{urlregex}
 The @samp{--reject} option works the same way as @samp{--accept}, only
 its logic is the reverse; Wget will download all files @emph{except} the
 ones matching the suffixes (or patterns) in the list.
@@ -2312,6 +2358,9 @@ Analogously, to download all files except the ones beginning with
 expansion by the shell.
 @end table
 
+The argument to @samp{--reject-regex} option is a regular expression which
+is matched against the complete URL.
+
 @noindent
 The @samp{-A} and @samp{-R} options may be combined to achieve even
 better fine-tuning of which files to retrieve. E.g. @samp{wget -A
@@ -3171,6 +3220,10 @@ as @samp{--secure-protocol=@var{string}}.
 Choose whether or not to print the @sc{http} and @sc{ftp} server
 responses---the same as @samp{-S}.
 
+@item show_all_dns_entries = on/off
+When a DNS name is resolved, show all the IP addresses, not just the first
+three.
+
 @item span_hosts = on/off
 Same as @samp{-H}.
 
@@ -3522,28 +3575,30 @@ internal networks from the rest of Internet. In order to obtain
 information from the Web, their users connect and retrieve remote data
 using an authorized proxy.
 
+@c man begin ENVIRONMENT
 Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. The
 standard way to specify proxy location, which Wget recognizes, is using
 the following environment variables:
 
-@table @code
+@table @env
 @item http_proxy
 @itemx https_proxy
-If set, the @code{http_proxy} and @code{https_proxy} variables should
+If set, the @env{http_proxy} and @env{https_proxy} variables should
 contain the @sc{url}s of the proxies for @sc{http} and @sc{https}
 connections respectively.
 
 @item ftp_proxy
 This variable should contain the @sc{url} of the proxy for @sc{ftp}
-connections. It is quite common that @code{http_proxy} and
-@code{ftp_proxy} are set to the same @sc{url}.
+connections. It is quite common that @env{http_proxy} and
+@env{ftp_proxy} are set to the same @sc{url}.
 
 @item no_proxy
 This variable should contain a comma-separated list of domain extensions
 proxy should @emph{not} be used for. For instance, if the value of
-@code{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
+@env{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
 documents from MIT.
 @end table
+@c man end
 
 In addition to the environment variables, proxy location and settings
 may be specified from within Wget itself.