X-Git-Url: http://sjero.net/git/?p=wget;a=blobdiff_plain;f=doc%2Fwget.texi;h=73341ec6c00842bdd9ec39477083d94acbf5a4e2;hp=627e0059461a9dd9fb10e3087cda83c2a8a0b6b9;hb=4661f141bb6e694592b1f26c49f11f7036093162;hpb=3e25a9817f47fbb8660cc6a3b2f3eea239526c6c

diff --git a/doc/wget.texi b/doc/wget.texi
index 627e0059..73341ec6 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -20,9 +20,9 @@
 @set Wget Wget
 @c man title Wget The non-interactive network downloader.
 
-@dircategory Network Applications
+@dircategory Network applications
 @direntry
-* Wget: (wget). The non-interactive network downloader.
+* Wget: (wget). Non-interactive network downloader.
 @end direntry
 
 @copying
@@ -31,7 +31,8 @@ data.
 
 @c man begin COPYRIGHT
 Copyright @copyright{} 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
-2004, 2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation,
+Inc.
 
 @iftex
 Permission is granted to make and distribute verbatim copies of
@@ -63,7 +64,6 @@ Documentation License''.
 @ignore
 @c man begin AUTHOR
 Originally written by Hrvoje Niksic .
-Currently maintained by Micah Cowan .
 @c man end
 @c man begin SEEALSO
 This is @strong{not} the complete manual for GNU Wget.
@@ -190,7 +190,9 @@ gauge can be customized to your preferences.
 Most of the features are fully configurable, either through command line
 options, or via the initialization file @file{.wgetrc} (@pxref{Startup
 File}). Wget allows you to define @dfn{global} startup files
-(@file{/usr/local/etc/wgetrc} by default) for site settings.
+(@file{/usr/local/etc/wgetrc} by default) for site settings. You can also
+specify the location of a startup file with the --config option.
+
 
 @ignore
 @c man begin FILES
@@ -477,6 +479,10 @@ Turn off verbose without being completely quiet (use @samp{-q} for
 that), which means that error messages and basic information still get
 printed.
 
+@item -nv
+@itemx --report-speed=@var{type}
+Output bandwidth as @var{type}.
The only accepted value is @samp{bits}.
+
 @cindex input-file
 @item -i @var{file}
 @itemx --input-file=@var{file}
@@ -524,6 +530,10 @@ presence of a @code{BASE} tag in the @sc{html} input file, with
 For instance, if you specify @samp{http://foo/bar/a.html} for
 @var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it
 would be resolved to @samp{http://foo/baz/b.html}.
+
+@cindex specify config
+@item --config=@var{FILE}
+Specify the location of a startup file you wish to use.
 @end table
 
 @node Download Options, Directory Options, Logging and Input File Options, Invoking
@@ -841,9 +851,7 @@ If you don't want Wget to wait between @emph{every} retrieval, but only
 between retries of failed downloads, you can use this option. Wget will
 use @dfn{linear backoff}, waiting 1 second after the first failure on a
 given file, then waiting 2 seconds after the second failure on that
-file, up to the maximum number of @var{seconds} you specify. Therefore,
-a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
-seconds per file.
+file, up to the maximum number of @var{seconds} you specify.
 
 By default, Wget will assume a value of 10 seconds.
 
@@ -1075,6 +1083,13 @@ header and in HTML @code{Content-Type http-equiv} meta tag.
 You can set the default encoding using the @code{remoteencoding}
 command in @file{.wgetrc}. That setting may be overridden from the
 command line.
+
+@cindex unlink
+@item --unlink
+
+Force Wget to unlink file instead of clobbering existing file. This
+option is useful for downloading to the directory with hardlinks.
+
 @end table
 
 @node Directory Options, HTTP Options, Download Options, Invoking
@@ -1177,10 +1192,7 @@ Note that filenames changed in this way will be re-downloaded every time
 you re-mirror a site, because Wget can't tell that the local
 @file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since
 it doesn't yet know that the URL produces output of type
-@samp{text/html} or @samp{application/xhtml+xml}.
To prevent this
-re-downloading, you must use @samp{-k} and @samp{-K} so that the original
-version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive
-Retrieval Options}).
+@samp{text/html} or @samp{application/xhtml+xml}.
 
 As of version 1.12, Wget will also ensure that any downloaded files of
 type @samp{text/css} end in the suffix @samp{.css}, and the option was
@@ -1437,7 +1449,7 @@ data, whereas @samp{--post-file} sends the contents of @var{file}.
 Other than that, they work in exactly the same way. In particular,
 they @emph{both} expect content of the form @code{key1=value1&key2=value2},
 with percent-encoding for special characters; the only difference is
-that one expects its content as a command-line paramter and the other
+that one expects its content as a command-line parameter and the other
 accepts its content from a file. In particular, @samp{--post-file} is
 @emph{not} for transmitting files as form attachments: those must
 appear as @code{key=value} data (with appropriate percent-coding) just
@@ -1498,6 +1510,12 @@ This option is useful for some file-downloading CGI programs that use
 @code{Content-Disposition} headers to describe what the name of a
 downloaded file should be.
 
+@cindex Content On Error
+@item --content-on-error
+
+If this is set to on, wget will not skip the content when the server responds
+with a http status code that indicates error.
+
 @cindex Trust server names
 @item --trust-server-names
 
@@ -1644,6 +1662,36 @@ not used), EGD is never contacted. EGD is not needed on modern Unix
 systems that support @file{/dev/random}.
 @end table
 
+@cindex WARC
+@table @samp
+@item --warc-file=@var{file}
+Use @var{file} as the destination WARC file.
+
+@item --warc-header=@var{string}
+Use @var{string} into as the warcinfo record.
+
+@item --warc-max-size=@var{size}
+Set the maximum size of the WARC files to @var{size}.
+
+@item --warc-cdx
+Write CDX index files.
+
+@item --warc-dedup=@var{file}
+Do not store records listed in this CDX file.
+
+@item --no-warc-compression
+Do not compress WARC files with GZIP.
+
+@item --no-warc-digests
+Do not calculate SHA1 digests.
+
+@item --no-warc-keep-log
+Do not store the log file in a WARC record.
+
+@item --warc-tempdir=@var{dir}
+Specify the location for temporary files created by the WARC writer.
+@end table
+
 @node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking
 @section FTP Options
 
@@ -1756,12 +1804,12 @@ case.
 @item -r
 @itemx --recursive
 Turn on recursive retrieving. @xref{Recursive Download}, for more
-details.
+details. The default maximum depth is 5.
 
 @item -l @var{depth}
 @itemx --level=@var{depth}
 Specify recursion maximum depth level @var{depth} (@pxref{Recursive
-Download}). The default maximum depth is 5.
+Download}).
 
 @cindex proxy filling
 @cindex delete after retrieval
@@ -1972,7 +2020,7 @@ Set domains to be followed. @var{domain-list} is a comma-separated list
 of domains. Note that it does @emph{not} turn on @samp{-H}.
 
 @item --exclude-domains @var{domain-list}
-Specify the domains that are @emph{not} to be followed.
+Specify the domains that are @emph{not} to be followed
 (@pxref{Spanning Hosts}).
 
 @cindex follow FTP links
@@ -2270,6 +2318,8 @@ in @file{.wgetrc}.
 @item -A @var{acclist}
 @itemx --accept @var{acclist}
 @itemx accept = @var{acclist}
+@itemx --accept-regex @var{urlregex}
+@itemx accept-regex = @var{urlregex}
 The argument to @samp{--accept} option is a list of file suffixes or
 patterns that Wget will download during recursive retrieval. A suffix
 is the ending part of a file, and consists of ``normal'' letters,
@@ -2286,6 +2336,9 @@ a description of how pattern matching works.
 Of course, any number of suffixes and patterns can be combined into a
 comma-separated list, and given as an argument to @samp{-A}.
 
+The argument to @samp{--accept-regex} option is a regular expression which
+is matched against the complete URL.
+
+@cindex reject wildcards
 @cindex reject suffixes
 @cindex wildcards, reject
@@ -2293,6 +2346,8 @@ comma-separated list, and given as an argument to @samp{-A}.
 @item -R @var{rejlist}
 @itemx --reject @var{rejlist}
 @itemx reject = @var{rejlist}
+@itemx --reject-regex @var{urlregex}
+@itemx reject-regex = @var{urlregex}
 The @samp{--reject} option works the same way as @samp{--accept}, only
 its logic is the reverse; Wget will download all files @emph{except} the
 ones matching the suffixes (or patterns) in the list.
@@ -2304,6 +2359,9 @@ Analogously, to download all files except the ones beginning with
 expansion by the shell.
 @end table
 
+The argument to @samp{--accept-regex} option is a regular expression which
+is matched against the complete URL.
+
 @noindent
 The @samp{-A} and @samp{-R} options may be combined to achieve even
 better fine-tuning of which files to retrieve. E.g. @samp{wget -A
@@ -3163,6 +3221,10 @@ as @samp{--secure-protocol=@var{string}}.
 Choose whether or not to print the @sc{http} and @sc{ftp} server
 responses---the same as @samp{-S}.
 
+@item show_all_dns_entries = on/off
+When a DNS name is resolved, show all the IP addresses, not just the first
+three.
+
 @item span_hosts = on/off
 Same as @samp{-H}.
 
@@ -3905,9 +3967,8 @@ me).
 GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@xemacs.org},
 @end iftex
 @ifnottex
-GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org},
+GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org}.
 @end ifnottex
-and it is currently maintained by Micah Cowan @email{micah@@cowan.name}.
 
 However, the development of Wget could never have gone as far as it has, were
 it not for the help of many people, either with bug reports, feature proposals,