X-Git-Url: http://sjero.net/git/?p=wget;a=blobdiff_plain;f=doc%2Fwget.texi;h=cced7edda118e268b6f290ae272380179044dbce;hp=af211e354d96a85fea93d31bc0c51fec1925af96;hb=42c78fdd71c311cf96210b709ec0a18ef45ef87f;hpb=1489612dd1e604b8ccf4b4d7326ebcc93ca56923

diff --git a/doc/wget.texi b/doc/wget.texi
index af211e35..cced7edd 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -20,9 +20,9 @@
 @set Wget Wget
 @c man title Wget The non-interactive network downloader.
-@dircategory Network Applications
+@dircategory Network applications
 @direntry
-* Wget: (wget). The non-interactive network downloader.
+* Wget: (wget). Non-interactive network downloader.
 @end direntry
 @copying
@@ -31,7 +31,8 @@ data.
 @c man begin COPYRIGHT
 Copyright @copyright{} 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
-2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation,
+Inc.
 @iftex
 Permission is granted to make and distribute verbatim copies of
@@ -63,7 +64,6 @@ Documentation License''.
 @ignore
 @c man begin AUTHOR
 Originally written by Hrvoje Niksic .
-Currently maintained by Micah Cowan .
 @c man end
 @c man begin SEEALSO
 This is @strong{not} the complete manual for GNU Wget.
@@ -190,7 +190,9 @@ gauge can be customized to your preferences. Most of the features are fully configurable, either through command line options, or via the initialization file @file{.wgetrc} (@pxref{Startup File}). Wget allows you to define @dfn{global} startup files
-(@file{/usr/local/etc/wgetrc} by default) for site settings.
+(@file{/usr/local/etc/wgetrc} by default) for site settings. You can also
+specify the location of a startup file with the --config option.
+
 @ignore
 @c man begin FILES
@@ -477,6 +479,9 @@ Turn off verbose without being completely quiet (use @samp{-q} for that), which means that error messages and basic information still get printed.
+@item --report-speed=@var{type}
+Output bandwidth as @var{type}. The only accepted value is @samp{bits}.
+
 @cindex input-file
 @item -i @var{file}
 @itemx --input-file=@var{file}
@@ -524,6 +529,10 @@ presence of a @code{BASE} tag in the @sc{html} input file, with For instance, if you specify @samp{http://foo/bar/a.html} for @var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it would be resolved to @samp{http://foo/baz/b.html}.
+
+@cindex specify config
+@item --config=@var{FILE}
+Specify the location of a startup file you wish to use.
 @end table
 @node Download Options, Directory Options, Logging and Input File Options, Invoking
@@ -541,10 +550,10 @@ IPs.
 @cindex retries
 @cindex tries
-@cindex number of retries
+@cindex number of tries
 @item -t @var{number}
 @itemx --tries=@var{number}
-Set number of retries to @var{number}. Specify 0 or @samp{inf} for
+Set number of tries to @var{number}. Specify 0 or @samp{inf} for
 infinite retrying. The default is to retry 20 times, with the exception of fatal errors like ``connection refused'' or ``not found'' (404), which are not retried.
@@ -579,7 +588,8 @@ some cases where this behavior can actually have some use. Note that a combination with @samp{-k} is only permitted when downloading a single document, as in that case it will just convert all relative URIs to external ones; @samp{-k} makes no sense for
-multiple URIs when they're all being downloaded to a single file.
+multiple URIs when they're all being downloaded to a single file;
+@samp{-k} can be used only when the output is a regular file.
 @cindex clobbering, file
 @cindex downloading multiple times
@@ -620,6 +630,13 @@ Note that when @samp{-nc} is specified, files with the suffixes @samp{.html} or @samp{.htm} will be loaded from the local disk and parsed as if they had been retrieved from the Web.
+@cindex backing up files
+@item --backups=@var{backups}
+Before (over)writing a file, back up an existing file by adding a
+@samp{.1} suffix (@samp{_1} on VMS) to the file name. Such backup
+files are rotated to @samp{.2}, @samp{.3}, and so on, up to
+@var{backups} (and lost beyond that).
+
 @cindex continue retrieval
 @cindex incomplete downloads
 @cindex resume download
@@ -705,9 +722,12 @@ different meaning to one dot. With the @code{default} style each dot represents 1K, there are ten dots in a cluster and 50 dots in a line. The @code{binary} style has a more ``computer''-like orientation---8K dots, 16-dots clusters and 48 dots per line (which makes for 384K
-lines). The @code{mega} style is suitable for downloading very large
+lines). The @code{mega} style is suitable for downloading large
 files---each dot represents 64K retrieved, there are eight dots in a cluster, and 48 dots on each line (so each line contains 3M).
+If @code{mega} is not enough then you can use the @code{giga}
+style---each dot represents 1M retrieved, there are eight dots in a
+cluster, and 32 dots on each line (so each line contains 32M).
 Note that you can set the default style using the @code{progress} command in @file{.wgetrc}. That setting may be overridden from the
@@ -719,6 +739,16 @@ use @samp{--progress=bar:force}.
 @itemx --timestamping
 Turn on time-stamping. @xref{Time-Stamping}, for details.
+@item --no-use-server-timestamps
+Don't set the local file's timestamp by the one on the server.
+
+By default, when a file is downloaded, its timestamps are set to
+match those from the remote file. This allows the use of
+@samp{--timestamping} on subsequent invocations of wget. However, it
+is sometimes useful to base the local file's timestamp on when it was
+actually downloaded; for that purpose, the
+@samp{--no-use-server-timestamps} option has been provided.
+
 @cindex server response, print
 @item -S
 @itemx --server-response
@@ -830,9 +860,7 @@ If you don't want Wget to wait between @emph{every} retrieval, but only between retries of failed downloads, you can use this option. Wget will use @dfn{linear backoff}, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that
-file, up to the maximum number of @var{seconds} you specify. Therefore,
-a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
-seconds per file.
+file, up to the maximum number of @var{seconds} you specify.
 By default, Wget will assume a value of 10 seconds.
@@ -857,7 +885,7 @@ recommendation to block many unrelated users from a web site due to the actions of one.
 @cindex proxy
-@itemx --no-proxy
+@item --no-proxy
 Don't use proxies, even if the appropriate @code{*_proxy} environment variable is defined.
@@ -958,7 +986,7 @@ are outside the range of @sc{ascii} characters (that is, greater than whose encoding does not match the one used locally.
 @cindex IPv6
-@itemx -4
+@item -4
 @itemx --inet4-only
 @itemx -6
 @itemx --inet6-only
@@ -1064,6 +1092,13 @@ header and in HTML @code{Content-Type http-equiv} meta tag. You can set the default encoding using the @code{remoteencoding} command in @file{.wgetrc}. That setting may be overridden from the command line.
+
+@cindex unlink
+@item --unlink
+
+Force Wget to unlink file instead of clobbering existing file. This
+option is useful for downloading to the directory with hardlinks.
+
 @end table
 @node Directory Options, HTTP Options, Download Options, Invoking
@@ -1166,10 +1201,7 @@ Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local @file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since it doesn't yet know that the URL produces output of type
-@samp{text/html} or @samp{application/xhtml+xml}. To prevent this
-re-downloading, you must use @samp{-k} and @samp{-K} so that the original
-version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive
-Retrieval Options}).
+@samp{text/html} or @samp{application/xhtml+xml}.
 As of version 1.12, Wget will also ensure that any downloaded files of type @samp{text/css} end in the suffix @samp{.css}, and the option was
@@ -1426,7 +1458,7 @@ data, whereas @samp{--post-file} sends the contents of @var{file}. Other than that, they work in exactly the same way. In particular, they @emph{both} expect content of the form @code{key1=value1&key2=value2}, with percent-encoding for special characters; the only difference is
-that one expects its content as a command-line paramter and the other
+that one expects its content as a command-line parameter and the other
 accepts its content from a file. In particular, @samp{--post-file} is @emph{not} for transmitting files as form attachments: those must appear as @code{key=value} data (with appropriate percent-coding) just
@@ -1435,6 +1467,11 @@ like everything else. Wget does not currently support @code{application/x-www-form-urlencoded}. Only one of @samp{--post-data} and @samp{--post-file} should be specified.
+Please note that wget does not require the content to be of the form
+@code{key1=value1&key2=value2}, and neither does it test for it. Wget will
+simply transmit whatever data is provided to it. Most servers however expect
+the POST data to be in the above format when processing HTML Forms.
+
 Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to @code{--post-file} must be a regular file; specifying a FIFO or something like @file{/dev/stdin} won't work.
@@ -1445,14 +1482,15 @@ use chunked unless it knows it's talking to an HTTP/1.1 server. And it can't know that until it receives a response, which in turn requires the request to have been completed -- a chicken-and-egg problem.
-Note: if Wget is redirected after the POST request is completed, it
-will not send the POST data to the redirected URL. This is because
-URLs that process POST often respond with a redirection to a regular
-page, which does not desire or accept POST. It is not completely
-clear that this behavior is optimal; if it doesn't work out, it might
-be changed in the future.
+Note: As of version 1.15 if Wget is redirected after the POST request is
+completed, its behaviour will depend on the response code returned by the
+server. In case of a 301 Moved Permanently, 302 Moved Temporarily or
+307 Temporary Redirect, Wget will, in accordance with RFC2616, continue
+to send a POST request.
+In case a server wants the client to change the Request method upon
+redirection, it should send a 303 See Other response code.
-This example shows how to log to a server using POST and then proceed to
+This example shows how to log in to a server using POST and then proceed to
 download the desired pages, presumably only accessible to authorized users:
@@ -1475,6 +1513,37 @@ them (and neither will browsers) and the @file{cookies.txt} file will be empty. In that case use @samp{--keep-session-cookies} along with @samp{--save-cookies} to force saving of session cookies.
+@cindex Other HTTP Methods
+@item --method=@var{HTTP-Method}
+For the purpose of RESTful scripting, Wget allows sending of other HTTP Methods
+without the need to explicitly set them using @samp{--header=Header-Line}.
+Wget will use whatever string is passed to it after @samp{--method} as the HTTP
+Method to the server.
+
+@item --body-data=@var{Data-String}
+@itemx --body-file=@var{Data-File}
+Must be set when additional data needs to be sent to the server along with the
+Method specified using @samp{--method}. @samp{--body-data} sends @var{string} as
+data, whereas @samp{--body-file} sends the contents of @var{file}. Other than that,
+they work in exactly the same way.
+
+Currently, @samp{--body-file} is @emph{not} for transmitting files as a whole.
+Wget does not currently support @code{multipart/form-data} for transmitting data;
+only @code{application/x-www-form-urlencoded}. In the future, this may be changed
+so that wget sends the @samp{--body-file} as a complete file instead of sending its
+contents to the server. Please be aware that Wget needs to know the contents of
+BODY Data in advance, and hence the argument to @samp{--body-file} should be a
+regular file. See @samp{--post-file} for a more detailed explanation.
+Only one of @samp{--body-data} and @samp{--body-file} should be specified.
+
+If Wget is redirected after the request is completed, Wget will
+suspend the current method and send a GET request till the redirection
+is completed. This is true for all redirection response codes except
+307 Temporary Redirect which is used to explicitly specify that the
+request method should @emph{not} change. Another exception is when
+the method is set to @code{POST}, in which case the redirection rules
+specified under @samp{--post-data} are followed.
+
 @cindex Content-Disposition
 @item --content-disposition
@@ -1487,6 +1556,19 @@ This option is useful for some file-downloading CGI programs that use @code{Content-Disposition} headers to describe what the name of a downloaded file should be.
+@cindex Content On Error
+@item --content-on-error
+
+If this is set to on, wget will not skip the content when the server responds
+with a http status code that indicates error.
+
+@cindex Trust server names
+@item --trust-server-names
+
+If this is set to on, on a redirect the last component of the
+redirection URL will be used as the local file name. By default it is
+used the last component in the original URL.
+
 @cindex authentication
 @item --auth-no-challenge
@@ -1524,6 +1606,9 @@ buggy SSL server implementations that make it hard for OpenSSL to choose the correct protocol version. Fortunately, such servers are quite rare.
+@item --https-only
+When in recursive mode, only HTTPS links are followed.
+
 @cindex SSL certificate, check
 @item --no-check-certificate
 Don't check the server certificate against the available certificate
@@ -1626,6 +1711,36 @@ not used), EGD is never contacted. EGD is not needed on modern Unix systems that support @file{/dev/random}.
 @end table
+@cindex WARC
+@table @samp
+@item --warc-file=@var{file}
+Use @var{file} as the destination WARC file.
+
+@item --warc-header=@var{string}
+Use @var{string} into as the warcinfo record.
+
+@item --warc-max-size=@var{size}
+Set the maximum size of the WARC files to @var{size}.
+
+@item --warc-cdx
+Write CDX index files.
+
+@item --warc-dedup=@var{file}
+Do not store records listed in this CDX file.
+
+@item --no-warc-compression
+Do not compress WARC files with GZIP.
+
+@item --no-warc-digests
+Do not calculate SHA1 digests.
+
+@item --no-warc-keep-log
+Do not store the log file in a WARC record.
+
+@item --warc-tempdir=@var{dir}
+Specify the location for temporary files created by the WARC writer.
+@end table
+
 @node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking
 @section FTP Options
@@ -1711,6 +1826,10 @@ in some rare firewall configurations, active FTP actually works when passive FTP doesn't. If you suspect this to be the case, use this option, or set @code{passive_ftp=off} in your init file.
+@cindex file permissions
+@item --preserve-permissions
+Preserve remote file permissions instead of permissions set by umask.
+
 @cindex symbolic links, retrieving
 @item --retr-symlinks
 Usually, when retrieving @sc{ftp} directories recursively and a symbolic
@@ -1738,12 +1857,12 @@ case.
 @item -r
 @itemx --recursive
 Turn on recursive retrieving. @xref{Recursive Download}, for more
-details.
+details. The default maximum depth is 5.
 @item -l @var{depth}
 @itemx --level=@var{depth}
 Specify recursion maximum depth level @var{depth} (@pxref{Recursive
-Download}). The default maximum depth is 5.
+Download}).
 @cindex proxy filling
 @cindex delete after retrieval
@@ -1948,13 +2067,22 @@ any of the wildcard characters, @samp{*}, @samp{?}, @samp{[} or @samp{]}, appear in an element of @var{acclist} or @var{rejlist}, it will be treated as a pattern, rather than a suffix.
+@item --accept-regex @var{urlregex}
+@itemx --reject-regex @var{urlregex}
+Specify a regular expression to accept or reject the complete URL.
+
+@item --regex-type @var{regextype}
+Specify the regular expression type. Possible types are @samp{posix} or
+@samp{pcre}. Note that to be able to use @samp{pcre} type, wget has to be
+compiled with libpcre support.
+
 @item -D @var{domain-list}
 @itemx --domains=@var{domain-list}
 Set domains to be followed. @var{domain-list} is a comma-separated list of domains. Note that it does @emph{not} turn on @samp{-H}.
 @item --exclude-domains @var{domain-list}
-Specify the domains that are @emph{not} to be followed.
+Specify the domains that are @emph{not} to be followed (@pxref{Spanning Hosts}).
 @cindex follow FTP links
@@ -2252,6 +2380,8 @@ in @file{.wgetrc}.
 @item -A @var{acclist}
 @itemx --accept @var{acclist}
 @itemx accept = @var{acclist}
+@itemx --accept-regex @var{urlregex}
+@itemx accept-regex = @var{urlregex}
 The argument to @samp{--accept} option is a list of file suffixes or patterns that Wget will download during recursive retrieval. A suffix is the ending part of a file, and consists of ``normal'' letters,
@@ -2268,6 +2398,9 @@ a description of how pattern matching works. Of course, any number of suffixes and patterns can be combined into a comma-separated list, and given as an argument to @samp{-A}.
+The argument to @samp{--accept-regex} option is a regular expression which
+is matched against the complete URL.
+
 @cindex reject wildcards
 @cindex reject suffixes
 @cindex wildcards, reject
@@ -2275,6 +2408,8 @@ comma-separated list, and given as an argument to @samp{-A}.
 @item -R @var{rejlist}
 @itemx --reject @var{rejlist}
 @itemx reject = @var{rejlist}
+@itemx --reject-regex @var{urlregex}
+@itemx reject-regex = @var{urlregex}
 The @samp{--reject} option works the same way as @samp{--accept}, only its logic is the reverse; Wget will download all files @emph{except} the ones matching the suffixes (or patterns) in the list.
@@ -2286,6 +2421,9 @@ Analogously, to download all files except the ones beginning with expansion by the shell.
 @end table
+The argument to @samp{--accept-regex} option is a regular expression which
+is matched against the complete URL.
+
 @noindent
 The @samp{-A} and @samp{-R} options may be combined to achieve even better fine-tuning of which files to retrieve. E.g. @samp{wget -A
@@ -2754,9 +2892,11 @@ enables it). Enable/disable saving pre-converted files with the suffix @samp{.orig}---the same as @samp{-K} (which enables it).
-@c @item backups = @var{number}
-@c #### Document me!
-@c
+@item backups = @var{number}
+Use up to @var{number} backups for a file. Backups are rotated by
+adding an incremental counter that starts at @samp{1}. The default is
+@samp{0}.
+
 @item base = @var{string}
 Consider relative @sc{url}s in input files (specified via the @samp{input} command or the @samp{--input-file}/@samp{-i} option,
@@ -2799,6 +2939,10 @@ Set the connect timeout---the same as @samp{--connect-timeout}. Turn on recognition of the (non-standard) @samp{Content-Disposition} HTTP header---if set to @samp{on}, the same as @samp{--content-disposition}.
+@item trust_server_names = on/off
+If set to on, use the last component of a redirection URL for the local
+file name.
+
 @item continue = on/off
 If set to on, force continuation of preexistent partially retrieved files. See @samp{-c} before setting it.
@@ -3014,7 +3158,7 @@ display properly---the same as @samp{-p}. Change setting of passive @sc{ftp}, equivalent to the @samp{--passive-ftp} option.
-@itemx password = @var{string}
+@item password = @var{string}
 Specify password @var{string} for both @sc{ftp} and @sc{http} file retrieval. This command can be overridden using the @samp{ftp_password} and @samp{http_password} command for @sc{ftp} and @sc{http} respectively.
@@ -3141,6 +3285,10 @@ as @samp{--secure-protocol=@var{string}}. Choose whether or not to print the @sc{http} and @sc{ftp} server responses---the same as @samp{-S}.
+@item show_all_dns_entries = on/off
+When a DNS name is resolved, show all the IP addresses, not just the first
+three.
+
 @item span_hosts = on/off
 Same as @samp{-H}.
@@ -3157,6 +3305,10 @@ Set all applicable timeout values to @var{n}, the same as @samp{-T
 @item timestamping = on/off
 Turn timestamping on/off. The same as @samp{-N} (@pxref{Time-Stamping}).
+@item use_server_timestamps = on/off
+If set to @samp{off}, Wget won't set the local file's timestamp by the
+one on the server (same as @samp{--no-use-server-timestamps}).
+
 @item tries = @var{n}
 Set number of retries per @sc{url}---the same as @samp{-t @var{n}}.
@@ -3488,34 +3640,36 @@ internal networks from the rest of Internet. In order to obtain information from the Web, their users connect and retrieve remote data using an authorized proxy.
+@c man begin ENVIRONMENT
 Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. The standard way to specify proxy location, which Wget recognizes, is using the following environment variables:
-@table @code
+@table @env
 @item http_proxy
 @itemx https_proxy
-If set, the @code{http_proxy} and @code{https_proxy} variables should
+If set, the @env{http_proxy} and @env{https_proxy} variables should
 contain the @sc{url}s of the proxies for @sc{http} and @sc{https} connections respectively.
 @item ftp_proxy
 This variable should contain the @sc{url} of the proxy for @sc{ftp}
-connections. It is quite common that @code{http_proxy} and
-@code{ftp_proxy} are set to the same @sc{url}.
+connections. It is quite common that @env{http_proxy} and
+@env{ftp_proxy} are set to the same @sc{url}.
 @item no_proxy
 This variable should contain a comma-separated list of domain extensions proxy should @emph{not} be used for. For instance, if the value of
-@code{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
+@env{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
 documents from MIT.
 @end table
+@c man end
 In addition to the environment variables, proxy location and settings may be specified from within Wget itself.
 @table @samp
-@itemx --no-proxy
+@item --no-proxy
 @itemx proxy = on/off
 This option and the corresponding command may be used to suppress the use of proxy, even if the appropriate environment variables are set.
@@ -3879,9 +4033,8 @@ me). GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@xemacs.org},
 @end iftex
 @ifnottex
-GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org},
+GNU Wget was written by Hrvoje Niksic @email{hniksic@@xemacs.org}.
 @end ifnottex
-and it is currently maintained by Micah Cowan @email{micah@@cowan.name}.
 However, the development of Wget could never have gone as far as it has, were it not for the help of many people, either with bug reports, feature proposals,