X-Git-Url: http://sjero.net/git/?p=wget;a=blobdiff_plain;f=doc%2Fwget.texi;h=ee7a873cee0b493b19d536516a3e189c99efdf92;hp=05f19d95d0e47d5171bc36dba719c177f8569bad;hb=9dadbf6fe9577a6a6b7e7bab4e4b782fc1a6f86c;hpb=94d6650817110c639975d45df57b345e10b0e396 diff --git a/doc/wget.texi b/doc/wget.texi index 05f19d95..ee7a873c 100644 --- a/doc/wget.texi +++ b/doc/wget.texi @@ -486,9 +486,8 @@ specified as @var{file}, @sc{url}s are read from the standard input. If this function is used, no @sc{url}s need be present on the command line. If there are @sc{url}s both on the command line and in an input file, those on the command lines will be the first ones to be -retrieved. The @var{file} need not be an @sc{html} document (but no -harm if it is)---it is enough if the @sc{url}s are just listed -sequentially. +retrieved. If @samp{--force-html} is not specified, then @var{file} +should consist of a series of URLs, one per line. However, if you specify @samp{--force-html}, the document will be regarded as @samp{html}. In that case you may have problems with @@ -513,8 +512,17 @@ option. @cindex base for relative links in input file @item -B @var{URL} @itemx --base=@var{URL} -Prepends @var{URL} to relative links read from the file specified with -the @samp{-i} option. +Resolves relative links using @var{URL} as the point of reference, +when reading links from an HTML file specified via the +@samp{-i}/@samp{--input-file} option (together with +@samp{--force-html}, or when the input file was fetched remotely from +a server describing it as @sc{html}). This is equivalent to the +presence of a @code{BASE} tag in the @sc{html} input file, with +@var{URL} as the value for the @code{href} attribute. + +For instance, if you specify @samp{http://foo/bar/a.html} for +@var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it +would be resolved to @samp{http://foo/baz/b.html}. 
@end table @node Download Options, Directory Options, Logging and Input File Options, Invoking @@ -896,24 +904,36 @@ won't need it. @cindex file names, restrict @cindex Windows file names -@item --restrict-file-names=@var{mode} -Change which characters found in remote URLs may show up in local file -names generated from those URLs. Characters that are @dfn{restricted} +@item --restrict-file-names=@var{modes} +Change which characters found in remote URLs must be escaped during +generation of local filenames. Characters that are @dfn{restricted} by this option are escaped, i.e. replaced with @samp{%HH}, where @samp{HH} is the hexadecimal number that corresponds to the restricted -character. - -By default, Wget escapes the characters that are not valid as part of -file names on your operating system, as well as control characters that -are typically unprintable. This option is useful for changing these -defaults, either because you are downloading to a non-native partition, -or because you want to disable escaping of the control characters. - -When mode is set to ``unix'', Wget escapes the character @samp{/} and +character. This option may also be used to force all alphabetical +cases to be either lower- or uppercase. + +By default, Wget escapes the characters that are not valid or safe as +part of file names on your operating system, as well as control +characters that are typically unprintable. This option is useful for +changing these defaults, perhaps because you are downloading to a +non-native partition, or because you want to disable escaping of the +control characters, or you want to further restrict characters to only +those in the @sc{ascii} range of values. + +The @var{modes} are a comma-separated set of text values. The +acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol}, +@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. 
The values +@samp{unix} and @samp{windows} are mutually exclusive (one will +override the other), as are @samp{lowercase} and +@samp{uppercase}. Those last are special cases, as they do not change +the set of characters that would be escaped, but rather force local +file paths to be converted either to lower- or uppercase. + +When ``unix'' is specified, Wget escapes the character @samp{/} and the control characters in the ranges 0--31 and 128--159. This is the -default on Unix-like OS'es. +default on Unix-like operating systems. -When mode is set to ``windows'', Wget escapes the characters @samp{\}, +When ``windows'' is given, Wget escapes the characters @samp{\}, @samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<}, @samp{>}, and the control characters in the ranges 0--31 and 128--159. In addition to this, Wget in Windows mode uses @samp{+} instead of @@ -924,11 +944,17 @@ name from the rest. Therefore, a URL that would be saved as saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows mode. This mode is the default on Windows. -If you append @samp{,nocontrol} to the mode, as in -@samp{unix,nocontrol}, escaping of the control characters is also -switched off. You can use @samp{--restrict-file-names=nocontrol} to -turn off escaping of control characters without affecting the choice of -the OS to use as file name restriction mode. +If you specify @samp{nocontrol}, then the escaping of the control +characters is also switched off. This option may make sense +when you are downloading URLs whose names contain UTF-8 characters, on +a system which can save and display filenames in UTF-8 (some possible +byte values used in UTF-8 byte sequences fall in the range of values +designated by Wget as ``controls''). + +The @samp{ascii} mode is used to specify that any bytes whose values +are outside the range of @sc{ascii} characters (that is, greater than +127) shall be escaped. 
This can be useful when saving filenames +whose encoding does not match the one used locally. @cindex IPv6 @itemx -4 @@ -997,6 +1023,46 @@ options for @sc{http} connections. @item --ask-password Prompt for a password for each connection established. Cannot be specified when @samp{--password} is being used, because they are mutually exclusive. + +@cindex iri support +@cindex idn support +@item --no-iri + +Turn off internationalized URI (IRI) support. Use @samp{--iri} to +turn it on. IRI support is activated by default. + +You can set the default state of IRI support using the @code{iri} +command in @file{.wgetrc}. That setting may be overridden from the +command line. + +@cindex local encoding +@item --local-encoding=@var{encoding} + +Force Wget to use @var{encoding} as the default system encoding. That affects +how Wget converts URLs specified as arguments from locale to @sc{utf-8} for +IRI support. + +Wget uses the function @code{nl_langinfo()} and then the @code{CHARSET} +environment variable to get the locale. If it fails, @sc{ascii} is used. + +You can set the default local encoding using the @code{local_encoding} +command in @file{.wgetrc}. That setting may be overridden from the +command line. + +@cindex remote encoding +@item --remote-encoding=@var{encoding} + +Force Wget to use @var{encoding} as the default remote server encoding. +That affects how Wget converts URIs found in files from remote encoding +to @sc{utf-8} during a recursive fetch. This option is only useful for +IRI support, for the interpretation of non-@sc{ascii} characters. + +For HTTP, remote encoding can be found in HTTP @code{Content-Type} +header and in HTML @code{Content-Type http-equiv} meta tag. + +You can set the default encoding using the @code{remoteencoding} +command in @file{.wgetrc}. That setting may be overridden from the +command line. 
@end table @node Directory Options, HTTP Options, Download Options, Invoking @@ -1082,8 +1148,9 @@ Use @var{name} as the default file name when it isn't known (i.e., for URLs that end in a slash), instead of @file{index.html}. @cindex .html extension +@cindex .css extension @item -E -@itemx --html-extension +@itemx --adjust-extension If a file of type @samp{application/xhtml+xml} or @samp{text/html} is downloaded and the URL does not end with the regexp @samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix @samp{.html} @@ -1104,9 +1171,14 @@ version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive Retrieval Options}). As of version 1.12, Wget will also ensure that any downloaded files of -type @samp{text/css} end in the suffix @samp{.css}. Obviously, this -makes the name @samp{--html-extension} misleading; a better name is -expected to be offered as an alternative in the near future. +type @samp{text/css} end in the suffix @samp{.css}, and the option was +renamed from @samp{--html-extension}, to better reflect its new +behavior. The old option name is still acceptable, but should now be +considered deprecated. + +At some point in the future, this option may well be expanded to +include suffixes for other types of content, including content types +that are not parsed by Wget. @cindex http user @cindex http password @@ -1347,10 +1419,20 @@ not to send the @code{User-Agent} header in @sc{http} requests. @cindex POST @item --post-data=@var{string} @itemx --post-file=@var{file} -Use POST as the method for all HTTP requests and send the specified data -in the request body. @code{--post-data} sends @var{string} as data, -whereas @code{--post-file} sends the contents of @var{file}. Other than -that, they work in exactly the same way. +Use POST as the method for all HTTP requests and send the specified +data in the request body. @samp{--post-data} sends @var{string} as +data, whereas @samp{--post-file} sends the contents of @var{file}. 
+Other than that, they work in exactly the same way. In particular, +they @emph{both} expect content of the form @code{key1=value1&key2=value2}, +with percent-encoding for special characters; the only difference is +that one expects its content as a command-line parameter and the other +accepts its content from a file. Note that @samp{--post-file} is +@emph{not} for transmitting files as form attachments: those must +appear as @code{key=value} data (with appropriate percent-encoding) just +like everything else. Wget does not currently support +@code{multipart/form-data} for transmitting POST data; only +@code{application/x-www-form-urlencoded}. Only one of +@samp{--post-data} and @samp{--post-file} should be specified. Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to @code{--post-file} must be a regular @@ -2188,7 +2270,7 @@ ways, all of which can change whether an accept/reject rule matches: If the local file already exists and @samp{--no-directories} was specified, a numeric suffix will be appended to the original name. @item -If @samp{--html-extension} was specified, the local filename will have +If @samp{--adjust-extension} was specified, the local filename might have @samp{.html} appended to it. If Wget is invoked with @samp{-E -A.php}, a filename such as @samp{index.php} will match be accepted, but upon download will be named @samp{index.php.html}, which no longer matches, @@ -2602,6 +2684,16 @@ Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}). @item add_hostdir = on/off Enable/disable host-prefixed file names. @samp{-nH} disables it. +@item ask_password = on/off +Prompt for a password for each connection established. Cannot be specified +when @samp{--password} is being used, because they are mutually +exclusive. Equivalent to @samp{--ask-password}. 
+ +@item auth_no_challenge = on/off +If this option is given, Wget will send Basic HTTP authentication +information (plaintext username and password) for all requests. See +@samp{--auth-no-challenge}. + @item background = on/off Enable/disable going to background---the same as @samp{-b} (which enables it). @@ -2614,9 +2706,10 @@ Enable/disable saving pre-converted files with the suffix @c #### Document me! @c @item base = @var{string} -Consider relative @sc{url}s in @sc{url} input files forced to be -interpreted as @sc{html} as being relative to @var{string}---the same as -@samp{--base=@var{string}}. +Consider relative @sc{url}s in input files (specified via the +@samp{input} command or the @samp{--input-file}/@samp{-i} option, +together with @samp{force_html} or @samp{--force-html}) +as being relative to @var{string}---the same as @samp{--base=@var{string}}. @item bind_address = @var{address} Bind to @var{address}, like the @samp{--bind-address=@var{address}}. @@ -2758,10 +2851,12 @@ Turn globbing on/off---the same as @samp{--glob} and @samp{--no-glob}. Define a header for HTTP downloads, like using @samp{--header=@var{string}}. -@item html_extension = on/off +@item adjust_extension = on/off Add a @samp{.html} extension to @samp{text/html} or -@samp{application/xhtml+xml} files without it, or a @samp{.css} -extension to @samp{text/css} files without it, like @samp{-E}. +@samp{application/xhtml+xml} files that lack one, or a @samp{.css} +extension to @samp{text/css} files that lack one, like +@samp{-E}. Previously named @samp{html_extension} (still acceptable, +but deprecated). @item http_keep_alive = on/off Turn the keep-alive feature on or off (defaults to on). Turning it @@ -2799,6 +2894,10 @@ Ignore certain @sc{html} tags when doing a recursive retrieval, like Specify a comma-separated list of directories you wish to follow when downloading---the same as @samp{-I @var{string}}. 
+@item iri = on/off +When set to on, enable internationalized URI (IRI) support; the same as +@samp{--iri}. + @item inet4_only = on/off Force connecting to IPv4 addresses, off by default. You can put this in the global init file to disable Wget's attempts to resolve and @@ -2813,6 +2912,10 @@ or @samp{-6}. @item input = @var{file} Read the @sc{url}s from @var{string}, like @samp{-i @var{file}}. +@item keep_session_cookies = on/off +When specified, causes @samp{save_cookies = on} to also save session +cookies. See @samp{--keep-session-cookies}. + @item limit_rate = @var{rate} Limit the download speed to no more than @var{rate} bytes per second. The same as @samp{--limit-rate=@var{rate}}. @@ -2820,6 +2923,10 @@ The same as @samp{--limit-rate=@var{rate}}. @item load_cookies = @var{file} Load cookies from @var{file}. See @samp{--load-cookies @var{file}}. +@item local_encoding = @var{encoding} +Force Wget to use @var{encoding} as the default system encoding. See +@samp{--local-encoding}. + @item logfile = @var{file} Set logfile to @var{file}, the same as @samp{-o @var{file}}. @@ -2939,6 +3046,10 @@ the @sc{http} spec who got the spelling of ``referrer'' wrong.) Follow only relative links---the same as @samp{-L} (@pxref{Relative Links}). +@item remote_encoding = @var{encoding} +Force Wget to use @var{encoding} as the default remote server encoding. +See @samp{--remote-encoding}. + @item remove_listing = on/off If set to on, remove @sc{ftp} listings downloaded by Wget. Setting it to off is the same as @samp{--no-remove-listing}. @@ -3824,6 +3935,9 @@ Gnulib getpasswd-gnu module. @item Ted Mielczarek---donated support for CSS. +@item +Saint Xavier---Support for IRIs (RFC 3987). + @item People who provided donations for development---including Brian Gough. 
@end itemize @@ -3935,6 +4049,7 @@ Fila Kolodny, Alexander Kourakos, Martin Kraemer, Sami Krank, +Jay Krell, @tex $\Sigma\acute{\iota}\mu o\varsigma\; \Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$ @@ -4042,6 +4157,7 @@ Douglas E.@: Wegscheid, Ralf Wildenhues, Joshua David Williams, Benjamin Wolsey, +Saint Xavier, YAMAZAKI Makoto, Jasmin Zainul, @iftex