@end iftex
@c This should really be auto-generated!
-@set VERSION 1.7-dev
-@set UPDATED Jan 2001
+@set VERSION 1.7-pre1
+@set UPDATED May 2001
@dircategory Net Utilities
@dircategory World Wide Web
This manual documents version @value{VERSION} of GNU Wget, the freely
available utility for network download.
-Copyright @copyright{} 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
+Copyright @copyright{} 1996, 1997, 1998, 2000, 2001 Free Software
+Foundation, Inc.
@menu
* Overview:: Features of Wget.
Without @samp{-c}, the previous example would just download the remote
file to @file{ls-lR.Z.1}, leaving the truncated @file{ls-lR.Z} file
-alone.
+alone.
-If you use @samp{-c} on a file which is now smaller on the server than
-locally (presumably because it was changed on the server since your last
-download attempt), the file will be re-downloaded from scratch.
-Unfortunately this also happens if the local file is the same length as
-the server file---this will be fixed in a future version of Wget, but in
-the meantime you can use @samp{--timestamping} to prevent this on files
-for which the server gives timestamps (e.g. static files but not CGI
-output or @sc{http} directory listings).
+Beginning with Wget 1.7, if you use @samp{-c} on a non-empty file, and
+it turns out that the server does not support continued downloading,
+Wget will refuse to start the download from scratch, which would
+effectively ruin existing contents. If you really want the download to
+start from scratch, remove the file.
+
+Also beginning with Wget 1.7, if you use @samp{-c} on a file which is the
+same size as the one on the server, Wget will refuse to download the
+file and print an explanatory message. The same happens when the file
+is smaller on the server than locally (presumably because it was changed
+on the server since your last download attempt)---because ``continuing''
+is not meaningful, no download occurs.
On the other side of the coin, while using @samp{-c}, any file that's
bigger on the server than locally will be considered an incomplete
-download and only @code{(length(server) - length(local))} bytes will
-be downloaded and tacked onto the end of the local file. This behavior
-can be desirable in certain cases---for instance, you can use @samp{wget
--c} to download just the new portion that's been appended to a data
+download and only @code{(length(remote) - length(local))} bytes will be
+downloaded and tacked onto the end of the local file. This behavior can
+be desirable in certain cases---for instance, you can use @samp{wget -c}
+to download just the new portion that's been appended to a data
collection or log file.
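
The @samp{-c} rules described above can be summarized as a small decision function. This is only an illustrative sketch in Python, not Wget's actual code: the names @code{Action} and @code{continue_action} are hypothetical, and the handling of a missing or empty local file (an ordinary full download) is an assumption based on the text mentioning only non-empty files.

```python
from enum import Enum


class Action(Enum):
    REFUSE = "refuse"   # no download occurs
    APPEND = "append"   # fetch only the missing tail
    FULL = "full"       # ordinary download from byte 0


def continue_action(local_size, remote_size, server_supports_resume=True):
    """Model what `wget -c` is described as doing (hypothetical helper).

    local_size is None when no local file exists; 0 means an empty file.
    Returns (action, number_of_bytes_to_fetch).
    """
    if not local_size:
        # Assumption: with nothing to preserve, a normal download is safe.
        return Action.FULL, remote_size
    if not server_supports_resume:
        # Restarting from scratch would ruin existing contents, so refuse.
        return Action.REFUSE, 0
    if remote_size <= local_size:
        # Equal size, or smaller on the server: "continuing" is meaningless.
        return Action.REFUSE, 0
    # Bigger on the server: treat the local file as an incomplete download
    # and fetch length(remote) - length(local) bytes.
    return Action.APPEND, remote_size - local_size
```

For example, a 100-byte local file and a 250-byte remote file yield an append of 150 bytes, while equal sizes yield a refusal.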
However, if the file is bigger on the server because it's been
Caching is allowed by default.
+@cindex cookies
+@item --cookies=on/off
+When set to off, disable the use of cookies. Cookies are a mechanism
+for maintaining server-side state. The server sends the client a cookie
+using the @code{Set-Cookie} header, and the client responds with the
+same cookie upon further requests. Since cookies allow the server
+owners to keep track of visitors and for sites to exchange this
+information, some consider them a breach of privacy. The default is to
+use cookies; however, @emph{storing} cookies is not on by default.
+
+@cindex loading cookies
+@cindex cookies, loading
+@item --load-cookies @var{file}
+Load cookies from @var{file} before the first @sc{http} retrieval.  The
+format of @var{file} is the one used by Netscape and Mozilla, at least
+by their Unix versions.
+
+@cindex saving cookies
+@cindex cookies, saving
+@item --save-cookies @var{file}
+Save cookies to @var{file} at the end of the session.  Cookies whose
+expiry time is not specified, or those that have already expired, are
+not saved.
+
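
The load and save behavior described above can be illustrated with Python's standard @code{http.cookiejar.MozillaCookieJar}, which reads and writes the Netscape-style cookie file format. This is a sketch of the idea only, not Wget's implementation: the helper @code{demo}, the sample cookie names, and the @code{.example.com} domain are made up for illustration. By default the Python class also skips cookies with no expiry time and cookies that have already expired when saving, which resembles the @samp{--save-cookies} rule.

```python
import http.cookiejar
import os
import tempfile
import time


def demo():
    """Load a sample cookie file, save it again, and list what survived."""
    future = int(time.time()) + 3600
    # Fields: domain, domain flag, path, secure, expiry, name, value.
    lines = [
        "# Netscape HTTP Cookie File",
        f".example.com\tTRUE\t/\tFALSE\t{future}\tkeep\t1",
        ".example.com\tTRUE\t/\tFALSE\t\tsession\t2",   # no expiry time
        ".example.com\tTRUE\t/\tFALSE\t1\texpired\t3",  # expired long ago
    ]
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cookies-in.txt")
        dst = os.path.join(tmp, "cookies-out.txt")
        with open(src, "w") as f:
            f.write("\n".join(lines) + "\n")

        jar = http.cookiejar.MozillaCookieJar()
        # Load everything, including session and expired cookies.
        jar.load(src, ignore_discard=True, ignore_expires=True)
        # Save with the defaults: session and expired cookies are dropped.
        jar.save(dst)

        saved = http.cookiejar.MozillaCookieJar()
        saved.load(dst)
        return sorted(cookie.name for cookie in saved)
```

Calling @code{demo()} returns only the names of cookies that have a future expiry time.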
@cindex Content-Length, ignore
@cindex ignore length
@item --ignore-length
@section FTP Options
@table @samp
-@cindex symbolic links, retrieving
-@item --retr-symlinks
-Usually, when retrieving @sc{ftp} directories recursively and a symbolic
-link is encountered, the linked-to file is not downloaded. Instead, a
-matching symbolic link is created on the local filesystem. The
-pointed-to file will not be downloaded unless this recursive retrieval
-would have encountered it separately and downloaded it anyway.
-
-When @samp{--retr-symlinks} is specified, however, symbolic links are
-traversed and the pointed-to files are retrieved. At this time, this
-option does not cause Wget to traverse symlinks to directories and
-recurse through them, but in the future it should be enhanced to do
-this.
-
-Note that when retrieving a file (not a directory) because it was
-specified on the commandline, rather than because it was recursed to,
-this option has no effect. Symbolic links are always traversed in this
-case.
+@cindex .listing files, removing
+@item -nr
+@itemx --dont-remove-listing
+Don't remove the temporary @file{.listing} files generated by @sc{ftp}
+retrievals. Normally, these files contain the raw directory listings
+received from @sc{ftp} servers. Not removing them can be useful for
+debugging purposes, or when you want to be able to easily check on the
+contents of remote server directories (e.g. to verify that a mirror
+you're running is complete).
+
+Note that even though Wget writes to a known filename for this file,
+this is not a security hole in the scenario of a user making
+@file{.listing} a symbolic link to @file{/etc/passwd} or something and
+asking @code{root} to run Wget in his or her directory. Depending on
+the options used, either Wget will refuse to write to @file{.listing},
+making the globbing/recursion/time-stamping operation fail, or the
+symbolic link will be deleted and replaced with the actual
+@file{.listing} file, or the listing will be written to a
+@file{.listing.@var{number}} file.
+
+Even though this situation isn't a problem, @code{root} should
+never run Wget in a non-trusted user's directory. A user could do
+something as simple as linking @file{index.html} to @file{/etc/passwd}
+and asking @code{root} to run Wget with @samp{-N} or @samp{-r} so the file
+will be overwritten.
@cindex globbing, toggle
@item -g on/off
Use the @dfn{passive} @sc{ftp} retrieval scheme, in which the client
initiates the data connection. This is sometimes required for @sc{ftp}
to work behind firewalls.
+
+@cindex symbolic links, retrieving
+@item --retr-symlinks
+Usually, when retrieving @sc{ftp} directories recursively and a symbolic
+link is encountered, the linked-to file is not downloaded. Instead, a
+matching symbolic link is created on the local filesystem. The
+pointed-to file will not be downloaded unless this recursive retrieval
+would have encountered it separately and downloaded it anyway.
+
+When @samp{--retr-symlinks} is specified, however, symbolic links are
+traversed and the pointed-to files are retrieved. At this time, this
+option does not cause Wget to traverse symlinks to directories and
+recurse through them, but in the future it should be enhanced to do
+this.
+
+Note that when retrieving a file (not a directory) because it was
+specified on the commandline, rather than because it was recursed to,
+this option has no effect. Symbolic links are always traversed in this
+case.
@end table
@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking
@cindex link conversion
@item -k
@itemx --convert-links
-Convert the non-relative links to relative ones locally. Only the
-references to the documents actually downloaded will be converted; the
-rest will be left unchanged.
+After the download is complete, convert the links in the document to
+make them suitable for local viewing. This affects not only the visible
+hyperlinks, but any part of the document that links to external content,
+such as embedded images, links to style sheets, hyperlinks to non-HTML
+content, etc.
+
+Each link will be changed in one of two ways:
+
+@itemize @bullet
+@item
+The links to files that have been downloaded by Wget will be changed to
+refer to the file they point to as a relative link.
+
+Example: if the downloaded file @file{/foo/doc.html} links to
+@file{/bar/img.gif}, also downloaded, then the link in @file{doc.html}
+will be modified to point to @samp{../bar/img.gif}. This kind of
+transformation works reliably for arbitrary combinations of directories.
+
+@item
+The links to files that have not been downloaded by Wget will be changed
+to include host name and absolute path of the location they point to.
+
+Example: if the downloaded file @file{/foo/doc.html} links to
+@file{/bar/img.gif} (or to @file{../bar/img.gif}), then the link in
+@file{doc.html} will be modified to point to
+@file{http://@var{hostname}/bar/img.gif}.
+@end itemize
+
+Because of this, local browsing works reliably: if a linked file was
+downloaded, the link will refer to its local name; if it was not
+downloaded, the link will refer to its full Internet address rather than
+presenting a broken link. The fact that the former links are converted
+to relative links ensures that you can move the downloaded hierarchy to
+another directory.
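
The two conversion cases above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Wget's code: the function name @code{convert_link} and its parameters are hypothetical, links are assumed to be plain paths on the same host, and @code{downloaded} is assumed to be a set of server-absolute paths that were retrieved.

```python
import posixpath


def convert_link(doc_path, link, downloaded, hostname):
    """Return the rewritten form of `link` found in document `doc_path`.

    doc_path   -- server-absolute path of the downloaded document
    link       -- the link as written in the document (absolute or relative)
    downloaded -- set of server-absolute paths that were downloaded
    hostname   -- host the document came from
    """
    doc_dir = posixpath.dirname(doc_path)
    # Resolve the link against the document's directory.
    target = posixpath.normpath(posixpath.join(doc_dir, link))
    if target in downloaded:
        # Case 1: downloaded, so point at the local copy relatively.
        return posixpath.relpath(target, start=doc_dir)
    # Case 2: not downloaded, so use host name plus absolute path.
    return "http://" + hostname + target


print(convert_link("/foo/doc.html", "/bar/img.gif", {"/bar/img.gif"}, "example.org"))
print(convert_link("/foo/doc.html", "../bar/img.gif", set(), "example.org"))
```

The first call mirrors the downloaded-file example and the second mirrors the not-downloaded example, with @code{example.org} standing in for @var{hostname}.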
Note that only at the end of the download can Wget know which links have
-been downloaded. Because of that, much of the work done by @samp{-k}
-will be performed at the end of the downloads.
+been downloaded. Because of that, the work done by @samp{-k} will be
+performed at the end of all the downloads.
@cindex backing up converted files
@item -K
directory listings. It is currently equivalent to
@samp{-r -N -l inf -nr}.
-@item -nr
-@itemx --dont-remove-listing
-Don't remove the temporary @file{.listing} files generated by @sc{ftp}
-retrievals. Normally, these files contain the raw directory listings
-received from @sc{ftp} servers. Not removing them can be useful to
-access the full remote file list when running a mirror, or for debugging
-purposes.
-
@cindex page requisites
@cindex required images, downloading
@item -p
For instance, say document @file{1.html} contains an @code{<IMG>} tag
referencing @file{1.gif} and an @code{<A>} tag pointing to external
-document @file{2.html}. Say that @file{2.html} is the same but that its
+document @file{2.html}. Say that @file{2.html} is similar but that its
image is @file{2.gif} and it links to @file{3.html}. Say this
continues up to some arbitrarily high number.
this is not the case, because @samp{-l 0} is equivalent to
@samp{-l inf}---that is, infinite recursion. To download a single HTML
page (or a handful of them, all specified on the commandline or in a
-@samp{-i} @sc{url} input file) and its requisites, simply leave off
-@samp{-p} and @samp{-l}:
+@samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off
+@samp{-r} and @samp{-l}:
@example
wget -p http://@var{site}/1.html
wget -E -H -k -K -nh -p http://@var{site}/@var{document}
@end example
+In one case you'll need to add a couple more options. If @var{document}
+is a @code{<FRAMESET>} page, the ``one more hop'' that @samp{-p} gives you
+won't be enough---you'll get the @code{<FRAME>} pages that are
+referenced, but you won't get @emph{their} requisites. Therefore, in
+this case you'll need to add @samp{-r -l1} to the commandline. The
+@samp{-r -l1} will recurse from the @code{<FRAMESET>} page to the
+@code{<FRAME>} pages, and the @samp{-p} will get their requisites. If
+you're already using a recursion level of 1 or more, you'll need to up
+it by one. In the future, @samp{-p} may be made smarter so that it'll
+do ``two more hops'' in the case of a @code{<FRAMESET>} page.
+
To finish off this topic, it's worth knowing that Wget's idea of an
external document link is any URL specified in an @code{<A>} tag, an
@code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK
@cindex ftp time-stamping
In theory, @sc{ftp} time-stamping works much the same as @sc{http}, only
-@sc{ftp} has no headers---time-stamps must be received from the
-directory listings.
+@sc{ftp} has no headers---time-stamps must be ferreted out of directory
+listings.
If an @sc{ftp} download is recursive or uses globbing, Wget will use the
@sc{ftp} @code{LIST} command to get a file listing for the directory
Enable/disable host-prefixed file names. @samp{-nH} disables it.
@item continue = on/off
-Enable/disable continuation of the retrieval---the same as @samp{-c}
-(which enables it).
+If set to on, force continuation of preexistent partially retrieved
+files. See @samp{-c} before setting it.
@item background = on/off
Enable/disable going to background---the same as @samp{-b} (which
@item convert links = on/off
Convert non-relative links locally. The same as @samp{-k}.
+@item cookies = on/off
+When set to off, disallow cookies. See the @samp{--cookies} option.
+
+@item load_cookies = @var{file}
+Load cookies from @var{file}. See @samp{--load-cookies}.
+
+@item save_cookies = @var{file}
+Save cookies to @var{file}. See @samp{--save-cookies}.
+
@item cut_dirs = @var{n}
Ignore @var{n} remote directory components.