From c33a1f97fe7ccefc61838c63f8754aab9fa03fde Mon Sep 17 00:00:00 2001
From: dan
Date: Mon, 26 Mar 2001 19:22:17 -0800
Subject: [PATCH] [svn] TODO: -p should probably go "_two_ more hops" on pages.
 wget.texi (Recursive Retrieval Options): Explained that you need to use
 -r -l1 -p to get the two levels of requisites for a page. Also made a few
 other wording improvements.

---
 ChangeLog     |  4 ++++
 TODO          |  2 ++
 doc/ChangeLog |  6 ++++++
 doc/wget.texi | 17 ++++++++++++++---
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 087a52a4..52e8418c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2001-03-26 Dan Harkless
+
+ * TODO: -p should probably go "_two_ more hops" on pages.
+
 2001-03-22 Dan Harkless
 
  * MACHINES: Added rs6000-ibm-aix4.3.3.0.
diff --git a/TODO b/TODO
index 8b919e43..faf5fb26 100644
--- a/TODO
+++ b/TODO
@@ -7,6 +7,8 @@ items are not listed in any particular order (except that recently-added
 items may tend towards the top). Not all of these represent user-visible
 changes.
 
+* -p should probably go "_two_ more hops" on pages.
+
 * Only normal link-following recursion should respect -np. Page-requisite
 recursion should not. When -np -p is specified, Wget should still retrieve
 requisite images and such on the server, even if they aren't in that directory
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 1716da6b..b0a747c1 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,9 @@
+2001-03-26 Dan Harkless
+
+ * wget.texi (Recursive Retrieval Options): Explained that you need
+ to use -r -l1 -p to get the two levels of requisites for a
+ page. Also made a few other wording improvements.
+
 2001-03-17 Dan Harkless
 
  * Makefile.in: Using '^' in the sed call caused a weird failure on
diff --git a/doc/wget.texi b/doc/wget.texi
index a1fa76db..11c501fd 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1065,7 +1065,7 @@ requisites.
 
 For instance, say document @file{1.html} contains an @code{} tag
 referencing @file{1.gif} and an @code{} tag pointing to external
-document @file{2.html}. Say that @file{2.html} is the same but that its
+document @file{2.html}. Say that @file{2.html} is similar but that its
 image is @file{2.gif} and it links to @file{3.html}. Say this
 continues up to some arbitrarily high number.
 
@@ -1103,8 +1103,8 @@ would download just @file{1.html} and @file{1.gif}, but unfortunately
 this is not the case, because @samp{-l 0} is equivalent to
 @samp{-l inf}---that is, infinite recursion. To download a single HTML
 page (or a handful of them, all specified on the commandline or in a
-@samp{-i} @sc{url} input file) and its requisites, simply leave off
-@samp{-p} and @samp{-l}:
+@samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off
+@samp{-r} and @samp{-l}:
 
 @example
 wget -p http://@var{site}/1.html
@@ -1121,6 +1121,17 @@ likes to use a few options in addition to @samp{-p}:
 wget -E -H -k -K -nh -p http://@var{site}/@var{document}
 @end example
 
+In one case you'll need to add a couple more options. If @var{document}
+is a @code{} page, the "one more hop" that @samp{-p} gives you
+won't be enough---you'll get the @code{} pages that are
+referenced, but you won't get @emph{their} requisites. Therefore, in
+this case you'll need to add @samp{-r -l1} to the commandline. The
+@samp{-r -l1} will recurse from the @code{} page to to the
+@code{} pages, and the @samp{-p} will get their requisites. If
+you're already using a recursion level of 1 or more, you'll need to up
+it by one. In the future, @samp{-p} may be made smarter so that it'll
+do "two more hops" in the case of a @code{} page.
+
 To finish off this topic, it's worth knowing that Wget's idea of an
 external document link is any URL specified in an @code{} tag, an
 @code{} tag, or a @code{} tag other than @code{