+++ /dev/null
- -*- outline -*-
-
-This is the to-do list for GNU Wget. There is no timetable for when we
-plan to implement these features -- this is just a list of features
-we'd like to see in Wget, as well as a list of problems that need
-fixing. Patches implementing these items are likely to be accepted,
-especially if they follow the coding conventions outlined in PATCHES
-and update the documentation as well.
-
-The items are not listed in any particular order (except that
-recently-added items may tend towards the top). Not all of these
-represent user-visible changes.
-
-* Change the file name generation logic so that redirects can't dictate
- file names (but redirects should still be followed). By default, file
- names should be generated only from the URL the user provided. However,
- with an appropriate flag, Wget will allow the remote server to specify
- the file name, either through redirection (as is always the case now)
- or via the increasingly popular header `Content-Disposition: XXX;
- filename="FILE"'.
-
- The file name should be generated and displayed *after* processing
- the server's response, not before, as it is done now. This will
- make -nc and opening the file with O_EXCL trivial to implement, stop
- --html-extension from being a horrible hack, and so on.
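
A minimal sketch of what knowing the final name before opening the file buys us, assuming POSIX open(2). With O_CREAT | O_EXCL, the "is this name free?" check and the creation of the file are one atomic step, so -nc needs no separate stat() call and the `.N' suffix search cannot race with another process. The function open_output_file() and its cap of 1000 suffixes are hypothetical, not existing Wget code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create the output file for BASE without ever
   overwriting an existing file.  O_CREAT | O_EXCL makes "check that the
   name is free" and "create it" a single atomic step.  Without -nc
   (NOCLOBBER == 0), BASE, BASE.1, BASE.2, ... are tried in turn; with
   -nc only BASE itself is tried.  The chosen name is stored in OUT.
   Returns an open file descriptor, or -1 with errno set (EEXIST means
   every candidate name was already taken).  */
int
open_output_file (const char *base, int noclobber, char *out, size_t size)
{
  int tries = noclobber ? 1 : 1000;   /* arbitrary cap on .N suffixes */

  for (int i = 0; i < tries; i++)
    {
      int n = (i == 0) ? snprintf (out, size, "%s", base)
                       : snprintf (out, size, "%s.%d", base, i);
      if (n < 0 || (size_t) n >= size)
        {
          errno = ENAMETOOLONG;
          return -1;
        }
      int fd = open (out, O_WRONLY | O_CREAT | O_EXCL, 0666);
      if (fd >= 0 || errno != EEXIST)
        return fd;          /* success, or an error other than "exists" */
    }
  return -1;                /* errno is still EEXIST from the last open */
}
```

With -nc, the caller would simply skip the URL when the call fails with EEXIST.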
-
-* -O should be respected, with no exceptions. It should work in
- conjunction with -N and -k. (This is hard to achieve in the current
- code base.) Ancillary files, such as directory listings, should be
- downloaded either directly to memory or to /tmp.
-
-* Implement Digest and NTLM authentication for proxies. This is harder
- than it seems because it requires some rethinking of the HTTP code.
-
-* Rethink the interaction between recur.c (the recursive download code)
- and HTTP/FTP code. Ideally, the downloading code should have a way
- to retrieve a file and, optionally, to specify a list of URLs for
- continuing the "recursive" download. FTP code will surely benefit
- from such a restructuring because its current incarnation is way too
- smart for its own good.
-
-* Both HTTP and FTP connections should be first-class objects that can
- be reused after a download is done. Currently, information about both
- is kept implicitly on the stack and forgotten after each download.
-
-* Restructure the FTP code to remove massive amounts of code duplication
- and repetition. Remove all the "intelligence" and make it work as
- outlined in the previous bullet.
-
-* Add support for SFTP. More generally, teach Wget about the newer
- features of recent FTP servers, such as reliable checksums and
- timestamps. These could be used to implement really robust
- downloads.
-
-* Wget shouldn't delete rejected files that were not downloaded, but
- just found on disk because of `-nc'. For example, `wget -r -nc
- -A.gif URL' should allow the user to get all the GIFs without
- removing any of the existing HTML files.
-
-* Be careful not to lose username/password information given for the
- URL on the command line. For example,
- wget -r http://username:password@server/path/ should send that
- username and password to all content under /path/ (this is apparently
- what browsers do).
-
-* Don't send credentials using "Basic" authorization before the server
- has a chance to tell us that it supports Digest or NTLM!
-
-* Add a --range parameter allowing you to explicitly specify a range
- of bytes to get from a file over HTTP (FTP only supports ranges
- ending at the end of the file, though forcibly disconnecting from
- the server at the desired endpoint would work). For example,
- --range=n-m would specify an inclusive range (a la the Range header),
- and --range=n:m would specify an end-exclusive range (a la Python's
- slices). -c should work with --range by assuming the range is
- partially downloaded on disk and continuing from there (effectively
- requesting a smaller range).
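
One way to pin down the proposed semantics, sketched in C: both spellings reduce to the inclusive `bytes=FIRST-LAST' form of the HTTP Range header, and -c only advances FIRST by the number of bytes of the range already on disk. The function range_to_header() and its return convention are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper for the proposed --range option.
   SPEC is "N-M" (inclusive end, like the HTTP Range header) or "N:M"
   (exclusive end, like a Python slice).  HAVE is the number of bytes of
   the range already on disk (0 unless -c is in effect).
   On success writes "bytes=FIRST-LAST" into BUF and returns 0.
   Returns 1 if the whole range is already on disk, -1 if SPEC is
   malformed or BUF is too small.  */
int
range_to_header (const char *spec, long long have, char *buf, size_t size)
{
  char *end;

  errno = 0;
  long long first = strtoll (spec, &end, 10);
  if (end == spec || errno != 0 || first < 0 || (*end != '-' && *end != ':'))
    return -1;

  char sep = *end;
  const char *rest = end + 1;
  errno = 0;
  long long second = strtoll (rest, &end, 10);
  if (end == rest || errno != 0 || *end != '\0' || second < 0)
    return -1;

  /* Normalize both forms to the inclusive position of the last byte.  */
  long long last = (sep == ':') ? second - 1 : second;
  if (last < first || have < 0)
    return -1;

  /* With -c, request only the part that is still missing.  */
  if (have > last - first)
    return 1;
  first += have;

  int n = snprintf (buf, size, "bytes=%lld-%lld", first, last);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For example, --range=100-199 and --range=100:200 both become "bytes=100-199". If 40 bytes of that range are already on disk, -c turns the request into "bytes=140-199".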
-
-* If multiple FTP URLs are specified that are on the same host, Wget should
- re-use the connection rather than opening a new one for each file.
- This should be easy once the FTP code has been restructured as
- described above, with the FTP connection becoming a first-class object.
-
-* Try to devise a scheme so that, when a password is unknown, Wget asks
- the user for one. This is harder than it seems because the password
- may be requested by some page encountered long after the user has
- left Wget to run.
-
-* If -c is used with -N, check to make sure a file hasn't changed on the server
- before "continuing" to download it (preventing a bogus hybrid file).
-
-* Generalize --html-extension to something like --mime-extensions and
- have it consult mime.types for the preferred extension. Non-HTML files
- with filenames changed this way would be re-downloaded each time
- despite -N unless .orig files were saved for them. (#### Why? The
- HEAD request we use to implement -N would still be able to construct
- the correct file name based on the declared Content-Type.)
-
- Since .orig would contain the same data as non-.orig, the latter
- could be just a link to the former. Another possibility would be to
- implement a per-directory database called something like
- .wget_url_mapping containing URLs and their corresponding filenames.
-
-* When spanning hosts, there's no way to say that you are only
- interested in files in a certain directory on _one_ of the hosts (-I
- and -X apply to all). Perhaps -I and -X should take an optional
- "hostname:" before the directory?
-
-* --retr-symlinks should cause Wget to traverse links to directories too.
-
-* Make Wget return a non-zero exit status in more situations, such as failed HTTP authentication.
- Create and document different exit statuses for different errors.
-
-* Make -K compare X.orig to X and move the former on top of the latter if
- they're the same, rather than leaving identical .orig files lying around.
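
A sketch of the proposed -K cleanup, assuming POSIX rename() semantics (the target is replaced if it exists). The file is compared chunk by chunk with its backup. If they are identical, no links were converted, so X.orig is moved on top of X. The names files_identical() and collapse_orig() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if files A and B have identical contents, 0 if they differ,
   -1 if either cannot be opened or read.  */
static int
files_identical (const char *a, const char *b)
{
  FILE *fa = fopen (a, "rb");
  FILE *fb = fopen (b, "rb");
  int result = -1;
  if (fa && fb)
    {
      char ba[4096], bb[4096];
      size_t na, nb;
      result = 1;
      do
        {
          na = fread (ba, 1, sizeof ba, fa);
          nb = fread (bb, 1, sizeof bb, fb);
          if (na != nb || memcmp (ba, bb, na) != 0)
            {
              result = 0;       /* different length or different bytes */
              break;
            }
        }
      while (na == sizeof ba);  /* a short read means end of file */
      if (ferror (fa) || ferror (fb))
        result = -1;
    }
  if (fa)
    fclose (fa);
  if (fb)
    fclose (fb);
  return result;
}

/* Hypothetical -K cleanup for NAME: if NAME.orig has the same contents
   as NAME, move NAME.orig on top of NAME so no redundant backup is left.
   Returns 1 if collapsed, 0 if the files differ (both are kept), -1 on
   error.  */
int
collapse_orig (const char *name)
{
  char orig[4096];
  int n = snprintf (orig, sizeof orig, "%s.orig", name);
  if (n < 0 || n >= (int) sizeof orig)
    return -1;
  int same = files_identical (orig, name);
  if (same != 1)
    return same;
  return rename (orig, name) == 0 ? 1 : -1;
}
```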
-
-* Make `-k' check for files that were downloaded in the past and convert links
- to them in newly-downloaded documents.
-
-* Devise a way for options to have effect on a per-URL basis. This is very
- natural for some options, such as --post-data. It could be implemented
- simply by having more than one struct options.
-
-* Add an option to overwrite existing files (no `.N' suffixes).
-
-* Add an option to list wildcard matches without downloading them. The same
- could be generalized to support something like apt's --print-uri.
-
-* Handle MIME types correctly. There should be an option to (not)
- retrieve files based on MIME types, e.g. `--accept-types=image/*'.
- This would work for FTP by translating file extensions to MIME types
- using mime.types.
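
A sketch of the matching half of the proposed `--accept-types', assuming only exact types and the `type/*' wildcard need to be supported. MIME types compare case-insensitively, and parameters such as `; charset=UTF-8' in the Content-Type value are ignored. mime_type_matches() is a hypothetical name, and translating FTP file extensions through mime.types is not shown.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare N bytes of A and B case-insensitively.  */
static int
ci_equal (const char *a, const char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (tolower ((unsigned char) a[i]) != tolower ((unsigned char) b[i]))
      return 0;
  return 1;
}

/* Hypothetical matcher for a proposed --accept-types option.
   PATTERN is "type/subtype" or "type/*"; CONTENT_TYPE is a Content-Type
   header value such as "text/html; charset=UTF-8".  Anything from the
   first ';' on, and blanks just before it, are ignored.
   Returns 1 on a match, 0 otherwise.  */
int
mime_type_matches (const char *pattern, const char *content_type)
{
  /* Length of the bare media type, without parameters.  */
  size_t len = strcspn (content_type, ";");
  while (len > 0 && isspace ((unsigned char) content_type[len - 1]))
    len--;

  size_t plen = strlen (pattern);
  if (plen >= 2 && strcmp (pattern + plen - 2, "/*") == 0)
    {
      /* "image/*": compare only the "image/" prefix.  */
      size_t prefix = plen - 1;
      return len > prefix && ci_equal (pattern, content_type, prefix);
    }
  return len == plen && ci_equal (pattern, content_type, plen);
}
```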
-
-* Allow time-stamping by arbitrary date. For example,
- wget --if-modified-after DATE URL.
-
-* Make quota apply to single files, preferably so that the download of an
- oversized file is not attempted at all.
-
-* When updating an existing mirror, download to temporary files (such as .in*)
- and rename the file after the download is done.
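
A sketch of the temporary-file approach, assuming POSIX rename(), which replaces the target atomically. An interrupted update then leaves the old mirrored copy untouched. The `.in-progress' suffix and the function save_when_complete() are hypothetical. The in-memory buffer stands in for the network read loop.

```c
#include <stdio.h>

/* Hypothetical sketch: replace the mirrored file NAME with new contents
   only once they are complete.  The data is written to NAME.in-progress
   and then renamed over NAME.  Returns 0 on success, -1 on failure (in
   which case the temporary file is removed and NAME is left alone).  */
int
save_when_complete (const char *name, const char *data, size_t len)
{
  char tmp[4096];
  int n = snprintf (tmp, sizeof tmp, "%s.in-progress", name);
  if (n < 0 || (size_t) n >= sizeof tmp)
    return -1;

  FILE *fp = fopen (tmp, "wb");
  if (!fp)
    return -1;
  /* In Wget this would be the loop reading from the network.  */
  int ok = fwrite (data, 1, len, fp) == len;
  ok = (fclose (fp) == 0) && ok;   /* always close, even on write error */

  if (!ok || rename (tmp, name) != 0)
    {
      remove (tmp);
      return -1;
    }
  return 0;
}
```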
-
-* Add an option to delete or move no-longer-existent files when mirroring.
-
-* Implement uploading (--upload=FILE URL?) in FTP and HTTP. A beginning of
- this is available in the form of --post-file, but it should be expanded to
- be really useful.
-
-* Make HTTP timestamping use the If-Modified-Since header.
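
A sketch of building the conditional request header from the local file's modification time. HTTP dates are always expressed in GMT using a fixed format such as "Sun, 06 Nov 1994 08:49:37 GMT". Because strftime()'s %a and %b depend on the locale, this assumes the default "C" locale. The function name is hypothetical. The same helper could also serve the --if-modified-after idea above once DATE has been parsed into a time_t.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write an "If-Modified-Since:" header line for
   time T (for example the st_mtime of the local copy) into BUF.
   Assumes the "C" locale so that day and month names are English.
   Returns 0 on success, -1 on failure or if BUF is too small.  */
int
format_if_modified_since (time_t t, char *buf, size_t size)
{
  struct tm *tm = gmtime (&t);   /* HTTP dates are always in GMT */
  if (!tm)
    return -1;

  char date[64];
  if (strftime (date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", tm) == 0)
    return -1;

  int n = snprintf (buf, size, "If-Modified-Since: %s\r\n", date);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For t = 784111777, this writes "If-Modified-Since: Sun, 06 Nov 1994 08:49:37 GMT" followed by CRLF. A 304 Not Modified reply would then replace the separate HEAD request that -N currently relies on.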
-
-* Add more protocols (such as news or possibly some of the streaming
- protocols), implementing them in a modular fashion.
-
-* Add a "rollback" option to have continued retrieval throw away a
- configurable number of bytes at the end of a file before resuming
- download. Apparently, some stupid proxies insert a "transfer
- interrupted" string we need to get rid of.
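
A sketch of the arithmetic behind the proposed "rollback" option: discard the configured number of bytes from the end of the partial file, never going below zero, and resume from there. The name rollback_resume_offset() is hypothetical. The caller is assumed to truncate the local file to the returned offset (for example with POSIX truncate()) before appending new data.

```c
#include <stdio.h>

/* Hypothetical sketch of the proposed "rollback" option.  LOCAL_SIZE is
   the length of the partial file on disk and ROLLBACK the configured
   number of bytes to throw away.  Writes the matching Range header into
   BUF and returns the offset to resume from, or -1 on bad arguments or
   if BUF is too small.  */
long long
rollback_resume_offset (long long local_size, long long rollback,
                        char *buf, size_t size)
{
  if (local_size < 0 || rollback < 0)
    return -1;

  /* Drop the possibly corrupted tail, but never go below zero.  */
  long long offset = local_size > rollback ? local_size - rollback : 0;

  int n = snprintf (buf, size, "Range: bytes=%lld-", offset);
  if (n < 0 || (size_t) n >= size)
    return -1;
  return offset;
}
```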