2000-11-01  Hrvoje Niksic

	* retr.c (retrieve_url): Detect redirection cycles.

2000-11-01  Hrvoje Niksic

	* url.c (get_urls_html): Decode HTML entities using
	html_decode_entities.
	* html.c (htmlfindurl): Don't count the `#' in numeric entities
	(&#NNN;) as an HTML fragment.
	(html_decode_entities): New function.

2000-11-01  Hrvoje Niksic

	* html.c (htmlfindurl): Fix recognition of # HTML fragments.

2000-11-01  Hrvoje Niksic

	* url.c (construct): Rewritten for clarity. Avoids the
	unnecessary copying and stack-allocation the old version
	performed.

2000-10-31  Hrvoje Niksic

	* ftp.c (getftp): Ditto.
	* http.c (gethttp): Rewind the stream when retrying from scratch.

2000-10-31  Hrvoje Niksic

	* retr.c (retrieve_url): Use url_concat() to handle relative
	redirections instead of /ad hoc/ code.
	* url.c (url_concat): New function encapsulating weird
	construct().
	(urllen_http_hack): New function.
	(construct): When constructing new URLs, recognize that `?' does
	not form part of the file name in HTTP.

2000-10-13  Adrian Aichner

	* retr.c: Add msec timing support for WINDOWS.
	* retr.c (reset_timer): GetSystemTime() on WINDOWS.
	* retr.c (elapsed_time): Calculate delta time to msec on WINDOWS.

2000-10-27  Dan Harkless

	* retr.c (retrieve_url): Manually applied T. Bharath's patch to
	get wget to grok illegal relative URL redirects. Reformatted and
	re-commented it.

2000-10-23  Dan Harkless

	* connect.c (make_connection and bindport): Manually applied Rob
	Mayoff's 1.5.3 patch to add --bind-address, changing coding style
	to GNU's.
	* ftp.c (ftp_loop_internal): --delete-after wasn't implemented
	for files downloaded via FTP. Per a comment, .listing files were
	not counted towards number of bytes and files downloaded because
	they're deleted anyway. Well, they aren't under -nr, so count
	them then.
	* init.c: Manually applied Rob Mayoff's 1.5.3 patch to add
	--bind-address, alphabetizing, changing coding style to GNU's,
	commenting, and renaming cmd_ip_address() to cmd_address() to
	imply hostnames are also okay.
	* main.c (main): --delete-after didn't delete the root of the
	tree. Ignore --convert-links if --delete-after was specified.
	Manually applied Rob Mayoff's 1.5.3 patch to add --bind-address,
	fixing duplicate use of added-since-1.5.3 case value.
	(print_help): Clarified that --delete-after deletes local files.
	Rob forgot to add a line for his new --bind-address option.
	* options.h (struct options): Manually applied Rob Mayoff's patch
	to add --bind-address (bind_address structure member).
	* recur.c (recursive_retrieve): Improved comment; added DEBUGP().
	Ignore --convert-links if --delete-after was specified.
	* retr.c (retrieve_from_file): Just added a DEBUGP().

2000-10-19  Dan Harkless

	* ftp.c (ftp_loop_internal): downloaded_file() enumerators
	changed.
	(getftp): Applied Piotr Sulecki's patch to work around FTP
	servers that incorrectly respond to the "REST" command with the
	remaining size rather than the total file size.
	* http.c (gethttp): Improved a comment and added code to tack on
	".html" to text/html files without that extension when -E is
	specified.
	(http_loop): Use new downloaded_file() enumerators and deal with
	the case where gethttp() called xrealloc() on u->local.
	* init.c (commands): Added new "htmlextension" command. Also
	renamed John Daily's cmd_quad() to the more descriptive
	cmd_lockable_boolean(), alpha-sorted the CMD_DECLARE()s and
	removed duplicate cmd_boolean() declaration.
	* main.c (print_help): Added my new -E / --html-extension option.
	(main): Undocumented --email-address option previously used -E
	synonym. Stole it away for the much more deserving
	--html-extension's use.
	* options.h (struct options): Added html_extension field.
	* url.c (convert_links): URL X that we saved as X.html locally
	due to -E needs to be backed up as X.orig, not X.html.orig.
	Added comments.
	(downloaded_file): Now remembers if we added .html extension to
	a file.
	* url.h (downloaded_file_t): Added extra enumerators to support
	above.
	(downloaded_file): Now takes and returns a downloaded_file_t.
	* wget.h (unnamed "dt" enum): Added ADDED_HTML_EXTENSION
	enumerator.

2000-10-09  Dan Harkless

	* html.c (htmlfindurl): Added unneeded initialization to quiet
	warning.
	* main.c (print_help): Clarified what --retr-symlinks does.

2000-09-15  John Daily

	* init.c: Add support for "always" and "never" values to allow
	.wgetrc to override commandline (useful e.g. with .pm files
	calling `wget --passive-ftp' when your firewall doesn't allow
	that).
	* ftp.c (getftp): passive_ftp is first option to support
	always/never.

2000-08-30  Dan Harkless

	* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.
	* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML'
	parameter. Wrapped some > 80-column lines. When -p is specified
	and we're at a leaf node, do not traverse , , or tags other
	than .
	* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML'
	parameter.
	* init.c: Added new -p / --page-requisites / page_requisites
	option.
	* main.c (print_help): Clarified that -l inf and -l 0 both allow
	infinite recursion. Changed the unhelpful --mirror description
	to simply give the options it's equivalent to. Added new -p
	option.
	(main): Added some comments; handle new -p / --page-requisites.
	* options.h (struct options): Added new page_requisites field.
	* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
	Calculate and pass down new `dash_p_leaf_HTML' parameter to
	get_urls_html(). Use new INFINITE_RECURSION #define.
	* retr.c: Changed "URL-s" to "URLs". get_urls_html() now takes
	final `dash_p_leaf_HTML' parameter.
	* url.c: get_urls_html() and htmlfindurl() now take final
	`dash_p_leaf_HTML' parameter.
	* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML'
	parameter.
	* wget.h: Added some comments and new INFINITE_RECURSION #define.

2000-08-23  Dan Harkless

	* main.c (print_help): -B / --base was not mentioned.

2000-08-22  Dan Harkless

	* main.c (print_help): Modified -nc description to mention that
	it also prevents the creation of multiple versions of the same
	file with "." suffixes.

2000-07-14  Jan Prikryl

	* retr.c (retrieve_url): Consistently strdup opt.referer when
	setting u->referer.

2000-06-09  Dan Harkless

	* main.c (print_help): --help output for --waitretry was over 80
	cols.

2000-06-09  Hrvoje Niksic

	* url.c (encode_string): Fix comment. Suggested by Herold Heiko.

2000-06-01  Const Kaplinsky

	* ftp.c (ftp_retrieve_list): Change permissions only on plain
	files.

2000-06-01  Hrvoje Niksic

	* url.c (str_url): Print the port number only if it's different
	from the default port number for that protocol.

2000-05-22  Dan Harkless

	* main.c (print_help): Added --help line for Damir Dzeko's
	until-now-undocumented --referer option. Removed comments that
	--referer and --waitretry were undocumented. Changed "`.wgetrc'
	command" to "`.wgetrc'-style command" on --help line for
	--execute.

2000-05-18  Hrvoje Niksic

	* ftp.c (getftp): Ditto.
	* http.c (gethttp): Check for return value of fclose/fflush.

2000-04-12  Hrvoje Niksic

	* host.c (store_hostaddress): Instead of shifting ADDR, start
	copying from the correct address.

2000-04-12  Hrvoje Niksic

	* http.c (gethttp): Don't free REQUEST -- it was allocated with
	alloca(). Pointed out by Gisle Vanem.

2000-04-04  Dan Harkless

	* host.c (store_hostaddress): R. K. Owen's patch introduces a
	"left shift count >= width of type" warning on 32-bit
	architectures. Got rid of it by tricking the compiler w/ a
	variable.
	* url.c (UNSAFE_CHAR): The macro didn't include all the illegal
	characters per RFC1738, namely everything above '~'. It also
	generated a warning on OSes where char =~ unsigned char. Fixed.

1998-10-17  Hrvoje Niksic

	* http.c (http_process_type): Removed needless strdup(), a
	memory leak.

1998-09-25  Hrvoje Niksic

	* html.c (htmlfindurl): Set PH to the first occurrence of `#'.

1998-09-25  Simon Munton

	* init.c (wgetrc_file_name): Don't free HOME under Windows.

1998-12-01  "R. K. Owen"

	* host.c (store_hostaddress): Fix for big endian 64-bit machines.

1998-12-01  Hrvoje Niksic

	* url.c (UNSAFE_CHAR): New macro.
	(contains_unsafe): Use it.
	(encode_string): Ditto.

1998-12-01  Hrvoje Niksic

	* main.c (i18n_initialize): Use LC_MESSAGES only if available.

2000-03-31  Hrvoje Niksic

	* Use TOUPPER/TOLOWER.

1998-12-22  Alexander V. Lukyanov

	* ftp-opie.c (btoe): Zero-terminate OSTORE.

2000-03-21  Hrvoje Niksic

	* wget.h (DO_REALLOC_FROM_ALLOCA): Ditto.
	* sysdep.h (ISALNUM): New macro.
	(TOLOWER): Ditto.
	(TOUPPER): Ditto.

2000-03-10  Dan Harkless

	* html.c (idmatch): Implemented checking of my new --follow-tags
	and --ignore-tags options.
	* init.c (commands): Added comment reminding people adding new
	entries doing allocation to add corresponding freeing in
	cleanup().
	(commands): Added new followtags and ignoretags commands.
	(cleanup): Free storage for new followtags and ignoretags.
	* main.c: Use of "comma-separated list" was random -- normalized
	it. Did some alphabetization. Added comments pointing out
	"Options without arguments" and "Options accepting an argument"
	sections of long_options[]. Added new options --follow-tags and
	-G / --ignore-tags. Added comment that Damir's --referer is
	currently undocumented. Added comment that Heiko's --waitretry
	is partially undocumented (mentioned in --help but not in
	wget.texi). Moved improperly sorted 24, 129, and 'G' cases.
	* options.h (struct options): Added new fields follow_tags and
	ignore_tags.
	* wget.h: Added "#define EQ 0" so we can say
	"strcmp(a, b) == EQ".

2000-03-02  Dan Harkless

	* ftp.c (ftp_loop_internal): Heiko introduced "suggest explicit
	braces to avoid ambiguous `else'" warnings. Eliminated them.
	* http.c (gethttp): Dan Berger's query string patch is totally
	bogus.
	If you have two different URLs, get_page.cgi?page1 and
	get_page.cgi?page2, they'll both be saved as get_page.cgi and
	the second will overwrite the first. Also, parameters to
	implicit CGIs, like "http://www.host.com/db/?2000-03-02", cause
	the URLs to be printed with trailing garbage characters, and
	could seg fault. Backing out the patch, which Dan B. informed me
	by email was just a kludge to download StarOffice from Sun, made
	necessary due to wget's unconditional escaping of certain
	characters (room for an option there?).
	(http_loop): Heiko introduced "suggest explicit braces to avoid
	ambiguous `else'" warnings. Eliminated them.
	* main.c: Heiko's --wait / --waitretry backwards compatibility
	code looks to have been totally untested -- automatic variable
	'wr' was used without being initialized, and a long int was
	passed into setval()'s char* val parameter.
	* recur.c (parse_robots): Applied Edward J. Sabol's patch for
	Guan Yang's reported problem with "User-agent:*" lines in
	robots.txt.
	* url.c (parseurl, str_url): Removing Dan Berger's code (see
	http.c above for explanation).

1999-08-25  Heiko Herold

	* ftp.c: Respect new option waitretry.

2000-01-30  Damir Dzeko

	* http.c (gethttp): Send custom Referer, if required.

1999-09-24  Charles G Waldman

	* netrc.c (parse_netrc): Allow passwords to contain spaces.
	* netrc.c (parse_netrc): New function.

1999-09-17  Dan Berger

	* http.c (gethttp): Send it.
	* url.c (parseurl): Detect query string in HTTP URL-s.
	(str_url): Print it.

2000-03-02  HIROSE Masaaki

	* html.c (html_allow): Add and