+2008-11-15 Steven Schubiger <stsc@members.fsf.org>
+
+ * sample.wgetrc: Comment out the waitretry "default" value,
+ because there is a global one now.
+
+ * wget.texi (Download Options): Mention the global
+ default value.
+
2008-11-10 Micah Cowan <micah@cowan.name>
* Makefile.am (EXTRA_DIST): Removed no-longer-present
* wget.texi (Robot Exclusion): Fixed typo "downloads" ->
"download"
+ 2008-08-03 Xavier Saint <wget@sxav.eu>
+
+ * wget.texi : Add option descriptions for the three new
+ options --iri, --locale and --remote-encoding related to
+ IRI support.
+
+ * sample.wgetrc : Add commented lines for the three new
+ commands iri, locale and remoteencoding related to IRI support.
+
2008-08-03 Micah Cowan <micah@cowan.name>
* wget.texi: Don't set UPDATED; already set by version.texi.
# downloads, set waitretry to maximum number of seconds to wait (Wget
# will use "linear backoff", waiting 1 second after the first failure
# on a file, 2 seconds after the second failure, etc. up to this max).
-waitretry = 10
+#waitretry = 10
##
# To try ipv6 addresses first:
#prefer-family = IPv6
+
+ # Set default IRI support state
+ #iri = off
+
+ # Force the default system encoding
+ #locale = UTF-8
+
+ # Force the default remote server encoding
+ #remoteencoding = UTF-8
Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
servers that support the @code{Range} header.
+ @cindex iri support
+ @cindex idn support
+ @item --iri
+
+ Turn on internationalized URI (IRI) support. Use @samp{--iri=no} to
+ turn it off. IRI support is activated by default.
+
+ You can set the default state of IRI support using the @code{iri} command in
+ @file{.wgetrc}. That setting may be overridden from the command line.
+
+ @cindex local encoding
+ @cindex locale
+ @item --locale=@var{encoding}
+
+ Force Wget to use @var{encoding} as the default system encoding. That affects
+ how Wget converts URLs specified as arguments from locale to @sc{utf-8} for
+ IRI support.
+
+ Wget uses the function @code{nl_langinfo()} and then the @code{CHARSET}
+ environment variable to get the locale. If it fails, @sc{ascii} is used.
+
+ You can set the default locale using the @code{locale} command in
+ @file{.wgetrc}. That setting may be overridden from the command line.
+
@cindex progress indicator
@cindex dot style
@item --progress=@var{type}
``dot'' progress will be favored over ``bar''. To force the bar output,
use @samp{--progress=bar:force}.
+ @cindex remote encoding
+ @item --remote-encoding=@var{encoding}
+
+ Force Wget to use @var{encoding} as the default remote server encoding. That
+ affects how Wget converts URIs found in files from remote encoding to
+ @sc{utf-8} during a recursive fetch. This option is only useful for
+ IRI support, for the interpretation of non-@sc{ascii} characters.
+
+ For HTTP, remote encoding can be found in HTTP @code{Content-Type}
+ header and in HTML @code{Content-Type http-equiv} meta tag.
+
+ You can set the default encoding using the @code{remoteencoding}
+ command in @file{.wgetrc}. That setting may be overridden from the
+ command line.
+
@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
-seconds per file.
+seconds per file.
-Note that this option is turned on by default in the global
-@file{wgetrc} file.
+By default, Wget will assume a value of 10 seconds.
@cindex wait, random
@cindex random wait
* http.c (gethttp): Don't do anything when content-length >= our
requested range.
+2008-11-16 Steven Schubiger <stsc@members.fsf.org>
+
+ * main.c: Declare and initialize the numurls counter.
+
+ * ftp.c, http.c: Make the counter visible here and use it.
+
+ * options.h: Remove old declaration from options struct.
+
+2008-11-15 Steven Schubiger <stsc@members.fsf.org>
+
+ * init.c (defaults): Set default waitretry value.
+
+2008-11-14 Steven Schubiger <stsc@members.fsf.org>
+
+ * main.c (format_and_print_line): Use a custom format
+ string for printing leading spaces.
+
2008-11-12 Micah Cowan <micah@cowan.name>
* ftp-ls.c (ftp_index): HTML-escape dir name in title, h1, a:href.
* init.c (cleanup): Free the memory associated with the base
option (when DEBUG_MALLOC is defined).
+ 2008-07-02 Xavier Saint <wget@sxav.eu>
+
+ * iri.c, iri.h : New function idn_decode() to decode ASCII
+ encoded hostname to the locale.
+
+ * host.c : Show hostname to be resolved both in locale and
+ ASCII encoded.
+
2008-06-28 Steven Schubiger <stsc@members.fsf.org>
* retr.c (retrieve_from_file): Allow for reading the links from
an external file (HTTP/FTP).
+ 2008-06-26 Xavier Saint <wget@sxav.eu>
+
+ * iri.c, iri.h : New functions locale_to_utf8() and
+ idn_encode() adding basic capabilities of IRI/IDN.
+
+ * url.c : Convert URLs from locale to UTF-8, allowing basic
+ support of IRI/IDN.
+
2008-06-25 Steven Schubiger <stsc@members.fsf.org>
* ftp.c (getftp): When spidering a FTP URL, emit a diagnostic
* http.c: Make -nv --spider include the file's name when it
exists.
-
+
2008-06-22 Micah Cowan <micah@cowan.name>
* Makefile.am (version.c): Fixed version string invocation so it
string vars pointers-to-const, and moved line lengths
below 80 (in Makefile.am, not in version.c).
+ 2008-06-19 Xavier Saint <wget@sxav.eu>
+
+ * iri.c, iri.h : New function check_encoding_name() as
+ a preliminary encoding name check.
+
+ * main.c, iri.c : Make use of check_encoding_name().
+
+ 2008-06-19 Xavier Saint <wget@sxav.eu>
+
+ * iri.c : Include missing stringprep.h file and add a
+ cast.
+
+ * init.c : set a default initial value for opt.enable_iri,
+ opt.locale and opt.encoding_remote.
+
+ 2008-06-19 Xavier Saint <wget@sxav.eu>
+
+ * iri.c, iri.h : Add a new function find_locale() to find
+ out the local system encoding.
+
+ * main.c : Make use of find_locale().
+
+ 2008-06-19 Xavier Saint <wget@sxav.eu>
+
+ * html-url.c : Add "content-type" meta tag parsing for
+ retrieving page encoding.
+
+ * iri.h : Make no-op version of parse_charset() return
+ NULL.
+
2008-06-16 Micah Cowan <micah@cowan.name>
* http.c (http_loop): When hstat.len is higher than the
successfully completed content's length, but it's because we
_set_ it that way, don't abort.
+ 2008-06-14 Xavier Saint <wget@sxav.eu>
+
+ * iri.c, iri.h : New files.
+
+ * Makefile.am : Add files iri.h and conditional iri.c.
+
+ * build_info.c : Add compiled feature "iri".
+
+ * http.c : include iri.h and parse charset from Content-Type
+ header.
+
+ * init.c, main.c, options.h : if an option isn't supported
+ at compile time, don't get rid of it; show a dummy
+ message instead if it is used.
+
2008-06-13 Micah Cowan <micah@cowan.name>
* build_info.c: ENABLE_NTLM, not HAVE_NTLM; distinguish OpenSSL
default.
2008-05-17 Kenny Parnell <k.parnell@gmail.com>
-
+
(cmd_spec_prefer_family): Initialize prefer_family to prefer_none.
2008-05-17 Micah Cowan <micah@cowan.name>
-
+
* main.c (main): Handle Ctrl-D on command-line.
2008-05-15 Steven Schubiger <schubiger@gmail.com>
* options.h: Add an according boolean member to the options
struct.
-
+
* sysdep.h: Comment the defines __EXTENSIONS__ and _GNU_SOURCE
out, because they're now defined independently by config.h.
int hcount, hcapacity;
};
+extern int numurls;
+
/* Create a new, empty request. At least request_set_method must be
called before the request can be used. */
If PROXY is non-NULL, the connection will be made to the proxy
server, and u->url will be requested. */
static uerr_t
- gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy)
+ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
+ struct iri *iri)
{
struct request *req;
hs->local_file = url_file_name (u);
}
}
-
+
/* TODO: perform this check only once. */
if (!hs->existence_checked && file_exists_p (hs->local_file))
{
local_dot_orig_file_exists = true;
local_filename = filename_plus_orig_suffix;
}
- }
+ }
if (!local_dot_orig_file_exists)
/* Couldn't stat() <file>.orig, so try to stat() <file>. */
char *tmp = strchr (type, ';');
if (tmp)
{
+ /* sXXXav: only needed if IRI support is enabled */
+ char *tmp2 = tmp + 1;
+
while (tmp > type && c_isspace (tmp[-1]))
--tmp;
*tmp = '\0';
+
+ /* Try to get remote encoding if needed */
+ if (opt.enable_iri && !opt.encoding_remote)
+ {
+ tmp = parse_charset (tmp2);
+ if (tmp)
+ set_content_encoding (iri, tmp);
+ }
}
}
hs->newloc = resp_header_strdup (resp, "Location");
retried, and retried, and retried, and... */
uerr_t
http_loop (struct url *u, char **newloc, char **local_file, const char *referer,
- int *dt, struct url *proxy)
+ int *dt, struct url *proxy, struct iri *iri)
{
int count;
bool got_head = false; /* used for time-stamping and filename detection */
uerr_t err, ret = TRYLIMEXC;
time_t tmr = -1; /* remote time-stamp */
struct http_stat hstat; /* HTTP status */
- struct_stat st;
+ struct_stat st;
bool send_head_first = true;
/* Assert that no value for *LOCAL_FILE was passed. */
assert (local_file == NULL || *local_file == NULL);
-
+
/* Set LOCAL_FILE parameter. */
if (local_file && opt.output_document)
*local_file = HYPHENP (opt.output_document) ? NULL : xstrdup (opt.output_document);
-
+
/* Reset NEWLOC parameter. */
*newloc = NULL;
retrieve the file. But if the output_document was given, then this
test was already done and the file didn't exist. Hence the !opt.output_document */
logprintf (LOG_VERBOSE, _("\
- File %s already there; not retrieving.\n\n"),
+ File %s already there; not retrieving.\n\n"),
quote (hstat.local_file));
/* If the file is there, we suppose it's retrieved OK. */
*dt |= RETROKF;
/* Reset the counter. */
count = 0;
-
+
/* Reset the document type. */
*dt = 0;
-
+
/* Skip preliminary HEAD request if we're not in spider mode AND
* if -O was given or HTTP Content-Disposition support is disabled. */
if (!opt.spider
/* Send preliminary HEAD request if -N is given and we have an existing
* destination file. */
- if (opt.timestamping
+ if (opt.timestamping
&& !opt.content_disposition
&& file_exists_p (url_file_name (u)))
send_head_first = true;
-
+
/* THE loop */
do
{
/* Increment the pass counter. */
++count;
sleep_between_retrievals (count);
-
+
/* Get the current time string. */
tms = datetime_str (time (NULL));
-
+
if (opt.spider && !got_head)
logprintf (LOG_VERBOSE, _("\
Spider mode enabled. Check if remote file exists.\n"));
if (opt.verbose)
{
char *hurl = url_string (u, URL_AUTH_HIDE_PASSWD);
-
- if (count > 1)
+
+ if (count > 1)
{
char tmp[256];
sprintf (tmp, _("(try:%2d)"), count);
logprintf (LOG_NOTQUIET, "--%s-- %s %s\n",
tms, tmp, hurl);
}
- else
+ else
{
logprintf (LOG_NOTQUIET, "--%s-- %s\n",
tms, hurl);
}
-
+
#ifdef WINDOWS
ws_changetitle (hurl);
#endif
/* Default document type is empty. However, if spider mode is
on or time-stamping is employed, HEAD_ONLY commands is
encoded within *dt. */
- if (send_head_first && !got_head)
+ if (send_head_first && !got_head)
*dt |= HEAD_ONLY;
else
*dt &= ~HEAD_ONLY;
*dt &= ~SEND_NOCACHE;
/* Try fetching the document, or at least its head. */
- err = gethttp (u, &hstat, dt, proxy);
+ err = gethttp (u, &hstat, dt, proxy, iri);
/* Time? */
tms = datetime_str (time (NULL));
-
+
/* Get the new location (with or without the redirection). */
if (hstat.newloc)
*newloc = xstrdup (hstat.newloc);
hstat.statcode);
ret = WRONGCODE;
}
- else
+ else
{
ret = NEWLOCATION;
}
/* All possibilities should have been exhausted. */
abort ();
}
-
+
if (!(*dt & RETROKF))
{
char *hurl = NULL;
continue;
}
/* Maybe we should always keep track of broken links, not just in
- * spider mode. */
- else if (opt.spider)
+ * spider mode.
+ * Don't log error if it was UTF-8 encoded because we will try
+ * once unencoded. */
+ else if (opt.spider && !iri->utf8_encode)
{
/* #### Again: ugly ugly ugly! */
- if (!hurl)
+ if (!hurl)
hurl = url_string (u, URL_AUTH_HIDE_PASSWD);
nonexisting_url (hurl);
logprintf (LOG_NOTQUIET, _("\
else
{
logprintf (LOG_NOTQUIET, _("%s ERROR %d: %s.\n"),
- tms, hstat.statcode,
+ tms, hstat.statcode,
quotearg_style (escape_quoting_style, hstat.error));
}
logputs (LOG_VERBOSE, "\n");
number_to_static_string (hstat.contlen),
hstat.local_file, count);
}
- ++opt.numurls;
+ ++numurls;
total_downloaded_bytes += hstat.len;
/* Remember that we downloaded the file for later ".orig" code. */
tms, u->url, number_to_static_string (hstat.len),
hstat.local_file, count);
}
- ++opt.numurls;
+ ++numurls;
total_downloaded_bytes += hstat.len;
/* Remember that we downloaded the file for later ".orig" code. */
{ "inet6only", &opt.ipv6_only, cmd_boolean },
#endif
{ "input", &opt.input_filename, cmd_file },
+ { "iri", &opt.enable_iri, cmd_boolean },
{ "keepsessioncookies", &opt.keep_session_cookies, cmd_boolean },
{ "limitrate", &opt.limit_rate, cmd_bytes },
{ "loadcookies", &opt.cookies_input, cmd_file },
+ { "locale", &opt.locale, cmd_string },
{ "logfile", &opt.lfilename, cmd_file },
{ "login", &opt.ftp_user, cmd_string },/* deprecated*/
{ "maxredirect", &opt.max_redirect, cmd_number },
{ "referer", &opt.referer, cmd_string },
{ "reject", &opt.rejects, cmd_vector },
{ "relativeonly", &opt.relative_only, cmd_boolean },
+ { "remoteencoding", &opt.encoding_remote, cmd_string },
{ "removelisting", &opt.remove_listing, cmd_boolean },
{ "restrictfilenames", NULL, cmd_spec_restrict_file_names },
{ "retrsymlinks", &opt.retr_symlinks, cmd_boolean },
opt.max_redirect = 20;
+ opt.waitretry = 10;
++
+ #ifdef ENABLE_IRI
+ opt.enable_iri = true;
+ #else
+ opt.enable_iri = false;
+ #endif
+ opt.locale = NULL;
+ opt.encoding_remote = NULL;
}
\f
/* Return the user's home directory (strdup-ed), or NULL if none is
#endif
const char *exec_name;
+
+/* Number of successfully downloaded URLs */
+int numurls = 0;
\f
#ifndef TESTING
/* Initialize I18N/L10N. That amounts to invoking setlocale, and
{ "inet6-only", '6', OPT_BOOLEAN, "inet6only", -1 },
#endif
{ "input-file", 'i', OPT_VALUE, "input", -1 },
+ { "iri", 0, OPT_BOOLEAN, "iri", -1 },
{ "keep-session-cookies", 0, OPT_BOOLEAN, "keepsessioncookies", -1 },
{ "level", 'l', OPT_VALUE, "reclevel", -1 },
{ "limit-rate", 0, OPT_VALUE, "limitrate", -1 },
{ "load-cookies", 0, OPT_VALUE, "loadcookies", -1 },
+ { "locale", 0, OPT_VALUE, "locale", -1 },
{ "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
{ "mirror", 'm', OPT_BOOLEAN, "mirror", -1 },
{ "no", 'n', OPT__NO, NULL, required_argument },
{ "referer", 0, OPT_VALUE, "referer", -1 },
{ "reject", 'R', OPT_VALUE, "reject", -1 },
{ "relative", 'L', OPT_BOOLEAN, "relativeonly", -1 },
+ { "remote-encoding", 0, OPT_VALUE, "remoteencoding", -1},
{ "remove-listing", 0, OPT_BOOLEAN, "removelisting", -1 },
{ "restrict-file-names", 0, OPT_BOOLEAN, "restrictfilenames", -1 },
{ "retr-symlinks", 0, OPT_BOOLEAN, "retrsymlinks", -1 },
token on the next line. */
if (remaining_chars <= strlen (token))
{
- int j;
- printf ("\n");
- j = 0;
- for (j = 0; j < leading_spaces; j++)
- {
- printf (" ");
- }
+ printf ("\n%*c", leading_spaces, ' ');
remaining_chars = line_length - leading_spaces;
}
printf ("%s ", token);
exit (1);
}
+ #ifdef ENABLE_IRI
+ if (opt.enable_iri)
+ {
+ if (opt.locale && !check_encoding_name (opt.locale))
+ opt.locale = NULL;
+
+ if (!opt.locale)
+ opt.locale = find_locale ();
+
+ if (opt.encoding_remote && !check_encoding_name (opt.encoding_remote))
+ opt.encoding_remote = NULL;
+ }
+ #else
+ if (opt.enable_iri || opt.locale || opt.encoding_remote)
+ {
+ /* sXXXav : be more specific... */
+ printf(_("This version does not have support for IRIs\n"));
+ exit(1);
+ }
+ #endif
+
if (opt.ask_passwd)
{
opt.passwd = prompt_for_password ();
int old_follow_ftp = opt.follow_ftp;
/* Turn opt.follow_ftp on in case of recursive FTP retrieval */
- if (url_scheme (*t) == SCHEME_FTP)
+ if (url_scheme (*t) == SCHEME_FTP)
opt.follow_ftp = 1;
-
- status = retrieve_tree (*t);
+
+ status = retrieve_tree (*t, NULL);
opt.follow_ftp = old_follow_ftp;
}
else
- status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt, opt.recursive);
+ {
+ struct iri *i = iri_new ();
+ set_uri_encoding (i, opt.locale, true);
+ status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt,
+ opt.recursive, i);
+ iri_free (i);
+ }
if (opt.delete_after && file_exists_p(filename))
{
logprintf (LOG_NOTQUIET,
_("FINISHED --%s--\nDownloaded: %d files, %s in %s (%s)\n"),
datetime_str (time (NULL)),
- opt.numurls,
+ numurls,
human_readable (total_downloaded_bytes),
secs_to_human_time (total_download_time),
retr_rate (total_downloaded_bytes, total_download_time));
SUM_SIZE_INT quota; /* Maximum file size to download and
store. */
- int numurls; /* Number of successfully downloaded
- URLs #### should be removed because
- it's not a setting, but a global var */
-
bool server_response; /* Do we print server response? */
bool save_headers; /* Do we save headers together with
file? */
bool content_disposition; /* Honor HTTP Content-Disposition header. */
bool auth_without_challenge; /* Issue Basic authentication creds without
waiting for a challenge. */
+
+ bool enable_iri;
+ char *encoding_remote;
+ char *locale;
};
extern struct options opt;
+ 2008-11-26 Micah Cowan <micah@cowan.name> (not copyrightable)
+
+ * Test-ftp-iri-disabled.px, Test-ftp-iri-fallback.px,
+ Test-ftp-iri.px, Test-idn-cmd.px, Test-idn-headers.px,
+ Test-idn-meta.px, Test-iri-disabled.px,
+ Test-iri-forced-remote.px, Test-iri-list.px, Test-iri.px: More
+ module-scope warnings.
+
+2008-11-25 Steven Schubiger <stsc@members.fsf.org>
+
+ * WgetTest.pm.in: Remove the magic interpreter line;
+ replace -w with lexical warnings.
+
+2008-11-13 Steven Schubiger <stsc@members.fsf.org>
+
+ * FTPServer.pm, FTPTest.pm, HTTPServer.pm, HTTPTest.pm,
+ WgetTest.pm.in: Clean up leftover whitespace.
+
2008-11-12 Steven Schubiger <stsc@members.fsf.org>
* Test-auth-basic.px, Test-auth-no-challenge.px,
* run-px: Use strict (thanks Steven Schubiger!).
+ 2008-09-09 Micah Cowan <micah@cowan.name>
+
+ * Test-idn-cmd.px: Added.
+
+ * run-px: Added Test-idn-cmd.px.
+
+ 2008-08-28 Micah Cowan <micah@cowan.name>
+
+ * HTTPServer.pm (run): Allow distinguishing between hostnames,
+ when used as a proxy.
+
+ * Test-idn-headers.px, Test-idn-meta.px: Added.
+
+ * run-px: Added Test-idn-headers.px, Test-idn-meta.px.
+
+ * Test-proxy-auth-basic.px: Use the full URL, rather than just the
+ path (made necessary by the accompanying change to HTTPServer.pm).
+
+ 2008-08-14 Xavier Saint <wget@sxav.eu>
+
+ * Test-iri-list.px : Fetch files from a remote list.
+
+ 2008-08-03 Xavier Saint <wget@sxav.eu>
+
+ * Test-iri.px : HTTP recursive fetch for testing IRI support and
+ fallback.
+
+ * Test-iri-disabled.px : Same file structure as Test-iri.px but with
+ IRI support disabled.
+
+ * Test-iri-forced-remote.px : There's a difference between ISO-8859-1
+ and ISO-8859-15 for character 0xA4 (respectively currency sign and
+ euro sign). So with a forced ISO-8859-1 remote encoding, wget should
+ see 0xA4 as a currency sign and transcode it correctly in UTF-8 instead
+ of using the ISO-8859-15 given by the server.
+
+ * Test-ftp-iri.px : Give a file to fetch via FTP in a specific locale
+ and expect wget to fetch the file UTF-8 encoded.
+
+ * Test-ftp-iri-fallback.px : Same as above but wget should fall back on
+ locale encoding to fetch the file.
+
+ * Test-ftp-iri-disabled.px : Same as Test-ftp-iri.px but with IRI support
+ disabled. The UTF-8 encoded file should not be retrieved.
+
2008-06-22 Micah Cowan <micah@cowan.name>
* Test-proxied-https-auth.px: Shift exit code so it falls in the
if (!$initialized) {
$synch_callback->();
$initialized = 1;
- }
+ }
my $con = $self->accept();
print STDERR "Accepted a new connection\n" if $log;
while (my $req = $con->get_request) {
- my $url_path = $req->url->path;
+ #my $url_path = $req->url->path;
+ my $url_path = $req->url->as_string;
if ($url_path =~ m{/$}) { # append 'index.html'
$url_path .= 'index.html';
}
if (exists($urls->{$url_path})) {
print STDERR "Serving requested URL: ", $url_path, "\n" if $log;
next unless ($req->method eq "HEAD" || $req->method eq "GET");
-
+
my $url_rec = $urls->{$url_path};
$self->send_response($req, $url_rec, $con);
} else {
print STDERR "Requested wrong URL: ", $url_path, "\n" if $log;
$con->send_error($HTTP::Status::RC_FORBIDDEN);
last;
- }
+ }
}
print STDERR "Closing connection\n" if $log;
$con->close;