From: hniksic
Date: Fri, 7 Nov 2003 12:00:23 +0000 (-0800)
Subject: [svn] Improve documentation of "reserved" and "unsafe" chars.
X-Git-Tag: v1.13~1460
X-Git-Url: http://sjero.net/git/?p=wget;a=commitdiff_plain;h=99625a869bd9fc2c021cdbbff0c4cc7427ea2c95

[svn] Improve documentation of "reserved" and "unsafe" chars.
---

diff --git a/src/url.c b/src/url.c
index 6b038f17..d4032582 100644
--- a/src/url.c
+++ b/src/url.c
@@ -76,20 +76,34 @@ static struct scheme_data supported_schemes[] =
 
 static int path_simplify PARAMS ((char *));
 
-/* Support for encoding and decoding of URL strings. We determine
-   whether a character is unsafe through static table lookup. This
-   code assumes ASCII character set and 8-bit chars.
+/* Support for escaping and unescaping of URL strings. */
 
-   Note that rfc2396 chose a different terminology from rfc1738. The
-   recoding that URL does should be compliant with both specs,
-   although escaping the "unsafe" ("unreserved" in rfc2396 parlance)
-   chars where not strictly necessary is now frowned upon. */
+/* Table of "reserved" and "unsafe" characters. Those terms are
+   rfc1738-speak, as such largely obsoleted by rfc2396 and later
+   specs, but the general idea remains.
+
+   A reserved character is the one that you can't decode without
+   changing the meaning of the URL. For example, you can't decode
+   "/foo/%2f/bar" into "/foo///bar" because the number and contents of
+   path components is different. Non-reserved characters can be
+   changed, so "/foo/%78/bar" is safe to change to "/foo/x/bar". Wget
+   uses the rfc1738 set of reserved characters, plus "$" and ",", as
+   recommended by rfc2396.
+
+   An unsafe characters is the one that should be encoded when URLs
+   are placed in foreign environments. E.g. space and newline are
+   unsafe in HTTP contexts because HTTP uses them as separator and
+   terminator, so they must be encoded to %20 and %0A respectively.
+   "*" is unsafe in shell context, etc.
+
+   We determine whether a character is unsafe through static table
+   lookup. This code assumes ASCII character set and 8-bit chars. */
 
 enum {
-  /* rfc1738 reserved chars, preserved from encoding. */
+  /* rfc1738 reserved chars + "$" and ",". */
   urlchr_reserved = 1,
 
-  /* rfc1738 unsafe chars, plus some more. */
+  /* rfc1738 unsafe chars, plus non-printables. */
   urlchr_unsafe   = 2
 };
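
The comment in this patch describes a static table lookup that classifies each 8-bit character as "reserved", "unsafe", both, or neither. The following is a minimal sketch of that idea, not the table from url.c: only the two enum constants come from the patch. The names sketch_table, sketch_init_table, sketch_test, sketch_may_decode and sketch_escape_unsafe are hypothetical, and the character sets are one reading of rfc1738 that may differ from what Wget actually uses. It assumes ASCII and 8-bit chars, as the comment does.

```c
#include <assert.h>
#include <string.h>

enum {
  urlchr_reserved = 1,   /* from the patch */
  urlchr_unsafe   = 2    /* from the patch */
};

/* Hypothetical lookup table, one entry per 8-bit character. */
static unsigned char sketch_table[256];

/* Fill the table at run time.  The sets below are an assumption
   based on rfc1738, not a copy of Wget's table. */
static void sketch_init_table(void)
{
  const char *reserved = ";/?:@=&$,";          /* rfc1738 reserved + "$" and "," */
  const char *unsafe = " <>\"#%{}|\\^~[]`";    /* rfc1738 unsafe (assumed set) */
  const char *p;
  int c;

  memset(sketch_table, 0, sizeof sketch_table);
  for (p = reserved; *p; p++)
    sketch_table[(unsigned char) *p] |= urlchr_reserved;
  for (p = unsafe; *p; p++)
    sketch_table[(unsigned char) *p] |= urlchr_unsafe;
  /* Non-printables: control chars, DEL, and everything above ASCII. */
  for (c = 0; c < 256; c++)
    if (c < 0x20 || c >= 0x7f)
      sketch_table[c] |= urlchr_unsafe;
}

/* Nonzero if C has any of the bits in MASK set. */
static int sketch_test(unsigned char c, int mask)
{
  return (sketch_table[c] & mask) != 0;
}

/* Nonzero if a %XX escape for C may be decoded without changing the
   meaning of the URL.  Excluding unsafe chars as well is an extra
   assumption of this sketch. */
static int sketch_may_decode(unsigned char c)
{
  return !sketch_test(c, urlchr_reserved | urlchr_unsafe);
}

/* Copy SRC to DST, replacing each unsafe char with %XX.
   DST must have room for 3 * strlen(SRC) + 1 bytes. */
static void sketch_escape_unsafe(const char *src, char *dst)
{
  static const char hex[] = "0123456789ABCDEF";

  assert(src != NULL && dst != NULL);
  for (; *src; src++)
    {
      unsigned char c = (unsigned char) *src;
      if (sketch_test(c, urlchr_unsafe))
        {
          *dst++ = '%';
          *dst++ = hex[c >> 4];
          *dst++ = hex[c & 0xf];
        }
      else
        *dst++ = (char) c;
    }
  *dst = '\0';
}
```

After calling sketch_init_table(), sketch_may_decode('x') is true while sketch_may_decode('/') is false, which mirrors the "/foo/%78/bar" versus "/foo/%2f/bar" example in the comment. Note that this sketch treats "%" itself as unsafe, so running sketch_escape_unsafe on an already-escaped URL would escape it a second time; a real implementation needs to handle that case.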