[svn] Improve documentation of "reserved" and "unsafe" chars.

author hniksic <devnull@localhost>

Fri, 7 Nov 2003 12:00:23 +0000 (04:00 -0800)

committer hniksic <devnull@localhost>

Fri, 7 Nov 2003 12:00:23 +0000 (04:00 -0800)
diff --git a/src/url.c b/src/url.c
index 6b038f17f393369027263b1b7b72e08068e3feb5..d40325825c09f64ca58f9078c2c38b30d594766b 100644
--- a/src/url.c
+++ b/src/url.c
@@ -76,20 +76,34 @@ static struct scheme_data supported_schemes[] =
  
  static int path_simplify PARAMS ((char *));
  \f
-/* Support for encoding and decoding of URL strings.  We determine
-   whether a character is unsafe through static table lookup.  This
-   code assumes ASCII character set and 8-bit chars.
+/* Support for escaping and unescaping of URL strings.  */
  
-   Note that rfc2396 chose a different terminology from rfc1738.  The
-   recoding that URL does should be compliant with both specs,
-   although escaping the "unsafe" ("unreserved" in rfc2396 parlance)
-   chars where not strictly necessary is now frowned upon.  */
+/* Table of "reserved" and "unsafe" characters.  Those terms are
+   rfc1738-speak, as such largely obsoleted by rfc2396 and later
+   specs, but the general idea remains.
+
+   A reserved character is one that you can't decode without
+   changing the meaning of the URL.  For example, you can't decode
+   "/foo/%2f/bar" into "/foo///bar" because the number and contents of
+   path components are different.  Non-reserved characters can be
+   changed, so "/foo/%78/bar" is safe to change to "/foo/x/bar".  Wget
+   uses the rfc1738 set of reserved characters, plus "$" and ",", as
+   recommended by rfc2396.
+
+   An unsafe character is one that should be encoded when URLs
+   are placed in foreign environments.  E.g. space and newline are
+   unsafe in HTTP contexts because HTTP uses them as separator and
+   terminator, so they must be encoded to %20 and %0A respectively.
+   "*" is unsafe in shell context, etc.
+
+   We determine whether a character is unsafe through static table
+   lookup.  This code assumes ASCII character set and 8-bit chars.  */
  
  enum {
-  /* rfc1738 reserved chars, preserved from encoding.  */
+  /* rfc1738 reserved chars + "$" and ",".  */
    urlchr_reserved = 1,
  
-  /* rfc1738 unsafe chars, plus some more.  */
+  /* rfc1738 unsafe chars, plus non-printables.  */
    urlchr_unsafe   = 2
  };
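
The distinction documented in the new comment can be sketched in a few lines of C. This is an illustration only, not the code from src/url.c: the names sketch_char_flags, sketch_can_decode and sketch_escape_unsafe are hypothetical, the flags are computed per byte instead of being read from a static table, and the character sets are taken from the rfc1738 lists as described in the comment (reserved plus "$" and ",", unsafe plus non-printables). Like the original, it assumes ASCII and 8-bit chars.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
  urlchr_reserved = 1,
  urlchr_unsafe   = 2
};

/* Hypothetical classifier: returns the flag bits for one byte.  A real
   implementation would more likely index a 256-entry static table.  */
static int
sketch_char_flags (unsigned char c)
{
  int flags = 0;
  /* rfc1738 reserved chars + "$" and ",".  */
  if (c != '\0' && strchr (";/?:@=&$,", c))
    flags |= urlchr_reserved;
  /* rfc1738 unsafe chars, plus controls, space, DEL and 8-bit bytes.  */
  if (c <= 0x20 || c >= 0x7f || strchr ("<>\"#%{}|\\^~[]`", c))
    flags |= urlchr_unsafe;
  return flags;
}

/* A %XX escape may be decoded only if the resulting byte is neither
   reserved (decoding would change the URL's meaning) nor unsafe.  */
static int
sketch_can_decode (unsigned char c)
{
  return sketch_char_flags (c) == 0;
}

/* Copy S into OUT, writing each unsafe byte as %XX.  OUT must have
   room for the result and a trailing NUL.  Returns the length written.
   Note that this sketch escapes every '%', including ones that already
   start an escape sequence.  */
static size_t
sketch_escape_unsafe (const char *s, char *out, size_t outsize)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;
  for (; *s; s++)
    {
      unsigned char c = (unsigned char) *s;
      if (sketch_char_flags (c) & urlchr_unsafe)
        {
          assert (n + 3 < outsize);
          out[n++] = '%';
          out[n++] = hex[c >> 4];
          out[n++] = hex[c & 0xf];
        }
      else
        {
          assert (n + 1 < outsize);
          out[n++] = (char) c;
        }
    }
  assert (n < outsize);
  out[n] = '\0';
  return n;
}
```

With these definitions, sketch_can_decode (0x2f) is false because "/" is reserved, while sketch_can_decode (0x78) is true because "x" is neither reserved nor unsafe, which mirrors the "/foo/%2f/bar" and "/foo/%78/bar" examples in the comment. Escaping "a b\n" gives "a%20b%0A".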