[20924] in bugtraq


RE: Webtrends HTTP Server %20 bug

daemon@ATHENA.MIT.EDU (Glynn Clements)
Fri Jun 8 15:04:02 2001

From: Glynn Clements <glynn.clements@virgin.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15136.19421.750460.506476@cerise.nosuchdomain.co.uk>
Date: Fri, 8 Jun 2001 04:51:57 +0100
To: "Eric Hacker" <hacker@vudu.net>
Cc: "H D Moore" <hdm@secureaustin.com>, "Auriemma Luigi" <kaino3@genie.it>,
        <BUGTRAQ@securityfocus.com>
In-Reply-To: <LAEHJPHOHCJHCDCHEEJAMENGEIAA.hacker@vudu.net>


Eric Hacker wrote:

> Unicode is a superset of ASCII and thus all ASCII characters are Unicode.
> UTF8 is a way of encoding unicode code points for transport over the
> internet in a restricted character set. Conveniently, UTF8 uses the same
> values as ASCII for ASCII representation. Above the standard ASCII 127
> character representation, UTF8 uses multi-byte strings beginning with 0xC1.

No; the sequences for codes 128 to 255 begin with 0xC2 and 0xC3
(128-191 and 192-255 respectively). 0xC0 and 0xC1 indicate (illegal)
overlong encodings of 0-63 and 64-127 respectively.
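
As a quick check of the lead bytes quoted above, here is a minimal
Python sketch. It assumes Python 3's built-in UTF-8 codec; the helper
name lead_byte is only illustrative and is not from the original post.

```python
def lead_byte(code_point: int) -> int:
    """Return the first byte of the UTF-8 encoding of one code point."""
    return chr(code_point).encode("utf-8")[0]

# Collect the distinct lead bytes seen in each half of the 128-255 range.
low = {lead_byte(cp) for cp in range(128, 192)}
high = {lead_byte(cp) for cp in range(192, 256)}

print([hex(b) for b in sorted(low)])   # ['0xc2']
print([hex(b) for b in sorted(high)])  # ['0xc3']
```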

In general, the two-byte sequences have the (binary) form:

   110xxxxx 10xxxxxx

The range 0-127 (which must use the single-byte form instead)
corresponds to:

   1100000x 10xxxxxx

Hence, any sequence beginning with 11000000 (0xC0) or 11000001 (0xC1)
is illegal.
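
The rule above can be sketched as a small two-byte decoder in Python.
This is an illustration under stated assumptions, not the code of any
particular server: the function name decode_two_byte is hypothetical,
and it handles only the two-byte form described here.

```python
def decode_two_byte(lead: int, trail: int) -> int:
    """Decode one two-byte UTF-8 sequence, rejecting overlong forms."""
    # The lead byte must match 110xxxxx.
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("lead byte is not of the form 110xxxxx")
    # The trail byte must match 10xxxxxx.
    if trail & 0b11000000 != 0b10000000:
        raise ValueError("trail byte is not of the form 10xxxxxx")
    # Five payload bits from the lead byte, six from the trail byte.
    code_point = ((lead & 0b00011111) << 6) | (trail & 0b00111111)
    # Values 0-127 must use the single-byte form, so a two-byte
    # sequence that yields one (lead 0xC0 or 0xC1) is overlong.
    if code_point < 0x80:
        raise ValueError("overlong encoding (lead byte 0xC0 or 0xC1)")
    return code_point


print(decode_two_byte(0xC2, 0x80))  # 128
print(decode_two_byte(0xC3, 0xBF))  # 255

try:
    decode_two_byte(0xC0, 0xA0)     # overlong form of 0x20 (space)
except ValueError as exc:
    print("rejected:", exc)
```

A decoder that skips the final range check would accept 0xC0 0xA0 as
a space, so any filtering done on the raw bytes before decoding would
not see that space.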

-- 
Glynn Clements <glynn.clements@virgin.net>
