[82757] in North American Network Operators' Group
Re: identical-glyph homographs
daemon@ATHENA.MIT.EDU (Florian Weimer)
Thu Jul 28 17:14:00 2005
From: Florian Weimer <fw@deneb.enyo.de>
To: Todd Vierling <tv@duh.org>
Cc: Jason Sloderbeck <jason@positivenetworks.net>,
Phillip Vandry <vandry@TZoNE.ORG>, nanog@merit.edu
Date: Thu, 28 Jul 2005 23:12:50 +0200
In-Reply-To: <Pine.NEB.4.62.0507281601120.1136@server.duh.org> (Todd
Vierling's message of "Thu, 28 Jul 2005 16:55:05 -0400 (EDT)")
Errors-To: owner-nanog@merit.edu
* Todd Vierling:
>> Homographs are a classical example of a PR attack. It's a complete
>> non-issue. In practice, people don't use domain names to assess the
>> credibility of web sites. 1/l/I and 0/O are homographs as well, and
>> the Internet hasn't collapsed as a result.
>
> English-speaking folks actually do often notice the difference between 1/l/I
> and 0/O, partly because they're usually (in browsers) lower case -- hence
> 1/l/i and 0/o (while 1/l is still close, the users are trained by years to
> know the difference). It's an implicit Turing-test factor based on
> linguistic experience.
But case is controlled by the attacker. Maybe users would be alerted
if they saw a capitalized domain name, which rules out the O/0
replacement. But the l/1/I issue still remains.
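To make the case point concrete, here is a minimal sketch. It assumes only that ASCII host names are matched case-insensitively, so whoever writes the link chooses the capitalization the user sees. The names under .example and the helper as_resolved are hypothetical, for illustration only.

```python
# Sketch: the displayed case of a host name is chosen by whoever writes the
# link, while lookups treat ASCII letters case-insensitively.
# All names below are hypothetical examples.

def as_resolved(hostname: str) -> str:
    """Return the form used for comparison: ASCII letters lowercased."""
    return hostname.lower()

genuine = "paypal.example"
lookalike = "paypaI.example"  # capital I standing in for lowercase l

# Any capitalization of the genuine name is still the genuine name.
print(as_resolved("PayPal.Example") == as_resolved(genuine))

# The lookalike is a different name, even though I and l may render alike.
print(as_resolved(lookalike) == as_resolved(genuine))
```

The first comparison is true and the second is false: capitalization changes nothing about which name is looked up, but it lets the attacker pick whichever of l/1/I looks best in the user's font.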
> Homographs where the glyphs are almost or completely identical, but
> completely different code points, is where this *really* breaks down. There
> are several sets of glyphs that can mimic nearly all of the Latin alphabet
> -- and in most fonts, looks *identical* to the Latin glyphs (some fonts
> simply remap to use the Latin glyph's data).
So what? For most .DE domains, I can still get the corresponding
.DE.VU domain. Apart from the trailing .VU, the strings are even
bitwise identical.
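The two cases can be put side by side in a short sketch. The first half shows that a Latin and a Cyrillic letter that usually render alike are distinct code points; the second shows that a .DE.VU name begins with exactly the bytes of the .DE name. The domain example.de and the helper names are hypothetical and used only for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin_a = "a"          # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a, drawn alike in most fonts

print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False

def shares_leading_bytes(real: str, other: str) -> bool:
    """True if `other` begins with exactly the bytes of `real`."""
    return other.encode("ascii").startswith(real.encode("ascii"))

# Hypothetical names: the .DE.VU name needs no lookalike glyphs at all.
print(shares_leading_bytes("example.de", "example.de.vu"))  # True
```

Neither trick needs anything exotic: the homograph relies on a different code point that looks the same, while the .DE.VU name reuses the very same bytes and only appends a suffix.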
Let me repeat my other argument: Users don't use domain names in trust
assessments. The smarter ones seem to recall how they got to a
particular page. This is quite consistent with real-world behavior.
Most people tend not to forget that they are in some questionable part
of the city just because they meet an attractive member of the
appropriate sex (or something like that, you get the idea).
> (Hint: In each group of three lines, the strings of characters are NOT
> identical, regardless of what your eyes may tell you.)
They appear different because, even though they come from a single
font, the characters have slightly different widths. This wouldn't
matter in the location field, of course.