[138791] in cryptography@c2.net mail archive


Re: e-gold and e-go1d

daemon@ATHENA.MIT.EDU (Ivan Krstić)
Sat Nov 29 16:34:47 2008

Cc: cryptography@metzdowd.com
From: Ivan Krstić <krstic@solarsail.hcs.harvard.edu>
To: James A. Donald <jamesd@echeque.com>
In-Reply-To: <4930FADC.3060403@echeque.com>
Date: Sat, 29 Nov 2008 21:51:18 +0100

On Nov 29, 2008, at 9:18 AM, James A. Donald wrote:
> The algorithm is to map all lookalike glyphs to
> canonical glyphs
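The quoted approach can be sketched as a "skeleton" comparison: normalize each string, replace every lookalike character with a canonical one, and call two names confusable if the results match. This is a minimal sketch under loud assumptions: the `LOOKALIKES` table, `skeleton`, and `confusable` are hypothetical names, and the table holds only a handful of pairs (real confusable tables, such as the one in Unicode UTS #39, are far larger).

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table (character -> canonical glyph).
# A real table would be much larger and, as noted in the reply, font-dependent.
LOOKALIKES = {
    "1": "l",       # DIGIT ONE looks like LATIN SMALL LETTER L
    "I": "l",       # LATIN CAPITAL LETTER I
    "0": "o",       # DIGIT ZERO
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}

def skeleton(s: str) -> str:
    """NFKC-normalize, then map each character to its canonical lookalike."""
    s = unicodedata.normalize("NFKC", s)
    return "".join(LOOKALIKES.get(ch, ch) for ch in s)

def confusable(a: str, b: str) -> bool:
    """Two strings are confusable if their skeletons are identical."""
    return skeleton(a) == skeleton(b)

print(confusable("e-gold", "e-go1d"))  # True
```

The weakness is the one raised below: the outcome depends entirely on who decides which glyphs count as lookalikes.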

The definition of lookalike glyphs depends on the choice of font and
variant, and Unicode wraps the whole problem in a lovely layer of
hell. If I had to do this, I'd investigate rendering both strings in
the (same) target font and then quantifying the amount of overlap in
the bitmaps, as e.g. SWORD does for TLDs:

     <http://icann.sword-group.com/icann-algorithm/Default.aspx>
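The render-and-compare idea can be illustrated with a toy rasterizer. This is not SWORD's (proprietary) method; it is a minimal sketch assuming a hypothetical 3x5 pixel `FONT` defined inline in place of a real target font, with overlap measured as intersection-over-union of the inked pixels. `render` and `overlap` are illustrative names.

```python
# Hypothetical 3x5 bitmap font: rows top to bottom, "#" = ink, "." = blank.
# A real implementation would rasterize with the actual target font instead.
FONT = {
    "e": ".#.|#.#|###|#..|.##",
    "-": "...|...|###|...|...",
    "g": ".##|#.#|.##|..#|##.",
    "o": "...|.#.|#.#|#.#|.#.",
    "l": ".#.|.#.|.#.|.#.|.#.",
    "1": ".#.|##.|.#.|.#.|###",
    "d": "..#|..#|.##|#.#|.##",
}
GLYPH_WIDTH, GAP = 3, 1

def render(text: str) -> set[tuple[int, int]]:
    """Rasterize text into the set of inked (row, column) pixels."""
    pixels = set()
    for i, ch in enumerate(text):
        x0 = i * (GLYPH_WIDTH + GAP)  # each glyph gets a fixed-width cell
        for r, row in enumerate(FONT[ch].split("|")):
            for c, cell in enumerate(row):
                if cell == "#":
                    pixels.add((r, x0 + c))
    return pixels

def overlap(a: str, b: str) -> float:
    """Intersection-over-union of the two bitmaps (1.0 means identical)."""
    pa, pb = render(a), render(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0

print(round(overlap("e-gold", "e-go1d"), 3))  # 0.93
print(round(overlap("e-gold", "e-good"), 3))  # 0.841
```

Because glyphs sit at fixed offsets, this naive version only compares strings position by position and makes no attempt to align strings of different lengths.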

SWORD's algorithm is proprietary; NIST's Paul Black has Python code available
for a slightly enhanced Levenshtein distance:

     <http://hissa.nist.gov/~black/GTLD/>
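One common way to "enhance" Levenshtein distance for lookalike detection is to make substitutions between visually similar characters cheaper than ordinary edits. This is not Paul Black's code, whose exact enhancements are not described here; it is a minimal sketch assuming a hypothetical `SIMILAR` cost table and illustrative function names.

```python
# Hypothetical reduced substitution costs for visually similar pairs.
SIMILAR = {
    frozenset("l1"): 0.2,
    frozenset("o0"): 0.2,
    frozenset("Il"): 0.2,
}

def sub_cost(a: str, b: str) -> float:
    """0 for a match, a reduced cost for lookalikes, 1 otherwise."""
    if a == b:
        return 0.0
    return SIMILAR.get(frozenset((a, b)), 1.0)

def weighted_levenshtein(s: str, t: str) -> float:
    """Dynamic-programming edit distance with lookalike-aware substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]  # distances from "" to t[:j]
    for i, a in enumerate(s, start=1):
        cur = [float(i)] + [0.0] * len(t)
        for j, b in enumerate(t, start=1):
            cur[j] = min(
                prev[j] + 1.0,                 # delete a
                cur[j - 1] + 1.0,              # insert b
                prev[j - 1] + sub_cost(a, b),  # substitute or match
            )
        prev = cur
    return prev[-1]

print(weighted_levenshtein("e-gold", "e-go1d"))  # 0.2
print(weighted_levenshtein("e-gold", "e-good"))  # 1.0
```

Plain Levenshtein scores both pairs as distance 1; the weighting is what ranks "e-go1d" as the more dangerous lookalike. The 0.2 cost is arbitrary and would need tuning against real data.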

--
Ivan Krstić <krstic@solarsail.hcs.harvard.edu> | http://radian.org

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@metzdowd.com
