[165301] in North American Network Operators' Group


Re: looking for hostname geographic hint validation

daemon@ATHENA.MIT.EDU (Bradley Huffaker)
Wed Aug 28 15:16:32 2013

Date: Wed, 28 Aug 2013 12:16:14 -0700
From: Bradley Huffaker <bhuffake@caida.org>
To: nanog <nanog@nanog.org>
In-Reply-To: <521E1219.3050409@list-subs.com>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

On Wed, Aug 28, 2013 at 04:07:05PM +0100, Ben wrote:
> Dear Bradley,
> 
> So basically you're asking others to do your homework for you ?   ;-)

Actually no, I'm asking people to do something which I cannot.

While it is true I could test against a manual inference, I would simply
be checking one inference against another. Agreement would only prove
that the algorithm does what I expect. Only the operators, who actually
know what they are doing, can give me the ground truth I need to test my
inferences against reality.

> For example, picking one example from your list ....
> 
> <iata>([^a-z]+[a-z]+\d*){3}.ic.ac.uk
>
> Far from being IATA codes, the intermediate subdomains actually refer to 
> departments (DepartmentOfComputing and CHemistry in the two I quoted).
> 
> Sorry to rain on your parade, but someone had to say it.  ;-)

You are most likely right, but I am not looking for perfection. I am
hoping for an inference that will get me within 10 km of the actual
city most of the time.
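
To make the quoted pattern concrete, here is a minimal sketch of how such a
rule could be applied to a hostname. It assumes that "<iata>" stands for a
three-letter token and that the dots before ic.ac.uk are literal; the function
name and the sample hostname are made up for illustration and are not taken
from the actual rule set.

```python
import re

# Assumption: "<iata>" is a placeholder for a three-letter candidate code,
# and the trailing ".ic.ac.uk" is meant literally (dots escaped here).
PATTERN = re.compile(
    r"^(?P<iata>[a-z]{3})(?:[^a-z]+[a-z]+\d*){3}\.ic\.ac\.uk$"
)


def extract_hint(hostname):
    """Return the leading three-letter token if the hostname fits the rule,
    otherwise None. A match only shows the hostname has the expected shape."""
    match = PATTERN.match(hostname.lower())
    return match.group("iata") if match else None


if __name__ == "__main__":
    # Hypothetical hostname, used only to exercise the pattern.
    print(extract_hint("lhr-gw1.doc.net.ic.ac.uk"))
    print(extract_hint("www.ic.ac.uk"))
```

A match says nothing about whether the token really is an airport code, which
is the objection raised above; only validation against operator-supplied
locations can settle that.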

In the validation so far, there are 19,611 hostnames for which we infer a
location and for which I also have validation data. For those, we infer
the city correctly 93% of the time.
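
As a rough sketch of how a score like this can be computed, the snippet below
counts the share of validated hostnames whose inferred coordinates fall within
a distance tolerance of the operator-supplied coordinates. The function names,
the use of great-circle distance, and the sample coordinates are assumptions
for illustration, not a description of the actual validation code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(pairs, tolerance_km=10.0):
    """pairs: iterable of ((inferred_lat, inferred_lon), (true_lat, true_lon)).
    Returns the fraction of pairs no more than tolerance_km apart."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(
        1 for inferred, truth in pairs
        if haversine_km(*inferred, *truth) <= tolerance_km
    )
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up coordinates: one exact agreement, one inference in the wrong city.
    sample = [
        ((51.50, -0.12), (51.50, -0.12)),
        ((51.50, -0.12), (48.85, 2.35)),
    ]
    print(fraction_within(sample))
```

Only hostnames that have both an inferred location and ground truth enter the
denominator, so the score says nothing about hostnames for which no location
is inferred.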

While there is work left to do, it is far from the lost cause you
present.

-- 
    the value of a world model is not how accurately it captures reality
    but how often it leads us to take appropriate action

