[165301] in North American Network Operators' Group
Re: looking for hostname geographic hint validation
daemon@ATHENA.MIT.EDU (Bradley Huffaker)
Wed Aug 28 15:16:32 2013
Date: Wed, 28 Aug 2013 12:16:14 -0700
From: Bradley Huffaker <bhuffake@caida.org>
To: nanog <nanog@nanog.org>
In-Reply-To: <521E1219.3050409@list-subs.com>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
On Wed, Aug 28, 2013 at 04:07:05PM +0100, Ben wrote:
> Dear Bradley,
>
> So basically you're asking others to do your homework for you ? ;-)
Actually no, I'm asking people to do something which I cannot.
While it is true I could test against a manual inference, I would simply
be checking one inference against another. Agreement would only prove
that the algorithm does what I expect. Only the operators, who actually
know what they are doing, can give me the ground truth I need to test my
inferences against reality.
> For example, picking one example from your list ....
>
> <iata>([^a-z]+[a-z]+\d*){3}.ic.ac.uk
>
> Far from being IATA codes, the intermediate subdomains actually refer to
> departments (DepartmentOfComputing and CHemistry in the two I quoted).
>
> Sorry to rain on your parade, but someone had to say it. ;-)
You are most likely right, but I am not looking for perfection. I am
hoping for an inference that will get me within 10 km of the actual
city most of the time.
Given the validation I have so far: of the 19,611 hostnames for which a
location is inferred and for which I have validation data, we infer the
city correctly 93% of the time.
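
A check of this kind (how often an inferred location lands within 10 km
of the validated one) could be sketched roughly as below. This is only an
illustration, not the code behind the numbers above: the function names,
the (latitude, longitude) dictionaries keyed by hostname, and the example
hostnames are all assumptions made for the sketch.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def fraction_within(inferred, truth, threshold_km=10.0):
    """Share of hostnames whose inferred location is within threshold_km
    of the ground-truth location.

    Both arguments map hostname -> (lat, lon). Only hostnames present in
    both are scored; returns None if there is no overlap.
    """
    common = inferred.keys() & truth.keys()
    if not common:
        return None
    hits = sum(
        1 for host in common
        if haversine_km(*inferred[host], *truth[host]) <= threshold_km
    )
    return hits / len(common)

# Hypothetical data: one inference is close to the truth, one is far off,
# and one hostname has ground truth but no inference (so it is not scored).
inferred = {
    "r1.example.net": (51.50, -0.12),
    "r2.example.net": (40.70, -74.00),
}
truth = {
    "r1.example.net": (51.51, -0.13),
    "r2.example.net": (34.05, -118.24),
    "r3.example.net": (0.0, 0.0),
}
accuracy = fraction_within(inferred, truth)
```

Scoring only the hostnames that appear in both sets mirrors the
restriction above to hostnames that have both an inference and
validation data.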
While there is work left to do, it is far from the lost cause you
present.
--
the value of a world model is not how accurately it captures reality
but how often it leads us to take appropriate action