[93463] in North American Network Operators' Group


Re: How to get a list of research and academic ISP ?

daemon@ATHENA.MIT.EDU (Marshall Eubanks)
Mon Nov 20 16:01:03 2006

In-Reply-To: <000001c70ce0$5f44a3d0$9054b280@ic.intranet.epfl.ch>
Cc: <nanog@merit.edu>
From: Marshall Eubanks <tme@multicasttech.com>
Date: Mon, 20 Nov 2006 15:59:29 -0500
To: "Maciej Kurant" <maciej.kurant@epfl.ch>
Errors-To: owner-nanog@merit.edu


Hello;

On Nov 20, 2006, at 3:13 PM, Maciej Kurant wrote:

> Dear All,
>
>
>
>
>
> Thank you very much for the numerous and quick replies to my email. I
> must say that the nanog list is really highly responsive.
>
>
>
> I needed some time to digest your comments and try some new ideas.
> I share the preliminary results with you now, begging for further
> comments.
>
>
>
> The problem was (and still is) to find a good heuristic to
> distinguish between commercial (COM) and
> educational/research/academic (EDU) ASes.
>
>

I would suggest you need to think a little about what exactly you want

- a list of _all_ academic ASN ?  (that will be tough, and you will
have to deal with corner cases, and you will not fully automate it)
- a list of _some_ academic ASN ? (you have that now - so are you
worried about completeness or size or ... ?)
- a list of _no_ academic ASN ? (again, this will be tough)
or something else ?

Note, too, that these lists will change with time.

> *EDU_Abilene*
>
> My first approach (see my original email) was to extract a list of
> all destinations announced by Abilene. (The assumption is that
> Abilene generally does not announce commercial prefixes.) This
> results in a list, call it "EDU_Abilene", of 1333 ASes.
>
>
>
>
> *EDU_description*
>
> Some of you suggested looking at the names and descriptions of
> ASes. I used the AS list available at:
>
> http://www.multicasttech.com/status/asn_expand.txt
>
> and searched the last column ("Organization") for the following
> strings:
>
> "Universit|Univerz|Universida|research|education|science|scientif|
> academic|college|institut|laborator|school|ecole|edu|R&D|library|
> academy|Etudes"
>
> This approach finds 1796 "educational" ASes, call this set
> "EDU_description".
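
The keyword search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the quoted message does not say whether the match was case-insensitive (it is assumed to be here), the function name edu_by_description is hypothetical, the parsing of asn_expand.txt is left out, and the sample records are invented (the AS numbers come from the range reserved for documentation).

```python
import re

# Keyword pattern copied from the search strings in the quoted message.
# Case-insensitive matching is an assumption, not something the message states.
EDU_PATTERN = re.compile(
    r"Universit|Univerz|Universida|research|education|science|scientif|"
    r"academic|college|institut|laborator|school|ecole|"
    r"edu|R&D|library|academy|Etudes",
    re.IGNORECASE,
)


def edu_by_description(records):
    """Return the set of ASNs whose organization text matches the pattern.

    `records` is an iterable of (asn, organization) pairs; extracting them
    from the AS list file is not shown here.
    """
    return {asn for asn, org in records if EDU_PATTERN.search(org)}


# Hypothetical sample records, for illustration only.
sample = [
    (64496, "Example University"),
    (64497, "Example Telecom Inc."),
    (64498, "Reduced Rate Hosting"),  # matches "edu" inside "Reduced"
]

print(sorted(edu_by_description(sample)))
```

The third sample record shows why short substrings such as "edu" produce false positives: a purely textual match cannot tell "Reduced" from an educational name, so the result still needs cross-checking.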
>
>
>
> Of course, these two lists overlap, but less than I expected. In
> particular:
>
> len(EDU_Abilene) = 1333
>
> len(EDU_description) = 1796
>
> union(EDU_Abilene, EDU_description) = 2269
>
> intersection(EDU_Abilene, EDU_description) = 860
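
As a sanity check on the four numbers above, the union size should equal the sum of the two list sizes minus the intersection. A minimal sketch, using only the counts from the quoted message (the variable names are mine):

```python
# Sizes reported in the quoted message.
n_abilene = 1333
n_description = 1796
n_intersection = 860

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
n_union = n_abilene + n_description - n_intersection
print(n_union)  # 2269, which agrees with the reported union

# Fraction of each list that the other list confirms.
share_of_abilene = n_intersection / n_abilene
share_of_description = n_intersection / n_description
print(round(share_of_abilene, 3), round(share_of_description, 3))
```

So the reported figures are internally consistent, and roughly two thirds of EDU_Abilene but under half of EDU_description is confirmed by the other list.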
>
>
>
>
>
> For many reasons, these lists are far from being very precise. For
> instance EDU_Abilene contains AS 7132 (AT&T) and AS 8075
> (Microsoft). Therefore I need further data sets or filtering
> methodology. This raises some questions:
>
>
>
> 1) What other EDU networks (preferably with BGP tables available in
> the web) can I take as examples of ASes that (generally) do not
> announce commercial prefixes? Based on them I could construct lists
> similar in spirit to EDU_Abilene. I guess, the more the better.
>
>

There are lots - look at the ones that Abilene peers with

http://international.internet2.edu/partners/
http://abilene.internet2.edu/peernetworks/international.html



> 2) Do you know of other lists, similar to
> http://www.multicasttech.com/status/asn_expand.txt ? Maybe a longer
> description or a www related to an AS would help the method I use
> to create EDU_description. Do you think the strings I use in my
> search are appropriate?
>
>
Try
http://bgp.potaroo.net/as1221/asnames.txt

Note that there are errors all over the place here; these lists will
not agree perfectly.
My lists come from the rwhois data, but I correct for obvious errors
(some of which I have sent back to the list maintainers). There are
others I am sure that I have not caught, and my corrections are
undoubtedly not perfect. I am sure that the other maintainers of such
lists could tell similar tales.

You could start polling rwhois yourself, and I would in doubtful cases.

>
>
> *AS relationships*
>
> Another approach is to exploit the AS relationships. Most of you
> agree that usually EDU ASes are not providers for COM customers.
> This suggests a way to detect false positives in EDU_Abilene and
> EDU_description (or in their union). For every EDU node check how
> many COM customers it has, i.e., EDU provider --- COM customer
> relationship. I used the AS graphs with inferred relationships
> provided by CAIDA (http://as-rank.caida.org/data/2006/). This
> method works well to find good candidates for false positives, but
> they should not be blindly accepted. For instance AS 7132 (AT&T)
> has the highest number of COM customers (615) and should obviously
> belong to COM (it is a member of EDU_Abilene). In contrast, a big
> component of the EDU backbone, AS 11537 (Abilene) has 66 COM
> customers! In general there are about 50 EDU nodes with more than
> 10 COM customers each.
>
>

Not a bad approach.
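
A rough sketch of that customer-count test, under assumptions that are mine and not from the CAIDA data: the inferred relationships are taken as already parsed into (provider, customer) pairs, "COM" is approximated as "not in the EDU candidate set", and the function names and sample AS numbers are hypothetical.

```python
from collections import Counter


def com_customer_counts(provider_customer_edges, edu_candidates):
    """Count, for each EDU candidate, its customers outside the EDU set.

    `provider_customer_edges` is an iterable of (provider_asn, customer_asn)
    pairs. Treating every non-EDU customer as COM is a simplification.
    """
    counts = Counter()
    for provider, customer in set(provider_customer_edges):  # drop duplicates
        if provider in edu_candidates and customer not in edu_candidates:
            counts[provider] += 1
    return counts


def flag_suspects(counts, threshold=10):
    """Return EDU candidates with more than `threshold` COM customers."""
    return sorted(asn for asn, n in counts.items() if n > threshold)


# Hypothetical sample data, for illustration only.
edu = {64496, 64497}
edges = [
    (64496, 64500),
    (64496, 64501),
    (64496, 64497),  # EDU customer, not counted
    (64497, 64502),
    (64510, 64496),  # EDU AS as a customer, not counted
]

counts = com_customer_counts(edges, edu)
print(dict(counts))
print(flag_suspects(counts, threshold=1))
```

The flagged ASes are candidates for manual review rather than automatic removal; as the Abilene example in the quoted text shows, a genuine EDU backbone can exceed any fixed threshold.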
>
>
> 3) What other "automatic" or "manual" approaches would you suggest?
> Or improvements of the ones just described?


Again, I don't know what you are trying to do. What I have found
useful is what you are doing - make lots of lists, and cross
reference, and see what passes multiple tests.
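
One way to picture the cross-referencing is as a simple vote: count how many independent lists contain each AS and keep those that pass at least some number of them. The sketch below is only an illustration; the function name, the list names, and the sample AS numbers are all hypothetical.

```python
from collections import Counter


def cross_reference(lists_by_name, min_votes=2):
    """Return the ASNs that appear in at least `min_votes` of the lists."""
    votes = Counter()
    for members in lists_by_name.values():
        votes.update(set(members))  # each list votes at most once per ASN
    return {asn for asn, n in votes.items() if n >= min_votes}


# Hypothetical candidate lists, for illustration only.
candidate_lists = {
    "abilene": {64496, 64497, 64498},
    "description": {64496, 64498, 64499},
    "no_com_customers": {64496, 64497, 64499, 64500},
}

print(sorted(cross_reference(candidate_lists, min_votes=2)))
print(sorted(cross_reference(candidate_lists, min_votes=3)))
```

Raising min_votes trades completeness for precision, which is the same choice as deciding between a list of _some_ and a list of _all_ academic ASNs.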
>
>
>
>
> I will appreciate even the briefest comments and suggestions,
>
> Maciej Kurant
>
>
>
>

Hope this helps.

Regards
Marshall

>
>
> From: Maciej Kurant [mailto:maciej.kurant@epfl.ch]
> Sent: Wednesday, 15 November 2006 18:46
> To: 'nanog@merit.edu'
> Subject: How to get a list of research and academic ISP ?
>
>
>
> Dear all,
>
>
>
> I am a PhD student at EPFL, Switzerland. My recent research
> interest is in large scale differences between the commercial and
> academic parts of the Internet.
>
>
>
> Of course, in order to perform this kind of study I need a way to
> distinguish between these two worlds. I've learnt that Abilene does
> not provide commercial connectivity. This means that BGP prefixes
> and AS paths announced by Abilene BGP routers should lead only to
> research and academic destinations. I have extracted (from the BGP
> tables at http://abilene.internet2.edu/observatory) a list of all
> such destinations and obtained 1333 ASes (for data from July 2006).
> The number looks reasonable, but I would like to be sure that I am
> not making a mistake. Therefore I would be grateful if you could
> answer the following questions:
>
>
>
> 1)       Is this approach to obtain a list of research and academic
> ISPs correct?
>
> 2)       Do you maybe know of such lists compiled before?
>
> 3)       If I keep not only the destination ASes, but also all ASes
> on the AS paths towards these destinations, I obtain a list of about
> 1400 ASes. How should I understand this? Does it mean that some
> research and academic destinations are reachable from Abilene only
> by traversing the commercial Internet?
>
> 4)       Of course, research and academic ASes are often well
> connected to the commercial Internet. My guess is that in most
> cases their peering relationship is "customer-provider", where
> commercial ASes are providers. Is it possible that an academic AS
> is a provider for some commercial ASes? If so, does it happen often?
>
>
>
> Thank you in advance for your comments.
>
> Maciej Kurant
>
>
>
>
>
>
>
> =============================================
>
>
>
> EPFL IC ISC LCA3
>
> Maciej Kurant
>
> PhD Student
>
> CH-1015 Lausanne, Switzerland
>
>
>
> web site:  http://lcawww.epfl.ch/kurant
>
>
>
> =============================================
>
>
>
>

