[45512] in North American Network Operators' Group


Re: Reducing Usenet Bandwidth

daemon@ATHENA.MIT.EDU (Joe St Sauver)
Sun Feb 3 18:12:04 2002

Date: Sun, 03 Feb 2002 15:11:21 -0800 (PST)
From: Joe St Sauver <JOE@OREGON.UOREGON.EDU>
To: steve@opaltelecom.co.uk
Cc: nanog@merit.edu
Message-id: <01KDU77GIFSI8WXOHJ@OREGON.UOREGON.EDU>
X-VMS-To: IN%"steve@opaltelecom.co.uk"
MIME-version: 1.0
Errors-To: owner-nanog-outgoing@merit.edu


Hi Stephen,

> as we all know Usenet traffic is always increasing, a large number of
>people take full feeds which on my servers is about 35Mb of continuous
>bandwidth in/out. That produces about 300Gb per day of which only a small
>fraction ever gets downloaded.

You should be aware that Usenet traffic loads are extremely sensitive to 
a number of seemingly inconsequential factors. To mention just two of many:

(1) The presence or absence of individual groups (and by absence, I mean
"poisoning" an unwanted group, i.e., dropping any article posted to that
group, *AND* any article that's been *crossposted to* that group). For
example, consider the top 20 groups from 
http://www.newsadmin.com/top100bytes.htm (commas added for readability):

!Binary Newsgroups                              Bytes        % Total
! 1 alt.chello.binaries                13,757,677,778          4.538
! 2 alt.binaries.vcd                   10,087,744,846          3.327
! 3 alt.binaries.vcd.repost            10,068,663,162          3.321
! 4 alt.binaries.multimedia             9,901,822,387          3.266
! 5 alt.binaries.sounds.mp3             9,159,994,565          3.021
! 6 alt.binaries.cd.image               7,865,347,722          2.594
! 7 alt.binaries.erotica.vcd            7,080,622,563          2.336
! 8 alt.binaries.cd.image.playstation   5,381,405,545          1.775
! 9 alt.binaries.movies.divx            5,004,619,468          1.651
!10 alt.binaries.music.shn              4,935,170,128          1.628
!11 alt.binaries.movies.divx.french     4,919,381,694          1.623
!12 alt.binaries.anime                  4,672,847,011          1.541
!13 alt.binaries.sounds.mp3.complete_cd 4,448,118,991          1.467
!14 alt.binaries.old.games              4,410,750,072          1.455
!15 alt.binaries.multimedia.cartoons    3,898,196,934          1.286
!16 alt.binaries.images                 3,768,957,616          1.243
!17 alt.binaries.mpeg.video.music       3,711,531,880          1.224
!18 alt.binaries.movies                 3,547,393,708          1.170
!19 alt.binaries.cd.image.games         3,219,286,966          1.062
!20 alt.binaries.movies.divx.german     3,194,581,083          1.054

When carriage of a single group can contribute nearly 14GB worth of traffic
to a feed, obviously you should pay attention to what you're carrying. You
can say, "We carry and feed everything" if you like, but remember the fact
that the presence or absence of a -single- group (out of tens or hundreds
of thousands, depending on what you consider to be a valid newsgroup) can 
change your feed traffic by 14GB (nearly 5%) a day.
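
To make the poisoning rule in (1) concrete, here is a minimal sketch in Python. It is an illustration, not how any particular news server implements it: the function names, the POISONED set, and the sample Newsgroups headers are all assumptions. The second helper backs out the total daily feed implied by the first row of the table above.

```python
# Sketch only: function names, POISONED, and the sample headers are
# illustrative assumptions, not taken from any real server configuration.

def is_poisoned(newsgroups_header, poisoned_groups):
    """True if the article was posted OR crossposted to a poisoned group."""
    groups = (g.strip().lower() for g in newsgroups_header.split(","))
    return any(g in poisoned_groups for g in groups)


def implied_total_bytes(group_bytes, percent_of_total):
    """Back out the whole feed's size from one table row (bytes, % total)."""
    return group_bytes / (percent_of_total / 100.0)


POISONED = {"alt.chello.binaries"}  # hypothetical local policy

sample_headers = [
    "alt.chello.binaries",               # posted directly: dropped
    "comp.lang.c, alt.chello.binaries",  # crossposted: also dropped
    "news.admin.net-abuse.usenet",       # unrelated: kept
]
kept = [h for h in sample_headers if not is_poisoned(h, POISONED)]
print(kept)

# Row 1 of the table: 13,757,677,778 bytes at 4.538% of the total.
total = implied_total_bytes(13_757_677_778, 4.538)
print(f"implied daily total: {total / 1e9:.0f} GB")
```

The implied total comes out a little over 300 GB, which lines up with the roughly 300 per day figure quoted at the top of this message.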

(2) Your choice of maximum per-article size in octets.

The 80/20 rule holds. You can get 80+% of all articles at a cost of carrying
only about 20% of all octets. To see this, look at slide 33 of 
http://www.uoregon.edu/~joe/ogig-flow-study.{ppt,pdf} quoting a graph by
the folks at tele.dk...

A couple of "magic values" that you may want to empirically evaluate for your
local server are in the range of 40-50KB/article (if you run a "text only"
server), or 250-300KB/article (if you run a server carrying image-oriented
binaries plus text). If you are planning on carrying "everything," be sure you
don't inadvertently cap articles at 1MB/article or even 4MB/article -- you'd
still be missing articles if you choose that low a limit.
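
One way to evaluate a candidate cap empirically is to take a list of article sizes from your own server and see what fraction of articles and of octets survives each limit. The Python sketch below assumes you can export such a list; the function name is hypothetical, the sample sizes are synthetic and chosen only to show the shape of the calculation, and KB/MB are treated as powers of 1000.

```python
# Sketch only: coverage_at_cap is a hypothetical helper and `sample` is
# synthetic data, not a measurement of any real feed.

def coverage_at_cap(article_sizes, cap_bytes):
    """Return (fraction of articles kept, fraction of octets kept)."""
    kept = [size for size in article_sizes if size <= cap_bytes]
    article_fraction = len(kept) / len(article_sizes)
    octet_fraction = sum(kept) / sum(article_sizes)
    return article_fraction, octet_fraction


# Eight small text articles plus two large binaries, sizes in bytes.
sample = [2_000] * 8 + [500_000, 1_500_000]

for cap in (50_000, 300_000, 1_000_000, 4_000_000):
    articles, octets = coverage_at_cap(sample, cap)
    print(f"cap {cap:>9,} bytes: {articles:6.1%} of articles, "
          f"{octets:6.1%} of octets")
```

Run against real size data, the row where the article fraction is high and the octet fraction is still low is the kind of "magic value" worth testing.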

>a) Internally to a network
>If I site multiple peer servers at exchange and peering points then they
>all exchange traffic, all inter and intra site circuits are filled to the
>above 35Mb level.

Locally, news servers should probably be run on gigabit links; average traffic
may run 35Mbps for a full feed, but you would need additional capacity for 
peaking and recovering from outages, to say nothing of loads associated with
feeds you may be fanning out, or local reader traffic loads. 

If you buy the argument that news servers should be gigabit connected, then
35Mbps worth of traffic in the local area really isn't much worth worrying 
about... 
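
The headroom argument can be put in rough numbers. The Python sketch below converts a sustained feed rate to a daily volume and estimates how long a server needs to catch up after an outage. It assumes a steady 35Mbps feed, a link that is otherwise idle, and an upstream peer that can send the backlog at line rate; the 45Mbps link and the 6-hour outage are hypothetical values picked for comparison.

```python
# Sketch only: assumes a constant feed rate and a link with no other traffic.

def gb_per_day(mbps):
    """Daily volume in GB (10^9 bytes) for a sustained rate in Mbps."""
    return mbps * 1e6 * 86_400 / 8 / 1e9


def catch_up_hours(feed_mbps, link_mbps, outage_hours):
    """Hours needed to drain an outage backlog while the live feed continues.

    The backlog is feed_mbps * outage_hours; only the spare capacity
    (link_mbps - feed_mbps) is available to work it off.
    """
    spare = link_mbps - feed_mbps
    if spare <= 0:
        raise ValueError("link has no spare capacity; backlog never drains")
    return outage_hours * feed_mbps / spare


print(f"35 Mbps sustained: {gb_per_day(35):.0f} GB/day")
for link in (45, 100, 1000):  # hypothetical link speeds in Mbps
    hours = catch_up_hours(35, link, outage_hours=6)
    print(f"{link:>5} Mbps link: {hours:.2f} h to recover from a 6 h outage")
```

On a link sized close to the average feed rate, recovery takes far longer than the outage itself; on a gigabit link it takes minutes.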

>b) Externally such as at public peering exchange points
>If theres 100 networks at an exchange point and half exchange a full feed
>thats 35x50x2 = 3500Mb of traffic flowing across the exchange peering LAN.

Usenet's pre-arranged and predictable server-to-server flows make an excellent
"foundation load" when it comes to justifying a decision to participate at an
exchange point, and Usenet has always been an important component of exchange 
point traffic. Consider, for example, the SIX in Seattle -- it is not a
coincidence that the SIX is affiliated with Altopia, a Usenet speciality
service provider.

>For the peering point question I'm thinking some kind of multicast thing,
>internally I've no suggestions other than perhaps only exchanging message
>ids between peer servers, hence giving back a partial feed to the local
>box's external peers.

In the higher education community, deployment of lightly loaded high 
bandwidth networks such as Internet2's Abilene network (see 
http://www.internet2.edu/ ) has largely eliminated concerns about accommodating
Usenet traffic volumes, at least for Usenet traffic between Internet2-
connected institutions... And of course, many I2 schools peer not only with 
each other, but also with local non-I2 Usenet peers, typically via a local 
exchange point, thereby "sharing the wealth" assuming you can accept 
one server's worth of intermediation.

For those who want to dig in and see for themselves, check out:
http://www.itec.oar.net/abilene-netflow/

Note that for 02/02/02, NNTP traffic (port=119) was the hottest application 
on a per-destination-port basis for the aggregation of all I2 network nodes, 
running 12.4% of all octets. Of course, if you change to a per-source-port 
view, Kazaa/Morpheus/FastTrack traffic (port=1214) was running fully twice 
that hot at 25+% of all octets for all network nodes. :-;

Regards,

Joe
