[185673] in North American Network Operators' Group
Re: Long-haul 100Mbps EPL circuit throughput issue
X-Original-To: nanog@nanog.org
In-Reply-To: <20151105231912.GA17090@Mail.DDoS-Mitigator.net>
Date: Fri, 6 Nov 2015 10:35:13 +1100
From: Greg Foletta <greg@foletta.org>
To: alvin nanog <nanogml@mail.ddos-mitigator.net>
Cc: nanog@nanog.org, Eric Dugas <edugas@unknowndevice.ca>
Errors-To: nanog-bounces@nanog.org
Along with the receive window/buffer sizing needed for your particular
bandwidth/delay product, it appears you're also seeing TCP move from
slow start into a congestion avoidance mechanism (Reno, Tahoe, CUBIC, etc.).
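
A quick way to confirm on the Linux test machines (a sketch; assumes
iproute2's ss is available, and 192.0.2.10 is a placeholder for your iperf
server's address):

  # which congestion avoidance algorithm the sender is using
  sysctl net.ipv4.tcp_congestion_control

  # watch cwnd/ssthresh on the live connection; a cwnd that collapses on
  # loss and then grows roughly linearly matches the drop-to-50% and slow
  # recovery you're describing
  ss -ti dst 192.0.2.10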
Greg Foletta
greg@foletta.org
On 6 November 2015 at 10:19, alvin nanog <nanogml@mail.ddos-mitigator.net>
wrote:
>
> hi eric
>
> On 11/05/15 at 04:48pm, Eric Dugas wrote:
> ...
> > Linux test machine in customer's VRF <-> SRX100 <-> Carrier CPE (Cisco
> > 2960G) <-> Carrier's MPLS network <-> NNI - MX80 <-> Our MPLS network <->
> > Terminating edge - MX80 <-> Distribution switch - EX3300 <-> Linux test
> > machine in customer's VRF
> >
> > We can fill the link with UDP traffic using iperf, but with TCP we can
> > reach 80-90%, then the traffic drops to 50% and slowly increases back up
> > to 90%.
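
That drop-then-slow-recovery pattern is easy to reproduce; a minimal sketch,
assuming an iperf3 server is running at the placeholder address 192.0.2.10:

  # UDP fills the link regardless of loss
  iperf3 -c 192.0.2.10 -u -b 100M -t 60

  # TCP sags when the sender backs off after loss, then climbs back
  iperf3 -c 192.0.2.10 -t 60 -i 1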
>
> if i was involved with these tests, i'd start looking for "not enough
> tcp send and tcp receive buffers"
>
> for flooding at 100Mbit/s, you'd need about 12MB buffers ... (100Mbit/s
> is 12.5MB/s, so that covers a full second of data in flight)
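
The exact figure depends on the round-trip time: buffer >= rate x RTT. A
minimal tuning sketch for the Linux test machines (the 50ms RTT and the
16MB ceiling are assumptions, not measured values):

  # bandwidth-delay product at an assumed 50ms long-haul RTT:
  #   12.5 MB/s * 0.05 s = ~625 KB minimum, so a few MB is comfortable

  # raise the kernel auto-tuning ceilings (min/default/max, illustrative)
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

  # or pin the socket buffer per-test with iperf3's window option
  iperf3 -c 192.0.2.10 -w 4M -t 60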
>
> udp does NOT care too much about dropped data due to the buffers,
> but tcp cares about "not enough buffers" .. somebody resend packet#
> 1357902456 :-)
>
> at least double or triple the buffers needed to compensate for all kinds of
> network wackiness:
> data in transit, misconfigured hardware-in-the-path, misconfigured iperfs,
> misconfigured kernels, interrupt handling, etc, etc
>
> - how many "iperf flows" are you also running ??
> - running dozens or 100s of them does affect thruput too
>
> - does the same thing happen with socat ?? (see the socat sketch below)
>
> - if iperf and socat agree on the network thruput, it's the hw somewhere
>
> - slowly increasing thruput doesn't make sense to me ... it sounds like
>   something is caching some of the data
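
For the socat cross-check, a crude sketch (the address and port are
placeholders, and pv is assumed to be installed to show the transfer rate):

  # receiver: accept one TCP stream and discard it
  socat -u TCP-LISTEN:5201,reuseaddr /dev/null

  # sender: push zeros at the receiver and watch the rate with pv
  pv /dev/zero | socat -u STDIN TCP:192.0.2.10:5201

  # and for the flow-count question, parallel streams in iperf3:
  iperf3 -c 192.0.2.10 -P 8 -t 60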
>
> magic pixie dust
> alvin
>
> > Has anyone dealt with this kind of problem in the past? We've tested by
> > forcing ports to 100-FD at both ends, policing the circuit on our side,
> > and calling the carrier and escalating to L2/L3 support. They tried to
> > police the circuit as well but as far as I know, they didn't modify
> > anything else. I've told our support to have them look for underrun
> > errors on their Cisco switch, and they can see some. They're pretty much
> > in the same boat as us and they're not sure where to look.
> >
>