[3009] in testers
[Jason Merrill: Re: PPP slow between Linux and Solaris - why is it so?]
daemon@ATHENA.MIT.EDU (Theodore Y. Ts'o)
Sun Jul 14 21:53:23 1996
Date: Sun, 14 Jul 1996 21:52:48 -0400
From: "Theodore Y. Ts'o" <tytso@MIT.EDU>
To: bluebox@MIT.EDU, testers@MIT.EDU
FYI, there are some patches to fix Solaris 2.4's bad performance over
slow links. For people who are trying to use Tether to get to Solaris
machines, this will probably be much appreciated.
Perhaps we should very seriously think about applying Solaris patch
101945 to the 8.0 release? (Or have we done so already and I just
haven't heard about it?)
- Ted
------- Forwarded Message
Date: Sat, 13 Jul 1996 21:22:19 -0700
To: dave@edipost.auspost.com.au (Dave Cole)
From: Jason Merrill <jason@cygnus.com>
Cc: linux-kernel@vger.rutgers.edu, linux-net@vger.rutgers.edu
In-Reply-To: dave@edipost.auspost.com.au's message of 12 Jul 1996 00:41:55 -0400
    <Pine.SUN.3.91.960711172505.22881B-100000.cygnus.linux.activists.kernel@dross>
Subject: Re: PPP slow between Linux and Solaris - why is it so?
Sender: owner-linux-net@vger.rutgers.edu
Precedence: bulk
>>>>> Dave Cole <dave@edipost.auspost.com.au> writes:
> I remember someone mentioned that Solaris was retransmitting packets too
> quickly or something causing >50% of bandwidth to be used up by duplicate
> packets. I got the impression that Solaris was to blame. Is this
> correct? In any case, I do not remember anyone ever mentioning a fix /
> trick / work around to get the throughput back up to what it should be.
-] Subject: Announcing New TCP Performance Patch
-] Date: 7 Jun 1996 23:36:21 GMT
-] From: cathe@Eng.Sun.COM (Cathe A. Ray)
-] Organization: Sun Microsystems, Inc.
-] Newsgroups: comp.unix.solaris
-]
-]
-]Sun doesn't ordinarily announce patches when they're released. But
-]we've just finished a series of TCP-related fixes and improvements, and
-]we want to make sure that the news gets out as quickly as possible to
-]the many people who can benefit from our work.
-]
-]This patch announcement will be of interest mostly to folks who use Sun
-]workstations over "slow" links, like most dial-up lines. Please note,
-]though, that you might benefit from the work we'll discuss here even if
-]you've never used one of our workstations directly. (Many companies
-]who provide Internet access use Suns as part of the communication path.
-]And the patches are for Suns running Solaris 2.4 and up.)
-]
-]Also note: This message is coming to you directly from the engineers
-]who did the work. We wanted to get the information out to you right
-]away, but we really aren't trying to replace all the other Sun sources
-]of information you might have access to. Please, don't send us lots of
-]detailed questions--we're not volunteering to answer them (or even
-]respond to many of the followups here). We just really wanted to make
-]sure this message got out. Thanks.
-]
-]Cathe A. Ray
-]Manager, Internet Engineering
-]
-]
-] TCP Performance Improvements For Slow Network Links
-] ===================================================
-]
-]Our Sun team is responsible for basic network communications software.
-]We've been putting in a lot of work lately on improving the performance
-]of TCP over slow network links. Now we're finished; testing is
-]complete; and the patches (for Solaris 2.4 and later) will be available
-]shortly.
-]
-]We undertook the work in response to feedback from customers serving
-]WWW users over asynchronous PPP links. Users of LANs and WANs built on
-]10base-T and faster media never saw the problem behavior, which
-]actually affected FTP and other TCP-based applications as well.
-]
-]With the new patches in place, slow links will operate with roughly the
-]same efficiency as fast links. Without the patches, efficiency of very
-]slow links could, under Solaris 2.5, sink to as low as 5 per cent of the
-]theoretical maximum.
-]
-]In the following sections we will describe in detail what was wrong and
-]how we fixed it. If you don't need to know all that, just check the
-]table below for the patch numbers. They'll be available soon from our
-]usual patch sources. We're confident that customers who have seen the
-]problem will now observe a remarkable improvement. Others will see no
-]change.
-]
-] SPARC:
-]
-]      2.4         2.5        2.5.1      module affected
-] |-----------|-----------|-----------|-----------------|
-] | 101945-xx | 103169-05 | 103582-01 | /kernel/drv/ip  |
-] | 101945-xx | 103447-03 | 103630-01 | /kernel/drv/tcp |
-] |-----------|-----------|-----------|-----------------|
-]
-] X86:
-]      2.4         2.5        2.5.1      module affected
-] |-----------|-----------|-----------|-----------------|
-] | 101946-xx | 103170-05 | 103581-01 | /kernel/drv/ip  |
-] | 101946-xx | 103448-03 | 103631-01 | /kernel/drv/tcp |
-] |-----------|-----------|-----------|-----------------|
-]
-] PowerPC:
-]     2.5.1      module affected
-] |-----------|-----------------|
-] | 103583-01 | /kernel/drv/ip  |
-] | 103632-01 | /kernel/drv/tcp |
-] |-----------|-----------------|
-]
-] Note: Where a revision number has been indicated, you should ask
-] for the patch of at least that revision. In the case of the 2.4
-] patch, the revision number was not available at the time of this
-] posting. Always try to get "the latest version" of any patch
-] you go after.
-]
-]
-]HISTORY
-]
-]Strangely, the decline in throughput was the result of several
-]improvements we made over the years to the TCP retransmission
-]algorithms and parameters. Every change improved performance for
-]systems with fast links. The cumulative effect for slow links was just
-]the reverse; but almost all our systems--and our customers'--were
-]hooked up to fast links, and the drawbacks went largely unnoticed. That
-]was the state of affairs at the time 2.4 was released.
-]
-]By the time 2.5 came out, async hookups to the Web had exploded. We had
-]implemented another relatively minor TCP bug fix. Customers with fast
-]links were better off. The efficiency of slow links declined. We
-]quickly learned we had a problem.
-]
-]We tracked down the inconsistencies and rewrote the code. We've
-]redesigned the algorithm for good behavior across all supported
-]configurations. We've added slow links and a wide mix of simulated
-]platforms to our test beds, and tested the fixes in both high-speed and
-]slow-speed networks. The problem is resolved.
-]
-]Excellence is a moving target.
-]
-]
-]TECHNICAL DETAILS
-]
-]Here are some technical details. As you'll see, we've made it a pretty
-]frank discussion. (Please be aware, though, that we do not intend to
-]spend much time debating our decisions here.)
-]
-]The throughput troubles on slow lines result from an excessive rate of
-]retransmissions. The rate, in turn, is caused by a mis-tuned adaptive
-]algorithm.
-]
-]TCP packets are retransmitted if no response is received before a
-]timeout period has expired. Our routines implement a variant of the
-]familiar Karn and Jacobson adaptive algorithms, which attempts to
-]predict an efficient timeout value based on the time it took previous
-]packets to complete a roundtrip. Elapsed values are combined into a
-]smoothed average roundtrip time ("RTT") and variance.
-]
-]The key elements in this calculation are the initial RTT value and the
-]subsequent RTT's factored in. The changes we have made involve both of
-]these key areas.
-]
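
For readers who want to see the shape of the estimator being described, here
is a rough sketch (simplified Python of the textbook Karn/Jacobson scheme --
not Sun's actual kernel code, and the class and defaults are illustrative):

    class RtoEstimator:
        """Simplified Jacobson/Karn retransmit-timeout estimator."""

        def __init__(self, initial_rto=3.0):
            self.srtt = None        # smoothed round-trip time, seconds
            self.rttvar = None      # smoothed RTT variance
            self.rto = initial_rto  # current retransmit timeout

        def sample(self, rtt, retransmitted=False):
            # Karn's rule: an ACK for a retransmitted segment is ambiguous,
            # so its RTT is not allowed to update the estimate.
            if retransmitted:
                return self.rto
            if self.srtt is None:   # first valid measurement
                self.srtt = rtt
                self.rttvar = rtt / 2.0
            else:                   # Jacobson's exponential smoothing
                self.rttvar += (abs(rtt - self.srtt) - self.rttvar) / 4.0
                self.srtt += (rtt - self.srtt) / 8.0
            self.rto = self.srtt + 4.0 * self.rttvar
            return self.rto

The initial value and the samples allowed to feed this smoothing are exactly
the two knobs the rest of the post is about.
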
-]
-]INITIAL RTT VALUES
-]
-]As an unintended result of several cumulative changes, the kernel
-]parameter "tcp_rexmit_interval_initial" was actually not being used. In
-]fact, all Internet Routing Entry (IRE) RTT values were being
-]initialized to 512 milliseconds. TCP was using that as an initial
-]setting.
-]
-]For connections which flow through a route with a roundtrip time less
-]than that (such as a LAN or WAN built on 10base-T) all was well. When
-]the connection closed, the actual IRE RTT value was updated and the
-]predictive timeout value successfully adjusted.
-]
-]For connections with an RTT greater than 512 ms, however, the timeout
-]would necessarily trip, and retransmissions occur. If the actual time
-]differed sufficiently from the original estimated value, TCP was never
-]able to send a segment without one or more retransmissions. A realistic
-]RTT for the route could never be established. This scenario is the
-]beginning of the explanation of what has been happening on several-hop
-]Internet or asynchronous PPP links.
-]
-]Our solution is to initialize all IRE RTT's to zero instead of 512 ms.
-]Any new connection for a route will now, when lookup discloses the zero
-]value, get the value of the "tcp_rexmit_interval_initial" parameter
-]instead. (And it's been increased to 3 seconds.) So in most cases the
-]adaptive algorithm will now be able to adjust timeout values
-]effectively.
-]
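
In code terms, the fix described above amounts to something like the
following (an illustrative Python sketch; the cache and function are
hypothetical, while the parameter name and the 3-second value come from
the post):

    TCP_REXMIT_INTERVAL_INITIAL = 3.0   # seconds, per the post

    def starting_rtt_estimate(ire_rtt_cache, route):
        # IRE RTT entries are now initialized to 0, meaning "no measurement
        # yet", instead of a blanket 512 ms.
        cached_rtt = ire_rtt_cache.get(route, 0.0)
        if cached_rtt == 0.0:
            # No history for this route: fall back to the initial interval
            # so slow links start with a realistic (generous) timeout.
            return TCP_REXMIT_INTERVAL_INITIAL
        return cached_rtt   # the adaptive algorithm refines this as ACKs arrive
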
-]
-]RTO (RETRANSMIT TIMEOUT) ALGORITHM INTERACTION
-]
-]Another factor contributing to packet congestion and retransmission was
-]a change to the RTO algorithm, introduced in a 2.4 Kernel Patch. The
-]intent was to make the behavior more "conservative"--that is, lower the
-]risk of poor timeout values. The effect on low-speed links was
-]unexpectedly contrary.
-]
-]A key (and unintended) effect of the code change was that RTT data from
-]retransmitted packets was discarded. This behavior, together with the
-]poor initial RTT values described earlier, meant that the adaptive
-]algorithm was deprived of the information needed to adjust the RTO.
-]
-]Our solution keeps the RTO update conservative, but now updates the RTO
-]after no more than one receive window's worth of valid RTT's. Further,
-]when an invalid RTT is seen--an ACK of a retransmitted segment, for
-]example--any valid RTT information collected so far is fed into the RTO
-]algorithm.
-]
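
A rough sketch of that batching rule (illustrative Python; treating "one
receive window's worth" as a simple segment count is an assumption of mine):

    def feed_rtt_samples(update_rto, samples, window_segments):
        # update_rto: callable taking one valid RTT sample, e.g. the
        #             RtoEstimator.sample method sketched earlier.
        # samples:    (rtt_seconds, was_retransmitted) pairs in arrival order.
        pending = []

        def flush():
            for rtt in pending:
                update_rto(rtt)
            pending.clear()

        for rtt, was_retransmitted in samples:
            if was_retransmitted:
                # Invalid RTT (Karn's rule): don't use it, but do push any
                # valid samples collected so far into the RTO algorithm.
                flush()
                continue
            pending.append(rtt)
            if len(pending) >= window_segments:
                # Never hold more than one receive window's worth of valid
                # RTTs before updating the RTO.
                flush()
        flush()
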
-]
-]ZERO WINDOW PROBE BUG FIX
-]
-]The problems described so far affect Solaris 2.4 and 2.5 equally. What
-]changed with 2.5?
-]
-]One important fix we included in 2.5 was for the "zero window probe"
-]bug, a well-publicized problem affecting just about all versions of
-]UNIX. As part of that rewrite, we removed a nondescript piece of logic
-]that implemented a simple "backoff" scheme. The excised code caused the
-]RTO to be lengthened by one-eighth as a result of certain failures. It
-]seemed not to be needed; but it had concealed the presence of the other
-]bugs by providing a means for the RTO to reach a successful value. When
-]this code was removed the other underlying problems were exposed.
-]
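
The excised logic was essentially a one-liner; reconstructed from the
description above, it did something like this (illustrative only):

    def lengthen_rto_on_failure(rto):
        # The removed 2.4-era backoff: stretch the retransmit timeout by
        # one-eighth after certain failures.  Not needed in itself, but it
        # had been letting the RTO creep up to a workable value, hiding the
        # initial-RTT and RTO-update bugs described earlier.
        return rto + rto / 8.0
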
-]
-]IRE RTT LOGIC
-]
-]This last part of the problem concerns the interaction between TCP and
-]the Solaris-specific Internet Routing Entries. The IRE RTT logic caches
-]RTT values to be re-used when a new connection is made over a familiar
-]link.
-]
-]This is a fine approach. The implementation, however, had a flaw: the
-]IRE RTT was updated regardless of the RTT value supplied by TCP.
-]
-]As you will have guessed by now, users of high-speed links saw no
-]effect. But on routes with highly variable RTTs, when a connection
-]dominated by small segments was closed, a problem could result. An RTT
-]too short for large segments was used to update the IRE RTT, and a
-]subsequent connection dominated by large segments (like FTP) experienced
-]an excessive retransmission rate. It was a different path to a familiar
-]dilemma: too small a timeout value.
-]
-]Naturally the most highly variable RTT's tend to be seen on async PPP
-]links, where the RTT of the route is compounded from (1) wire latency,
-](2) low bandwidth, and (3) congestion/queuing delays as more than one
-]segment is transmitted by TCP.
-]
-]Our solution is to add a new ndd variable "tcp_rtt_updates". It allows
-]tuning or disabling of IRE RTT updates. A value of zero disables IRE
-]RTT updates. A value greater than zero specifies how many RTT updates
-]to the RTO are required--that is, how many chances the algorithm has
-]had to adapt the timeout--before a closing connection will be allowed
-]to update the RTT in the IRE.
-]
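
A sketch of that gating logic (illustrative Python; the cache, the counter,
and the default of 20 are placeholders of mine -- only the variable name and
its zero / greater-than-zero semantics come from the post):

    def maybe_update_ire_rtt(ire_rtt_cache, route, connection_rtt,
                             rtt_updates_seen, tcp_rtt_updates=20):
        # tcp_rtt_updates == 0: never write this connection's RTT back
        # into the IRE cache.
        if tcp_rtt_updates == 0:
            return
        # tcp_rtt_updates > 0: only write it back if the adaptive algorithm
        # had at least that many chances (RTT updates to the RTO) to adapt
        # the timeout before the connection closed.
        if rtt_updates_seen >= tcp_rtt_updates:
            ire_rtt_cache[route] = connection_rtt

On a patched system the variable should be tunable the usual way with ndd
on /dev/tcp, though the post does not show the exact command or the shipped
default.
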
-]
-]CONCLUSION
-]
-]We've fished out, fixed, and explained some subtle flaws in our
-]adaptive retransmission algorithm. We take the responsibility for
-]introducing them--and the credit, too, for practically every piece was,
-]by itself, a successful response to our customers' needs. Better and
-]exhaustive testing would have shown up the flaws earlier, privately,
-]harmlessly. That's always our goal, and our customers have a right to
-]expect the best. Yes.
-]
-]There's always tomorrow. In the meantime: we killed this one, folks.
-]Our sincere thanks for your attention--and your business.
-]
-]
------- End Forwarded Message