[2165] in linux-net channel archive

Re: IP: optimize as a router not host

daemon@ATHENA.MIT.EDU (Jon 'tex' Boone)
Wed Mar 20 05:50:05 1996

To: Alan Cox <alan@cymru.net>
cc: smurf@smurf.noris.de (Matthias Urlichs), linux-kernel@vger.rutgers.edu,
        linux-net@vger.rutgers.edu
In-reply-to: Your message of "Tue, 19 Mar 1996 09:58:39 GMT."
             <199603190958.JAA26397@snowcrash.cymru.net> 
Date: 	Wed, 20 Mar 1996 05:42:56 -0500
From: "Jon 'tex' Boone" <tex@isc.upenn.edu>

-----BEGIN PGP SIGNED MESSAGE-----

> >   Rogier Wolff <r.e.wolff@et.tudelft.nl> writes:
> > > 
> > > Suppose a sender sends packets 1,2,3,4 and 5. At the receiving
> > > end you get 1,2,4,5.  What you do is you ack that you got packet
> > > 1 and two.  When you get 4 and 5 you ack that you got 2 again
> > > and again. When a sender gets these it is supposed to conclude
> > > that packet 3 got lost and try re-sending that. 
> > 
> > There is a RFC which implements selective reject -- i.e., in the
> > ack, you add some TCP options which tell the sender that you do
> > have packets 4 and 5 (or rather, the sequence numbers for which 4
> > and 5 contained data). 
> 
> The selective ack RFC never went anywhere because you can use a
> scheme known as fast retransmit instead to avoid pipeline stalls on
> a loss of 1 frame/window or less. What happens is the sender sends 1
> 2 3 4 5 6; the receiver gets 1 2 4 5 6. The acks from the receiver
> thus go 1 2 2 2 2 .. seeing 3 acks for an old frame in a row the
> sender now sends frame 3 again immediately, then will get an ack of
> 7 and continue. 

  This is not entirely accurate.  Alan is correct in saying that the
  fast retransmit algorithm is used to recover from a single lost
  frame per window.  However, quoting from RFC 1323 (Van Jacobson,
  the author of the algorithm, is a co-author of RFC 1323):

      There are three fundamental performance problems with the
    current TCP over LFN paths:
   
    ...

      (1)  Window Size Limit

           ...

      (2)  Recovery from Losses

           Packet losses in an LFN can have a catastrophic effect on 
           throughput.  Until recently, properly-operating TCP
           implementations would cause the data pipeline to drain with
           every packet loss, and require a slow-start action to
           recover.  Recently, the Fast Retransmit and Fast Recovery
           algorithms [Jacobson90c] have been introduced.  Their
           combined effect is to recover from one packet loss per
           window, without draining the pipeline.  However, more than
           one packet loss per window typically results in a
           retransmission timeout and the resulting pipeline drain and
           slow start.

           Expanding the window size to match the capacity of an LFN
           results in a corresponding increase of the probability of
           more than one packet per window being dropped.  This could
           have a devastating effect upon the throughput of TCP over an
           LFN.  In addition, if a congestion control mechanism based
           upon some form of random dropping were introduced into
           gateways, randomly spaced packet drops would become common,
           possible increasing the probability of dropping more than one
           packet per window.

           To generalize the Fast Retransmit/Fast Recovery mechanism to
           handle multiple packets dropped per window, selective
           acknowledgments are required.  Unlike the normal cumulative
           acknowledgments of TCP, selective acknowledgments give the
           sender a complete picture of which segments are queued at the
           receiver and which have not yet arrived.  Some evidence in
           favor of selective acknowledgments has been published
           [NBS85], and selective acknowledgments have been included in
           a number of experimental Internet protocols -- VMTP
           [Cheriton88], NETBLT [Clark87], and RDP [Velten84], and
           proposed for OSI TP4 [NBS85].  However, in the non-LFN
           regime, selective acknowledgments reduce the number of
           packets retransmitted but do not otherwise improve
           performance, making their complexity of questionable value.
           However, selective acknowledgments are expected to become
           much more important in the LFN regime.

           RFC-1072 defined a new TCP "SACK" option to send a selective
           acknowledgment.  However, there are important technical
           issues to be worked out concerning both the format and
           semantics of the SACK option.  Therefore, SACK has been
           omitted from this package of extensions.  It is hoped that
           SACK can "catch up" during the standardization process.

  
  So, as you can see, selective acknowledgment has been separated
  from the other improvements in RFC 1323.  It is now being worked on
  by the TCP for Large Windows working group of the IETF (tcplw), and
  there is a current draft for SACK: draft-ietf-tcplw-sack-00.txt.

  Also, please note the following from draft-ietf-tcplw-sack-00.txt:

    4.  GENERATING SACK OPTIONS:  DATA RECEIVER BEHAVIOR

      If the data receiver has received a SACK-Permitted option on the
      SYN for this connection, the data receiver MAY elect to generate
      SACK options as described below.  If the data receiver generates
      SACK options under any circumstance, it SHOULD generate them
      under all permitted circumstances.  If the data receiver has not
      received a SACK-Permitted option for a given connection, it MUST
      NOT send SACK options on that connection.

      If sent at all, SACK options SHOULD be included in all ACKs which
      do not ACK the highest sequence number in the data receiver's
      queue.  In this situation the network has lost or mis-ordered
      data, such that the receiver holds non-contiguous data in its
      queue.  RFC 1122, Section 4.2.2.21, discusses the reasons for
      the receiver to send ACKs in response to additional segments
      received in this state.  The receiver SHOULD send an ACK for
      every valid segment that arrives containing new data, and each
      of these "duplicate" ACKs SHOULD bear a SACK option.
     
      If the data receiver chooses to send a SACK option, the
      following rules apply:

          * The first SACK block (i.e., the one immediately following
          the kind and length fields in the option) MUST specify the 
          contiguous block of data containing the segment which
          triggered this ACK, unless that segment advanced the
          Acknowledgment Number field in the header.  This assures
          that the ACK with the SACK option reflects the most recent
          state change at the data receiver.

          * The data receiver SHOULD include as many distinct SACK
          blocks as possible in the SACK option.  Note that the
          maximum available option space may not be sufficient to
          report all blocks present in the receiver's queue.

          * The SACK option SHOULD be filled out by repeating the most
          recently reported SACK blocks (based on first SACK blocks in
          previous SACK options) that are not subsets of a SACK block
          already included in the SACK option being constructed.  This
          assures that in normal operation every SACK block is repeated
          several times.  (At least three times for large-window TCP
          implementations [RFC1323]).

      It is very important that the SACK option always reports the
      block containing the most recently received segment, because 
      this provides the sender with the most up-to-date information
      about the state of the network and the data receiver's queue.
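
  The receiver rules quoted above can be sketched roughly as follows.
  This is a simplified model under stated assumptions, not the
  draft's wire format: it tracks whole packet numbers instead of
  sequence-number ranges, the class name SackReceiver is
  hypothetical, and max_blocks merely stands in for the option-space
  limit (the value 3 is an assumption).  Keeping the block list
  ordered by most recent change is my approximation of the "repeat
  the most recently reported blocks" rule.

```python
class SackReceiver:
    """Toy data receiver that reports (ack, sack_blocks) per packet."""

    def __init__(self, max_blocks=3):
        self.next_expected = 1      # lowest packet not yet received in order
        self.blocks = []            # out-of-order (start, end) blocks,
                                    # inclusive, most recently changed first
        self.max_blocks = max_blocks

    def receive(self, pkt):
        if pkt == self.next_expected:
            # The segment advances the acknowledgment number.
            self.next_expected += 1
            # Absorb a queued block that is now contiguous with it.
            for block in self.blocks:
                if block[0] == self.next_expected:
                    self.next_expected = block[1] + 1
                    self.blocks.remove(block)
                    break
        elif pkt > self.next_expected:
            # Out-of-order segment: merge it with any touching block and
            # put the merged block first, so the first SACK block is the
            # one containing the segment that triggered this ACK.
            start, end = pkt, pkt
            kept = []
            for b_start, b_end in self.blocks:
                if b_start <= end + 1 and start <= b_end + 1:
                    start, end = min(start, b_start), max(end, b_end)
                else:
                    kept.append((b_start, b_end))
            self.blocks = [(start, end)] + kept
        # pkt < next_expected is an old duplicate: state is unchanged.
        ack = self.next_expected - 1
        return ack, self.blocks[: self.max_blocks]
```

  Feeding it the arrivals 1 2 4 5 7 gives ACK 2 with blocks
  (7,7) (4,5) on the last packet; when the missing 3 then arrives,
  the ACK jumps to 5 and only (7,7) is still reported.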


  So, if the SACK draft becomes a standard, you will not be required
  to send SACK options.  But if you do send them, you should send
  them in every circumstance where they are permitted, including the
  loss of a single packet per window, where today you rely on fast
  retransmit/fast recovery alone.  This seems to indicate that SACK
  is intended as a replacement for fast retransmit/fast recovery once
  standardized.
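
  As a rough illustration of what the sender gains, here is a small
  Python sketch that turns an ACK plus its SACK blocks into the list
  of packets still missing at the receiver.  The function name
  holes_from_sack and the use of inclusive (start, end) packet-number
  blocks are my own assumptions for illustration; real SACK blocks
  describe sequence-number ranges, and a missing packet may merely be
  reordered rather than lost.

```python
def holes_from_sack(ack, sack_blocks):
    """Return packets above `ack` that the receiver has not reported.

    `ack` is the highest packet received in order; `sack_blocks` is a
    list of inclusive (start, end) packet-number ranges queued at the
    receiver.  Packets above the highest reported block are ignored,
    since they may simply still be in flight.
    """
    sacked = set()
    for start, end in sack_blocks:
        sacked.update(range(start, end + 1))
    highest_sacked = max(sacked, default=ack)
    return [p for p in range(ack + 1, highest_sacked) if p not in sacked]
```

  For example, an ACK of 2 carrying blocks (6,8) and (4,4) tells the
  sender that both 3 and 5 are missing, which a bare run of
  duplicate ACKs for 2 cannot do.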

  <tex@isc.upenn.edu>

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBMU/hIK3OmEV1jeMFAQFLaQQApspgy2P+ZC2QzPGImCwQ6a0fLiN1cjpp
x5pA8LhSUVmKU/imr1Mo/h8JwFA15o4nahX4ijgM89FDOwSls/zaXv0WrgrLOsHp
ByUQMd1PNMiJwiE9QJy/ZzK2flHbV7Vxaf3tMvYbAipOvO3zKmQHQpN3J++Lqtsj
gMWLX3ViIgY=
=aeYV
-----END PGP SIGNATURE-----

