[17707] in Athena Bugs

home help back first fref pref prev next nref lref last post

sun4 8.3.27: /usr/athena/bin/ftp

daemon@ATHENA.MIT.EDU (John Hawkinson)
Fri Mar 24 04:11:00 2000

Message-Id: <200003240910.EAA15452@mary-kay-commandos.mit.edu>
To: bugs@MIT.EDU
Cc: Angie Kelic <sly@MIT.EDU>
Date: Fri, 24 Mar 2000 04:10:53 -0500
From: John Hawkinson <jhawk@MIT.EDU>

System name:		mary-kay-commandos.mit.edu
Type and version:	SPARC/5 8.3.27 (with mkserv)
Display type:		cgsix

Shell:			/bin/sh (/afs/sipb/project/tcsh/betatcsh?)
Window manager:		unknown

What were you trying to do?
	Transfer a large file from one machine to another via kerberized
	ftp.

	The problem appears to be both tcp window-limitted and kftp-related.
	The window-limitting only shows up because of the kftp problem.

What's wrong:

	Performance is *abominable*. Angie noticed this problem while
	kftp-ing between two redhat athena 5.2 machines on the same
	unloaded network.  Investigation showed that there were 10-second
	pauses in the middle of the transfer where the window had not
	closed, and the received had ACKd the data, but the sender did not
	send the next packet for about 10 seconds. The ACK was clearly not
	lost as it was not retransmitted and data was shoved out again
	after the timeout period expired. The last ACKed data before the
	hang was less than a full MTU and had PSH set.

	I was able to reproduce a similar problem (though less dramatic)
	under 8.3.27 ftp-ing a large file (/usr/athena/bin/emacs) from
	cvp to mkc. It's hard to see the problem in ASCII, but here's
	a time-sequence plot of the data packets (generated by
	/afs/sipb/project/tcpdump/src/tcpdump/stime.awk):


seq
    10000 ++-----+-------+------+------+-------+------+------+-------+-----++
          +      +       +      +      +       +      +      +       +.     +
          |                                                       data. .   |
          |                                                   . .. ..       |
     8000 ++                                                 ..            ++
          |                                            .......              |
          |                                           ..                    |
     6000 ++                                    .... ..                    ++
          |                                   ...                           |
          |                              . .. .                             |
          |                            ...                                  |
     4000 ++                     .. .. .                                   ++
          |                .......                                          |
          |               ..                                                |
     2000 ++        .......                                                ++
          |        ..                                                       |
          |  .......                                                        |
          + ..   +       +      +      +       +      +      +       +      +
        0 ...----+-------+------+------+-------+------+------+-------+.----++
          0      5      10     15     20      25     30     35      40     45
                                         time

There are significant delays between each burst. If we narrow to, say, 5
to 11 seconds:


seq
     2600 ++---------+----------+----------+----------+----------+---------++
          +          +          +          +          +          +          .
     2400 ++                                                      data  .  ..
          |                                                               ..|
     2200 ++                                                            ...++
          |                                                            ..   |
          |                       ..         .. .  .     .          .  .    |
     2000 ++                     ..                                        ++
          |                    ..                                           |
     1800 ++                 ...                                           ++
          |                ...                                              |
     1600 ++              ..                                               ++
          |             ...                                                 |
     1400 ++           ..                                                  ++
          |          ...                                                    |
          |        ...                                                      |
     1200 ++      ..                                                       ++
          +      ..  +          +          +          +          +          +
     1000 .+-----.---+----------+----------+----------+----------+---------++
          5          6          7          8          9         10         11
                                         time

Clearly some sort of event is taking place from t=8 to t=10 causing the
transfer to stall. A plot of the acks shows no difference from the data
packets.

I think my case must be different from Angie's, but I'm not certain.
Here's the raw tcpdump output at the time of great lossage, massaged to
fit in 80 cols, with annotations following blocks of the trace:

03:16:07.507336 cvp > mkc: . 2107369:2108829(1460) ack 1 win 8760 (DF) 
03:16:07.508710 cvp > mkc: . 2108829:2110289(1460) ack 1 win 8760 (DF) 
03:16:07.510345 cvp > mkc: P 2110289:2111749(1460) ack 1 win 8760 (DF) 
03:16:07.511565 cvp > mkc: . 2111749:2113209(1460) ack 1 win 8760 (DF) 
03:16:07.512841 cvp > mkc: . 2113209:2114669(1460) ack 1 win 8760 (DF) 
03:16:07.514081 cvp > mkc: P 2114669:2116129(1460) ack 1 win 8760 (DF) 

cvp sent a nice long stream of data to mkc.

03:16:07.518631 mkc > cvp: . ack 2110289 win 5840 (DF) 
03:16:07.518721 mkc > cvp: . ack 2113209 win 2920 (DF) 
03:16:07.518754 mkc > cvp: . ack 2116129 win 0 (DF) 

mkc acks the all 6 packets with an ack-every-other packet policy,
sucessively decreasing the advertised window, finally closing it
to zero on the last ack. Note that the receiver's advertised window
is 8760 (and it started out that way at the beginning of the connection),
which is 6*1460 (i.e. 6 full-mtu packets).

So closure of the window here on mkc's part is an indication that it
is sitting and processing the data (i.e. the app has not yet read it),
and is not willing to accept more.

03:16:07.755721 cvp > mkc: P 2116129:2116130(1) ack 1 win 8760 (DF) 
03:16:07.755822 mkc > cvp: . ack 2116129 win 0 (DF) 
03:16:08.011798 cvp > mkc: . 2116129:2116130(1) ack 1 win 8760 (DF) 
03:16:08.015372 mkc > cvp: . ack 2116129 win 0 (DF) 
03:16:08.525788 cvp > mkc: . 2116129:2116130(1) ack 1 win 8760 (DF) 
03:16:08.525953 mkc > cvp: . ack 2116129 win 0 (DF) 
03:16:09.548042 cvp > mkc: . 2116129:2116130(1) ack 1 win 8760 (DF) 
03:16:09.548177 mkc > cvp: . ack 2116129 win 0 (DF) 

cvp acks mkc's window closure and then procedes to send periodic
window probes, all of which are answered with a window of zero.
So mkc is sitting here for 3 seconds preventing cvp from sending
any more data.

03:16:09.785564 mkc > cvp: . ack 2116129 win 8760 (DF) 

Finally, shortly after the 3rd window probe but not before
the 4th would have been sent (Solaris seems to send them with
an every-second policy), mkc re-opens it's window to 8k again.
It's interesting that mkc seems to have ignored the window
probes (which include up to 2116130) and only kept acking the
previous packet. This is probably a minor bug in the Solaris tcp,
but isn't worth reporting w/o verification that it is still in
Solaris 8, I'd say.

03:16:09.786333 cvp > mkc: . 2116129:2116130(1) ack 1 win 8760 (DF) 
03:16:09.786574 cvp > mkc: . 2116129:2116130(1) ack 1 win 8760 (DF) 
03:16:09.786647 mkc > cvp: . ack 2116130 win 8760 (DF) 

cvp retransmits 2116130 twice in rapid sucession and finally mkc acks it.
Again, this looks like a tcp bug (double retransmission), but it's
hard to know for sure, and depends on cvp's idea of the rtt to mkc.

03:16:09.789419 cvp > mkc: . 2116130:2117589(1459) ack 1 win 8760 (DF) 
03:16:09.790666 cvp > mkc: . 2117589:2119049(1460) ack 1 win 8760 (DF) 
03:16:09.790854 mkc > cvp: . ack 2119049 win 8760 (DF) 

Finally, cvp starts sending packets again.

This goes on for a while and then breaks.



What should have happened:
a)    mkc should not have closed the window for three seconds.
What was it/krb-ftp doing? I don't know, but it should be checked out.

b)    The default window size should have been way larger, like 32k at
least. When the connection was opened, the window size was only 8k:

03:15:59.283392 CONTENTS-VNDER-PRESSVRE.ftp-data > mary-kay-commandos.34483: S
	4120975642:4120975642(0) win 8760 <mss 1460> (DF) [tos 0x8]
03:15:59.283503 mary-kay-commandos.34483 > CONTENTS-VNDER-PRESSVRE.ftp-data: S
	1636576099:1636576099(0) ack 4120975643 win 8760 <mss 1460> (DF)

and it never got bigger. I'm pretty shocked that Solaris is offering such a small
receive window. Looking at "ndd /dev/tcp", it doesn't seem to

Please describe any relevant documentation references:
	[Please replace this line with your information.]

For comparison's sake, here's a sample plot of an ssh
connection copying the same file, and then kftp with it.
The difference is much more dramatic with real graphics, but
basically the ssh tcp connection is a nice straight diagonal
line with no lossage, window-limitting or otherwise.
And the kftp has these periodic stalls.

seq
    10000 ++-----+-------+------+------+-------+------+------+-------+-----++
          +      +       +      +      +       +      +   .  +       +      +
          |                                            ....   ssh data  .   |
          |                                         ....                    |
     8000 ++                                     ....                      ++
          |                                   ....                          |
          |                                 ...                             |
     6000 ++                            .....                              ++
          |                           ...                                   |
          |                        ....                                     |
          |                     ....                                        |
     4000 ++                  ...                                          ++
          |                ....                                             |
          |             ....                                                |
     2000 ++         ....                                                  ++
          |        ...                                                      |
          |     ....                                                        |
          +  ....+       +      +      +       +      +      +       +      +
        0 ....---+-------+------+------+-------+------+---.--+-------+-----++
          0      5      10     15     20      25     30     35      40     45
                                         time

seq
    10000 ++-----+-------+------+------+-------+------+------+-------+-----++
          +      +       +      +      +       +      +      +       +.     +
          |                                                  kftp data. .   |
          |                                                   . .. ..       |
     8000 ++                                                 ..            ++
          |                                            .......              |
          |                                           ..                    |
     6000 ++                                    .... ..                    ++
          |                                   ...                           |
          |                              . .. .                             |
          |                            ...                                  |
     4000 ++                     .. .. .                                   ++
          |                .......                                          |
          |               ..                                                |
     2000 ++        .......                                                ++
          |        ..                                                       |
          |  .......                                                        |
          + ..   +       +      +      +       +      +      +       +      +
        0 ...----+-------+------+------+-------+------+------+-------+.----++
          0      5      10     15     20      25     30     35      40     45
                                         time


home help back first fref pref prev next nref lref last post