[6932] in Release_7.7_team
Re: PXE in W20
daemon@ATHENA.MIT.EDU (Geoffrey Thomas)
Sat Aug 28 16:35:27 2010
Date: Sat, 28 Aug 2010 16:35:20 -0400 (EDT)
From: Geoffrey Thomas <geofft@MIT.EDU>
To: Joshua Oreman <oremanj@mit.edu>
cc: Jonathan Reed <jdreed@mit.edu>, release-team@mit.edu
In-Reply-To: <AANLkTi=h8bazjW0MJOYDuQxK0WrkE1TCd550kP4jHsPN@mail.gmail.com>
Message-ID: <alpine.DEB.1.10.1008281630310.16876@dr-wily.mit.edu>
On Sat, 28 Aug 2010, Joshua Oreman wrote:
> On Sat, Aug 28, 2010 at 7:51 AM, Jonathan Reed <jdreed@mit.edu> wrote:
>>> There was some suspicion on zephyr that Network chose the worst
>>> possible time to fix their DHCP servers... (On the plus side, this
>>> means you shouldn't have to play the "how many times will I have to
>>> boot this before PXE works?" game in the future.)
>>
>> Can you clarify the first sentence? The timestamp on the mail you
>> forwarded is 11:37am, long after DHCP would have been relevant for the
>> Lucid upgrade. Or are you suggesting that the fixes were put in place
>> earlier than that? (They would have had to have been between 2am and
>> 8am to have any effect on the upgrade.)
>
> I think it's likely that any changes to the DHCP servers would have
> happened overnight, and that user interaction wouldn't happen until
> the next working day.
Yeah, my suspicion was that Network was tweaking and fixing things
overnight and then replied in the morning. They picked overnight as a nice
quiet time when nobody would be around while they tried things... and so
did we, for the upgrade, so this was just a bad coincidence. I don't
think we ever explicitly told them that we needed DHCP to be extremely
stable on that particular night (even though we weren't using PXE).
>> Is there any additional information on what was done and how we know
>> for sure that this fixes the PXE problems we were seeing? And was this
>> related to the RT ticket you (Josh) had opened with NIST?
>
> Yes, this resolves that ticket. I've tested DHCP from W20 (the SIPB
> office specifically) and found that now PXE DHCPDISCOVERs will only
> receive one response, from installer.mit.edu (the PXE-enabled DHCP
> server); that should satisfy even Broadcom's finicky PXE firmware.
> There's no longer the chance that the first DHCPOFFER will come from a
> server without PXE enabled, which in the Broadcom case (AFAICT) would
> cause that first bad server to be retried several times before giving
> up on PXE boot. This will also make PXE boots with Intel's firmware
> faster (no need to possibly timeout on the bad servers first) but
> Intel's firmware already tried all DHCPOFFERs in sequence so it won't
> increase functionality there.
>
> (In the above, "bad" = "sends PXE options without listening on port
> 4011 for the follow-up request".)
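To make the failure mode concrete, here is a toy model of the two firmware behaviors described above. It is only a sketch under stated assumptions, not how either vendor's firmware actually works: the retry count of 3, the "one timeout per bad offer" cost, and the server name `other-dhcp` are all made up for illustration. Only installer.mit.edu, port 4011, and the definition of "bad" come from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Offer:
    server: str
    sends_pxe_options: bool  # offer advertises PXE boot options
    listens_on_4011: bool    # server answers the follow-up request on port 4011


def usable(offer: Offer) -> bool:
    # An offer is only useful if it advertises PXE AND answers on 4011;
    # "bad" in the thread means PXE options without the 4011 listener.
    return offer.sends_pxe_options and offer.listens_on_4011


def broadcom_like(offers: List[Offer], retries: int = 3) -> Tuple[Optional[str], int]:
    """Toy model: commit to the FIRST offer only. If it is unusable, spend
    `retries` timeouts on it (assumed value) and give up on PXE boot."""
    if not offers:
        return None, 0
    first = offers[0]
    if usable(first):
        return first.server, 0
    return None, retries


def intel_like(offers: List[Offer]) -> Tuple[Optional[str], int]:
    """Toy model: try each offer in arrival order; each unusable offer
    costs one timeout (assumed) before moving on to the next."""
    timeouts = 0
    for offer in offers:
        if usable(offer):
            return offer.server, timeouts
        timeouts += 1
    return None, timeouts


# Before the fix: a hypothetical bad server's offer arrives first.
before = [Offer("other-dhcp", True, False), Offer("installer.mit.edu", True, True)]
# After the fix: only installer.mit.edu answers.
after = [Offer("installer.mit.edu", True, True)]

print(broadcom_like(before))  # (None, 3)
print(intel_like(before))     # ('installer.mit.edu', 1)
print(broadcom_like(after))   # ('installer.mit.edu', 0)
print(intel_like(after))      # ('installer.mit.edu', 0)
```

With a single responding server, both models boot on the first try; the difference between them only shows up when a bad offer wins the race.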
By the way, Josh, thanks on behalf of all of us for following up with
Network until this ticket was resolved.
-- 
Geoffrey Thomas
geofft@mit.edu