[6931] in Release_7.7_team
Re: PXE in W20
daemon@ATHENA.MIT.EDU (Joshua Oreman)
Sat Aug 28 11:04:01 2010
MIME-Version: 1.0
In-Reply-To: <27FE6356-1566-4B53-900D-CC327735FF2D@mit.edu>
From: Joshua Oreman <oremanj@MIT.EDU>
Date: Sat, 28 Aug 2010 08:03:34 -0700
Message-ID: <AANLkTi=h8bazjW0MJOYDuQxK0WrkE1TCd550kP4jHsPN@mail.gmail.com>
To: Jonathan Reed <jdreed@mit.edu>
Cc: release-team@mit.edu
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
On Sat, Aug 28, 2010 at 7:51 AM, Jonathan Reed <jdreed@mit.edu> wrote:
>> There was some suspicion on zephyr that Network chose the worst
>> possible time to fix their DHCP servers... (On the plus side, this
>> means you shouldn't have to play the "how many times will I have to
>> boot this before PXE works?" game in the future.)
>
> Can you clarify the first sentence? The timestamp on the mail you forwarded is 11:37am, long after DHCP would have been relevant for the Lucid upgrade. Or are you suggesting that the fixes were put in place earlier than that? (They would have had to have been between 2am and 8am to have any effect on the upgrade.)
I think it's likely that any changes to the DHCP servers would have
happened overnight, and that user interaction wouldn't happen until
the next working day.
> Is there any additional information on what was done and how we know for sure that this fixes the PXE problems we were seeing? And was this related to the RT ticket you (Josh) had opened with NIST?
Yes, this resolves that ticket. I've tested DHCP from W20 (the SIPB
office specifically) and found that PXE DHCPDISCOVERs now receive
only one response, from installer.mit.edu (the PXE-enabled DHCP
server); that should satisfy even Broadcom's finicky PXE firmware.
There's no longer the chance that the first DHCPOFFER will come from a
server without PXE enabled, which in the Broadcom case (AFAICT) would
cause that first bad server to be retried several times before giving
up on PXE boot. This will also make PXE boots with Intel's firmware
faster (no need to possibly time out on the bad servers first), but
Intel's firmware already tried all DHCPOFFERs in sequence, so it won't
increase functionality there.
(In the above, "bad" = "sends PXE options without listening on port
4011 for the follow-up request".)
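
To make the difference between the two firmware behaviours concrete,
here is a rough Python sketch. It is a toy model only, not real DHCP
code: the Offer class, the function names, the retry count of 3 (the
text above only says "several times") and the server name
dhcp-other.example are all made up for illustration, and the Broadcom
behaviour is modelled as described above (AFAICT), not from its
firmware source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Offer:
    """One DHCPOFFER, reduced to the two properties that matter here."""
    server: str
    sends_pxe_options: bool
    listens_on_4011: bool


def is_bad(offer: Offer) -> bool:
    # "bad" as defined above: sends PXE options but does not listen on
    # port 4011 for the follow-up request.
    return offer.sends_pxe_options and not offer.listens_on_4011


def can_pxe_boot(offer: Offer) -> bool:
    return offer.sends_pxe_options and offer.listens_on_4011


def first_offer_only(offers: List[Offer],
                     retries: int = 3) -> Tuple[Optional[str], int]:
    """Broadcom-like model: keep retrying the first offer, then give up.

    The retry count is an assumption, not a documented value.
    """
    if not offers:
        return None, 0
    first = offers[0]
    for attempt in range(1, retries + 1):
        if can_pxe_boot(first):
            return first.server, attempt
    return None, retries


def all_offers_in_sequence(offers: List[Offer]) -> Tuple[Optional[str], int]:
    """Intel-like model: try each offer in arrival order until one works."""
    for attempt, offer in enumerate(offers, start=1):
        if can_pxe_boot(offer):
            return offer.server, attempt
    return None, len(offers)


# Before the fix: a bad server may answer first (hypothetical name).
before = [
    Offer("dhcp-other.example", sends_pxe_options=True, listens_on_4011=False),
    Offer("installer.mit.edu", sends_pxe_options=True, listens_on_4011=True),
]
# After the fix: only installer.mit.edu answers PXE DHCPDISCOVERs.
after = [
    Offer("installer.mit.edu", sends_pxe_options=True, listens_on_4011=True),
]

print(first_offer_only(before))        # (None, 3)
print(all_offers_in_sequence(before))  # ('installer.mit.edu', 2)
print(first_offer_only(after))         # ('installer.mit.edu', 1)
print(all_offers_in_sequence(after))   # ('installer.mit.edu', 1)
```

In this model the fix changes the outcome only for the first-offer-only
client; the try-everything client already succeeded and just gets there
in fewer attempts.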
Josh