[158116] in North American Network Operators' Group
Re: NTP Issues Today
daemon@ATHENA.MIT.EDU (Robert E. Seastrom)
Wed Nov 21 07:21:07 2012
To: Blake Dunlap <ikiris@gmail.com>
From: "Robert E. Seastrom" <rs@seastrom.com>
Date: Wed, 21 Nov 2012 07:20:47 -0500
In-Reply-To: <CAJvB4tk=tQna_UHcrEXCN19EaUbuizdW_LqUc+-nrPhKsuCYvQ@mail.gmail.com> (Blake
Dunlap's message of "Tue, 20 Nov 2012 19:03:02 -0600")
Cc: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
Blake Dunlap <ikiris@gmail.com> writes:
> That's what happens when you just follow vendor recommendations blindly. If
> you do follow that on vm's (which can actually be a good practice), make
> sure they pull from your own time infrastructure, and not just the world at
> large, and that those servers behave in a sane fashion with regard to time
> jumps.
Emphatically disagree on the "pull from your own infrastructure"
point. Even in a big company, you probably don't have the budget for
sufficient diversity of sources [*] for your NTP servers, and even if
you do, those servers will probably all be run by the same
person/organization. Mills has called the latter practice out as bad
in the past.
As Leo pointed out, the key is having a large diverse set so that if a
couple of them go nuts they can be voted off the island.
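
For anyone who wants to see the "voted off the island" idea in concrete terms, here is a minimal sketch, assuming each source reports a clock offset plus an error bound. It is a simplified Marzullo-style interval intersection, not ntpd's actual selection/clustering code, and the function name `select_truechimers` and the source names are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[str, float, float]  # (source name, offset in s, error bound in s)


def select_truechimers(samples: List[Sample]) -> Tuple[List[str], List[str]]:
    """Split sources into those that agree with the largest overlapping
    group of intervals and those that do not (hypothetical helper)."""
    if not samples:
        return [], []

    # Each source claims the true offset lies in [offset - err, offset + err].
    # Kind 0 = interval start, 1 = interval end; starts sort first on ties.
    edges = []
    for _name, offset, err in samples:
        edges.append((offset - err, 0))
        edges.append((offset + err, 1))
    edges.sort()

    # Sweep left to right, remembering the point covered by the most intervals.
    active = 0
    best = 0
    best_point = edges[0][0]
    for pos, kind in edges:
        if kind == 0:
            active += 1
            if active > best:
                best = active
                best_point = pos
        else:
            active -= 1

    agree = [n for n, o, e in samples if o - e <= best_point <= o + e]
    reject = [n for n, o, e in samples if not (o - e <= best_point <= o + e)]
    return agree, reject


if __name__ == "__main__":
    # Illustrative numbers only: four sane sources and one that has gone nuts.
    demo = [
        ("a", 0.001, 0.005),
        ("b", -0.002, 0.005),
        ("c", 0.003, 0.005),
        ("d", 0.000, 0.005),
        ("e", 2.5, 0.005),
    ]
    print(select_truechimers(demo))
```

With the illustrative data above, the four sources near zero overlap and "e" is the one rejected. The point of the sketch is only that this works when the sane sources outnumber the insane ones, which is why the set needs to be large and diverse.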
If you need super low jitter, or holdover when you lose network,
you're looking at on-site devices with OCXO or Rb frequency
standards in them. That doesn't mean you shouldn't be talking to the
rest of the world too, though. What if your on-site sources go nuts?
This happens periodically, say every 10 years or so, because of crappy
implementations and worst-current-practices. A re-read of
https://groups.google.com/forum/?fromgroups=#!search/mills$20ntp$20byzantine/comp.protocols.time.ntp/TryjqtAd1XM/R0zzzE13Tl8J
may prove instructive.
(reading list also includes http://www.amazon.com/dp/1439814635/ )
In my experience NTP beats out even DNS for "blatantly wrong configs
in the wild that nevertheless seem to work well enough that dilettante
tech folks don't notice".
I might have replied to this thread yesterday but I was blissfully
unaware of any problems:
rs@bifrost [8] % ntpq -c peers | egrep -v '(===|remote)' | wc -l
11
rs@bifrost [9] %
-r
[*] particularly due to shortsighted US federal government choices on
LORAN, GOES, WWVB time format, etc.