[158098] in North American Network Operators' Group
RE: [outages] NTP Issues Today
daemon@ATHENA.MIT.EDU (R. Benjamin Kessler)
Tue Nov 20 16:07:43 2012
From: "R. Benjamin Kessler" <Ben.Kessler@zenetra.com>
To: Jeremy Chadwick <jdc@koitsu.org>, Scott Voll <svoll.voip@gmail.com>
Date: Tue, 20 Nov 2012 21:07:22 +0000
In-Reply-To: <20121120153813.GA90675@icarus.home.lan>
Cc: outages <outages@outages.org>, "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
Logs from a Juniper router in a customer network - we had hundreds of these
affected. They all synchronize to internal hosts (172.20.167.251 and .252),
which are configured to get time from NIST and USNO:
CORP-NTP-01#sh ntp as
  address         ref clock        st   when   poll reach  delay  offset   disp
*~192.5.41.41     .IRIG.            1    354    512   377   34.2    0.36    1.4
+~132.163.4.101   .ACTS.            1    336    512   377   35.0   -2.54   18.7
 ~127.127.7.1     127.127.7.1      10     59     64   377    0.0    0.00    0.0
* master (synced), # master (unsynced), + selected, - candidate, ~ configured
CORP-NTP-02#sh ntp as
  address         ref clock        st   when   poll reach  delay  offset   disp
*~192.5.41.41     .IRIG.            1     65    512   377   36.5    0.91    0.6
+~132.163.4.101   .ACTS.            1     95    512   377   34.3   -1.31   22.8
 ~127.127.7.1     127.127.7.1      10     44     64   377    0.0    0.00    0.0
* master (synced), # master (unsynced), + selected, - candidate, ~ configured
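A minimal Python sketch for reading association rows like the ones above, assuming whitespace-separated columns in the order shown (address, ref clock, st, when, poll, reach, delay, offset, disp) and assuming offset is in milliseconds (the output itself prints no units). The reach column is an 8-bit shift register printed in octal, so 377 means every one of the last eight polls was answered: both configured upstream servers were reachable from both internal hosts when these snapshots were taken. `Assoc`, `parse_row` and `recent_polls` are hypothetical names.

```python
from dataclasses import dataclass

# Tally legend as printed under the "sh ntp as" output above ('~' handled separately).
TALLY = {"*": "master (synced)", "#": "master (unsynced)",
         "+": "selected", "-": "candidate"}

@dataclass
class Assoc:
    address: str
    status: str       # meaning of the leading tally mark, "" if none
    configured: bool  # '~' mark present
    ref_clock: str
    stratum: int
    reach: int        # 8-bit reachability register, decoded from octal
    offset: float     # as printed; assumed to be milliseconds

def parse_row(line: str) -> Assoc:
    """Parse one association row, assuming the column order shown above."""
    fields = line.split()
    addr, status = fields[0], ""
    if addr[0] in TALLY:
        status, addr = TALLY[addr[0]], addr[1:]
    configured = addr.startswith("~")
    return Assoc(address=addr.lstrip("~"), status=status, configured=configured,
                 ref_clock=fields[1], stratum=int(fields[2]),
                 reach=int(fields[5], 8),  # reach is displayed in octal
                 offset=float(fields[7]))

def recent_polls(reach: int) -> str:
    """Show the register as 8 bits, oldest poll first; '1' = reply received."""
    return format(reach, "08b")
```

For the first row of CORP-NTP-01, `recent_polls(parse_row(row).reach)` gives `11111111`; a single missed poll most recently would show up as `11111110` (octal 376).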
Here are the logs from one of the Junipers:
Nov 19 14:24:48 XXXX xntpd[912]: kernel time sync enabled 2001
Nov 19 15:50:11 XXXX xntpd[912]: synchronized to 172.20.167.252, stratum=2
Nov 19 16:41:23 XXXX xntpd[912]: no servers reachable
Nov 19 16:44:24 XXXX xntpd[912]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:44:24 XXXX xntpd[912]: time correction of -378691200 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Nov 19 16:44:24 XXXX init: ntp (PID 912) exited with status=255
Nov 19 16:44:24 XXXX init: ntp (PID 70200) started
Nov 19 16:44:24 XXXX xntpd[70200]: ntpd 4.2.0-a Sat Apr 10 00:32:46 UTC 2010 (1)
Nov 19 16:44:24 XXXX xntpd[70200]: mlockall(): Resource temporarily unavailable
Nov 19 16:44:24 XXXX xntpd[70200]: precision = 0.582 usec
Nov 19 16:44:24 XXXX xntpd[70200]: Listening on interface ggsn_vpn, 128.0.0.1#123
Nov 19 16:44:24 XXXX xntpd[70200]: kernel time sync status 2040
Nov 19 16:44:24 XXXX xntpd[70200]: frequency initialized -64.931 PPM from /var/db/ntp.drift
Nov 19 16:44:24 XXXX xntpd[70200]: Configuring iburst flag for server
Nov 19 16:44:24 XXXX xntpd[70200]: Configuring iburst flag for server
Nov 19 16:44:33 XXXX xntpd[70200]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:44:32 XXXX xntpd[70200]: time reset -378691200.411331 s
Nov 19 16:44:32 XXXX xntpd[70200]: kernel time sync disabled 2041
Nov 19 16:45:44 XXXX xntpd[70200]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:45:51 XXXX xntpd[70200]: kernel time sync enabled 2001
Nov 19 16:45:56 XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 16:53:25 XXXX xntpd[70200]: no servers reachable
Nov 19 17:03:09 XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 17:13:00 XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 17:20:27 XXXX xntpd[70200]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:20:27 XXXX xntpd[70200]: time correction of 378691200 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Nov 19 17:20:27 XXXX init: ntp (PID 70200) exited with status=255
Nov 19 17:20:27 XXXX init: ntp (PID 70766) started
Nov 19 17:20:27 XXXX xntpd[70766]: ntpd 4.2.0-a Sat Apr 10 00:32:46 UTC 2010 (1)
Nov 19 17:20:27 XXXX xntpd[70766]: mlockall(): Resource temporarily unavailable
Nov 19 17:20:27 XXXX xntpd[70766]: precision = 0.570 usec
Nov 19 17:20:27 XXXX xntpd[70766]: Listening on interface ggsn_vpn, 128.0.0.1#123
Nov 19 17:20:27 XXXX xntpd[70766]: kernel time sync status 2040
Nov 19 17:20:27 XXXX xntpd[70766]: frequency initialized -64.931 PPM from /var/db/ntp.drift
Nov 19 17:20:27 XXXX xntpd[70766]: Configuring iburst flag for server
Nov 19 17:20:27 XXXX xntpd[70766]: Configuring iburst flag for server
Nov 19 17:20:35 XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:20:36 XXXX xntpd[70766]: time reset +378691200.387434 s
Nov 19 17:20:36 XXXX xntpd[70766]: kernel time sync disabled 6041
Nov 19 17:21:48 XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:21:48 XXXX xntpd[70766]: kernel time sync disabled 2041
Nov 19 17:21:52 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 00:02:29 XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
Nov 20 01:44:56 XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 02:19:03 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 02:53:12 XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 03:44:26 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 05:26:58 XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 05:44:02 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 07:43:35 XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 08:00:39 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 08:34:48 XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 08:51:54 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 10:34:22 XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 20 11:25:16 XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
Nov 20 12:33:56 XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 20 14:16:05 XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 14:33:10 XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 15:07:19 XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
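The size of the rejected correction is worth working out: 378691200 s is exactly 4383 days, which is twelve calendar years (12 x 365 days plus the leap days of 2004, 2008 and 2012). In other words the upstream server was reporting the right month, day and time of day, but the year 2000, which matches the "year 2000" reports elsewhere in this thread. A small Python sketch of that arithmetic, assuming the syslog timestamps (which carry no year) fall in 2012 per the message date; `explain_correction` is a hypothetical helper.

```python
from datetime import datetime, timedelta

SANITY_LIMIT_S = 1000  # "sanity limit (1000)" from the xntpd messages above

def explain_correction(logged_at: datetime, correction_s: float):
    """Return (whole days, leftover seconds, time the clock would be set to,
    whether the correction exceeds the sanity limit)."""
    days, leftover = divmod(abs(correction_s), 86400)
    target = logged_at + timedelta(seconds=correction_s)
    return int(days), leftover, target, abs(correction_s) > SANITY_LIMIT_S

# First refused correction: Nov 19 16:44:24, -378691200 s (year assumed 2012).
days, leftover, target, too_big = explain_correction(
    datetime(2012, 11, 19, 16, 44, 24), -378691200)
print(days, leftover, target, too_big)  # 4383 0 2000-11-19 16:44:24 True
```

Applied to the +378691200 s correction at 17:20:27, with the router's clock then presumably reading Nov 19, 2000 after the earlier reset, the same helper lands back on 2012-11-19 17:20:27, consistent with the later "time reset +378691200.387434 s" line.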
-----Original Message-----
From: outages-bounces@outages.org [mailto:outages-bounces@outages.org] On Behalf Of Jeremy Chadwick
Sent: Tuesday, November 20, 2012 10:38 AM
To: Scott Voll
Cc: Sid Rao; outages; nanog@nanog.org
Subject: Re: [outages] NTP Issues Today
I'm still waiting for someone who was affected by this to provide coherent
logs from ntpd showing exactly when the time change happened.
Getting these, at least on an *IX system, is far from difficult, folks.
Please don't omit anything from the logs either. It would also be greatly
helpful to know *exactly* which NTP servers were in use: not "the ones you
had configured", but which one ntpd had chosen as primary ('*' mark) and
which were secondary comparisons/fallbacks ('+' mark). That means output
from "ntpq -c peers", run on your NTP server *at or around the time* the
incident happened and recovered.
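One way to have that data on hand next time is to snapshot the peer table on a schedule. A minimal Python sketch, assuming `ntpq` is on the PATH and the usual `ntpq -c peers` layout in which the first character of each row is the tally mark; `snapshot_peers` and `chosen_sources` are hypothetical names.

```python
import subprocess
from datetime import datetime, timezone

def snapshot_peers(cmd=("ntpq", "-c", "peers")) -> str:
    """Run the command once and return its output under a UTC timestamp line."""
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"=== {stamp} ===\n{result.stdout}"

def chosen_sources(peers_output: str) -> dict:
    """Collect remotes by tally mark: '*' (system peer) and '+' (candidate)."""
    picked = {"*": [], "+": []}
    for line in peers_output.splitlines():
        mark, fields = line[:1], line[1:].split()
        if mark in picked and fields:
            picked[mark].append(fields[0])
    return picked
```

Appending `snapshot_peers()` to a log file from cron once a minute, and running `chosen_sources` over the saved snapshots afterwards, would show which server was '*' and which were '+' before, during and after an event like this one.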
What's been provided so far is that "something happened", with reports of
clocks going back to the year 2000 and other reports of clocks going back to
(presumably) epoch time; those reporting it were using either usno.navy.mil,
NIST, or Microsoft NTP servers. usno.navy.mil uses dedicated IRIG/AFNOR TCR
boxes, while NIST uses GPS. No idea what Microsoft uses.
I asked on a public *IX forum if anyone saw anything NTP-wise that was out of
the ordinary, and not a single admin saw anything. I also saw nothing
anomalous on either of my FreeBSD machines (9.1-PRERELEASE, running base
system ntpd 4.2.4p8), but I sync with very specific stratum 1 and stratum 2
servers across the United States.
As Mark Andrews from the ISC stated below (read slowly/carefully), ntpd will
not allow large clock jumps -- the largest it'll allow out of the box is 1000s
(and on some systems, like Solaris ntpd, 500s) -- unless you're running with
the -g flag (and shame on you if you're doing that).
So I'm very surprised by this problem altogether. I can't deny that it
happened, but figuring out *why* is important.
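A simplified Python model of the limits described above, assuming ntpd's documented defaults of a 1000 s panic threshold and a 128 ms step threshold, and ignoring the stepout interval, `tinker` overrides and the Solaris 500 s variant; `clock_action` is a hypothetical name, not an ntpd function.

```python
PANIC_S = 1000.0  # default panic threshold ("sanity limit (1000)" in the logs)
STEP_S = 0.128    # default step threshold: larger offsets are stepped, not slewed

def clock_action(offset_s: float, first_update: bool, g_flag: bool) -> str:
    """Simplified decision for one clock update.

    Ignores the stepout interval, tinker overrides and platform variants.
    With -g, only the first update may exceed the panic threshold."""
    if abs(offset_s) > PANIC_S and not (first_update and g_flag):
        return "exit"   # log the offending offset and quit
    if abs(offset_s) > STEP_S:
        return "step"   # set the clock in one jump
    return "slew"       # gradually adjust the clock rate
```

The Juniper log at the top of this thread fits this pattern: each "exceeds sanity limit (1000)" refusal is followed by an exit with status 255, a restart by init, and a "time reset" of the same size on the new process's first synchronization. That is consistent with the restarted daemon being permitted one large first step, but the log alone does not show which flags Junos starts xntpd with.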
Also, for Mike Lyon -- I looked at NIST's GPS graphs. Did you notice they
have no data for 11/18, 11/19, or 11/20? I find that unnerving, do you not?
-- 
| Jeremy Chadwick jdc@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
On Tue, Nov 20, 2012 at 07:18:45AM -0800, Scott Voll wrote:
> Same thing happened to us yesterday. Ended up having to reboot
> everything after we got time fixed. Major outage.
>
> Scott
>
>
> On Mon, Nov 19, 2012 at 7:58 PM, Sid Rao <srao@ctigroup.com> wrote:
>
> > We had multiple servers synchronized with Windows/MS time change
> > their clock to the year 2000 today. It broke many things, including
> > AD authentication.
> >
> > These servers had been properly synchronized for years.
> >
> > They were synchronized with Microsoft and NIST NTP servers.
> >
> > This may not be isolated.
> >
> > Sid Rao | CTI Group | +1 (317) 262-4677
> >
> > On Nov 19, 2012, at 10:29 PM, "George Herbert"
> > <george.herbert@gmail.com> wrote:
> >
> > > crossreplying to outages list.
> > >
> > > Is anyone ELSE seeing GPS issues? This could well have been an
> > > unrelated issue on that particular PBX.
> > >
> > > If this was real, then the mother of all infrastructure attacks
> > > might be underway...
> > >
> > > One glitch on tick and tock and one malfunctioning PBX is not
> > > sufficient evidence of pattern - much less hostile activity - to
> > > induce panic, but it would perhaps be a wise time to check
> > > time-related logs?
> > >
> > >
> > > -george
> > >
> > > On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith
> > > <kwallace@pcconnection.com> wrote:
> > >> Just got paged with a PBX alarm that had 1970 as the year. By the time
> > >> I logged in, it was showing 2012. Using GPS for time and date.
> > >>
> > >> -----Original Message-----
> > >> From: Mark Andrews [mailto:marka@isc.org]
> > >> Sent: Monday, November 19, 2012 8:42 PM
> > >> To: Van Wolfe
> > >> Cc: nanog@nanog.org
> > >> Subject: Re: NTP Issues Today
> > >>
> > >>
> > >> In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com>,
> > >> Van Wolfe writes:
> > >>> Hello,
> > >>>
> > >>> Did anyone else experience issues with NTP today? We had our
> > >>> server times update to the year 2000 at around 3:30 MT, then
> > >>> revert back to 2012.
> > >>>
> > >>> Thanks,
> > >>> Van
> > >>
> > >> NTP should be immune from this sort of behaviour unless you did a
> > >> ntpdate at the wrong moment. The clocks should have been marked as insane.
> > >>
> > >> Mark
> > >> --
> > >> Mark Andrews, ISC
> > >> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> > >> PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > -george william herbert
> > > george.herbert@gmail.com
> > >
> > >
> >
> >
> > _______________________________________________
> > Outages mailing list
> > Outages@outages.org
> > https://puck.nether.net/mailman/listinfo/outages