[158098] in North American Network Operators' Group


RE: [outages] NTP Issues Today

daemon@ATHENA.MIT.EDU (R. Benjamin Kessler)
Tue Nov 20 16:07:43 2012

From: "R. Benjamin Kessler" <Ben.Kessler@zenetra.com>
To: Jeremy Chadwick <jdc@koitsu.org>, Scott Voll <svoll.voip@gmail.com>
Date: Tue, 20 Nov 2012 21:07:22 +0000
In-Reply-To: <20121120153813.GA90675@icarus.home.lan>
Cc: outages <outages@outages.org>, "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

Logs from a Juniper router in a customer network - we had hundreds of these
affected.  They all synchronize to internal hosts (172.20.167.251 and .252),
which are configured to get time from NIST and USNO.

CORP-NTP-01#sh ntp as

      address         ref clock     st  when  poll reach  delay  offset    disp
*~192.5.41.41      .IRIG.            1   354   512  377    34.2    0.36     1.4
+~132.163.4.101    .ACTS.            1   336   512  377    35.0   -2.54    18.7
 ~127.127.7.1      127.127.7.1      10    59    64  377     0.0    0.00     0.0
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured

CORP-NTP-02#sh ntp as

      address         ref clock     st  when  poll reach  delay  offset    disp
*~192.5.41.41      .IRIG.            1    65   512  377    36.5    0.91     0.6
+~132.163.4.101    .ACTS.            1    95   512  377    34.3   -1.31    22.8
 ~127.127.7.1      127.127.7.1      10    44    64  377     0.0    0.00     0.0
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured
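
For anyone who wants to pull the chosen peer out of many such captures, here is a minimal Python sketch that reads rows in the layout shown above. The role names come from the legend line of the output; the label "configured only" for rows with no leading flag, the regular expression, and the function name are my own assumptions, not part of any vendor tool.

```python
import re

# Rows copied from the CORP-NTP-01 output in this message.
SAMPLE = """\
*~192.5.41.41      .IRIG.            1   354   512  377    34.2    0.36     1.4
+~132.163.4.101    .ACTS.            1   336   512  377    35.0   -2.54    18.7
 ~127.127.7.1      127.127.7.1      10    59    64  377     0.0    0.00     0.0
"""

# Flag characters as described by the legend line of the output.
ROLES = {
    "*": "master (synced)",
    "#": "master (unsynced)",
    "+": "selected",
    "-": "candidate",
}

# Optional flag, optional "~", then address, reference clock and stratum.
ROW = re.compile(r"^\s*([*#+-]?)~?(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\d+)\s")


def parse_associations(text):
    """Return a list of (role, address, ref_clock, stratum) tuples."""
    peers = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, legend or blank line
        flag, addr, ref, stratum = m.groups()
        # "configured only" is a hypothetical label for rows without a flag.
        role = ROLES.get(flag, "configured only")
        peers.append((role, addr, ref, int(stratum)))
    return peers


for peer in parse_associations(SAMPLE):
    print(peer)
```

On both servers above this reports 192.5.41.41 (reference clock .IRIG.) as the synced master and 132.163.4.101 (.ACTS.) as selected.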

Here are the logs from one of the Junipers:

Nov 19 14:24:48  XXXX xntpd[912]: kernel time sync enabled 2001
Nov 19 15:50:11  XXXX xntpd[912]: synchronized to 172.20.167.252, stratum=2
Nov 19 16:41:23  XXXX xntpd[912]: no servers reachable
Nov 19 16:44:24  XXXX xntpd[912]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:44:24  XXXX xntpd[912]: time correction of -378691200 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Nov 19 16:44:24  XXXX init: ntp (PID 912) exited with status=255
Nov 19 16:44:24  XXXX init: ntp (PID 70200) started
Nov 19 16:44:24  XXXX xntpd[70200]: ntpd 4.2.0-a Sat Apr 10 00:32:46 UTC 2010 (1)
Nov 19 16:44:24  XXXX xntpd[70200]: mlockall(): Resource temporarily unavailable
Nov 19 16:44:24  XXXX xntpd[70200]: precision = 0.582 usec
Nov 19 16:44:24  XXXX xntpd[70200]: Listening on interface ggsn_vpn, 128.0.0.1#123
Nov 19 16:44:24  XXXX xntpd[70200]: kernel time sync status 2040
Nov 19 16:44:24  XXXX xntpd[70200]: frequency initialized -64.931 PPM from /var/db/ntp.drift
Nov 19 16:44:24  XXXX xntpd[70200]: Configuring iburst flag for server
Nov 19 16:44:24  XXXX xntpd[70200]: Configuring iburst flag for server
Nov 19 16:44:33  XXXX xntpd[70200]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:44:32  XXXX xntpd[70200]: time reset -378691200.411331 s
Nov 19 16:44:32  XXXX xntpd[70200]: kernel time sync disabled 2041
Nov 19 16:45:44  XXXX xntpd[70200]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:45:51  XXXX xntpd[70200]: kernel time sync enabled 2001
Nov 19 16:45:56  XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 16:53:25  XXXX xntpd[70200]: no servers reachable
Nov 19 17:03:09  XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 17:13:00  XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 17:20:27  XXXX xntpd[70200]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:20:27  XXXX xntpd[70200]: time correction of 378691200 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Nov 19 17:20:27  XXXX init: ntp (PID 70200) exited with status=255
Nov 19 17:20:27  XXXX init: ntp (PID 70766) started
Nov 19 17:20:27  XXXX xntpd[70766]: ntpd 4.2.0-a Sat Apr 10 00:32:46 UTC 2010 (1)
Nov 19 17:20:27  XXXX xntpd[70766]: mlockall(): Resource temporarily unavailable
Nov 19 17:20:27  XXXX xntpd[70766]: precision = 0.570 usec
Nov 19 17:20:27  XXXX xntpd[70766]: Listening on interface ggsn_vpn, 128.0.0.1#123
Nov 19 17:20:27  XXXX xntpd[70766]: kernel time sync status 2040
Nov 19 17:20:27  XXXX xntpd[70766]: frequency initialized -64.931 PPM from /var/db/ntp.drift
Nov 19 17:20:27  XXXX xntpd[70766]: Configuring iburst flag for server
Nov 19 17:20:27  XXXX xntpd[70766]: Configuring iburst flag for server
Nov 19 17:20:35  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:20:36  XXXX xntpd[70766]: time reset +378691200.387434 s
Nov 19 17:20:36  XXXX xntpd[70766]: kernel time sync disabled 6041
Nov 19 17:21:48  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:21:48  XXXX xntpd[70766]: kernel time sync disabled 2041
Nov 19 17:21:52  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 00:02:29  XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
Nov 20 01:44:56  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 02:19:03  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 02:53:12  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 03:44:26  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 05:26:58  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 05:44:02  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 07:43:35  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 08:00:39  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 08:34:48  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 08:51:54  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 10:34:22  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 20 11:25:16  XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
Nov 20 12:33:56  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 20 14:16:05  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 14:33:10  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 15:07:19  XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
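
A quick arithmetic check on the size of the step, as a Python sketch. The 378691200 figure is taken from the xntpd messages above; the year 2012 comes from the date of this message, and since syslog lines carry neither a year nor a time zone, the wall-clock value used here is an assumption for illustration only.

```python
from datetime import datetime, timedelta

STEP_SECONDS = 378_691_200  # magnitude of the correction reported by xntpd

# Is the step a whole number of days?
days, remainder = divmod(STEP_SECONDS, 86_400)

# Timestamp of the first rejected correction, with the year assumed to be 2012.
before = datetime(2012, 11, 19, 16, 44, 24)
after = before - timedelta(seconds=STEP_SECONDS)

print(days, remainder)    # 4383 0
print(after.isoformat())  # 2000-11-19T16:44:24
```

4383 days is 12 * 365 plus three leap days (2004, 2008, 2012), so the servers handed out a time exactly twelve calendar years in the past, which lines up with the "year 2000" reports elsewhere in this thread.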




-----Original Message-----
From: outages-bounces@outages.org [mailto:outages-bounces@outages.org] On Behalf Of Jeremy Chadwick
Sent: Tuesday, November 20, 2012 10:38 AM
To: Scott Voll
Cc: Sid Rao; outages; nanog@nanog.org
Subject: Re: [outages] NTP Issues Today

I'm still waiting for someone who was affected by this to provide coherent
logs from ntpd showing exactly when the time change happened.
Getting these, at least on an *IX system, is far from difficult, folks.
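
As one way of collecting such evidence, the sketch below filters syslog text for the two ntpd messages that mark a clock step: the rejected "time correction of ... exceeds sanity limit" and the applied "time reset". The sample lines are copied from the Juniper log in this thread; the regular expression and the function name are hypothetical helpers, and other ntpd builds may word these messages differently.

```python
import re

# Lines copied from the Juniper log posted in this thread.
SAMPLE_LINES = [
    "Nov 19 16:44:24  XXXX xntpd[912]: time correction of -378691200 seconds "
    "exceeds sanity limit (1000); set clock manually to the correct UTC time.",
    "Nov 19 16:44:32  XXXX xntpd[70200]: time reset -378691200.411331 s",
    "Nov 19 16:44:32  XXXX xntpd[70200]: kernel time sync disabled 2041",
]

# Syslog timestamp, host, "ntpd" or "xntpd" with a PID, then one of two messages.
EVENT = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+x?ntpd\[\d+\]:\s+"
    r"(?:time reset\s+([+-]?\d+(?:\.\d+)?)\s+s"
    r"|time correction of\s+([+-]?\d+)\s+seconds exceeds sanity limit)"
)


def find_clock_events(lines):
    """Return (timestamp, kind, seconds) for each step-related log line."""
    events = []
    for line in lines:
        m = EVENT.match(line)
        if m is None:
            continue
        stamp, reset, rejected = m.groups()
        if reset is not None:
            events.append((stamp, "reset", float(reset)))
        else:
            events.append((stamp, "rejected", float(rejected)))
    return events


for event in find_clock_events(SAMPLE_LINES):
    print(event)
```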

Please don't omit anything from the logs either; for example if you know
*exactly* what NTP servers were in use (not "ones you had configured"
but which one was primarily chosen by ntpd ('*' mark) and which were
secondary comparisons/fallbacks ('+' mark)), that would also be greatly
helpful.  This would be output from "ntpq -c peers" when run on your NTP
server *at or around the time* the incident happened and recovered.

What's been provided so far is that "something happened", with reports of
clocks going back to year 2000, and other reports of clocks going back to
(presumably) epoch time; those reporting it were using either
usno.navy.mil, NIST, or Microsoft NTP servers.  usno.navy.mil uses
dedicated IRIG/AFNOR TCRs boxes, while NIST uses GPS.  No idea what
Microsoft uses.

I asked on a public *IX forum if anyone saw anything NTP-wise that was out
of the ordinary and not a single admin saw anything.  I also saw nothing
anomalous on either of my FreeBSD machines (9.1-PRERELEASE, running base
system ntpd 4.2.4p8), but I sync with very specific stratum 1 and stratum 2
servers across the United States.

As Mark Andrews from the ISC stated below (read slowly/carefully), ntpd
will not allow large clock jumps -- the largest it'll allow out of the box
is 1000s (and on some systems like Solaris ntpd, 500s) -- unless you're
running with the -g flag (and shame on you if you're doing that).
So I'm very surprised by this problem altogether.  Can't deny that it
happened, but figuring out *why* is important.
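
The sanity-limit rule described above can be modelled in a few lines of Python. This is a deliberately simplified sketch, not ntpd source: the 1000 s threshold is the one quoted in this thread, the parameter allow_first_big_step stands in for the effect of the -g flag, and the return values are hypothetical labels.

```python
PANIC_THRESHOLD_S = 1000.0  # default sanity limit quoted in this thread


def handle_offset(offset_s, allow_first_big_step=False):
    """Simplified model of the decision: 'exit', 'step' or 'adjust'.

    allow_first_big_step is a stand-in for running with -g; real ntpd
    has more states (slewing vs. stepping) than this sketch shows.
    """
    if abs(offset_s) > PANIC_THRESHOLD_S:
        return "step" if allow_first_big_step else "exit"
    return "adjust"


print(handle_offset(-378691200))                             # exit
print(handle_offset(-378691200, allow_first_big_step=True))  # step
print(handle_offset(0.00036))                                # adjust
```

Note that in the Juniper log posted in this thread the daemon does exit with status 255 on the first large offset, init then starts a new ntp process, and that new process logs a "time reset" of the full amount. Whether the restarted process was launched with -g is not visible in those logs.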

Also, for Mike Lyon -- I looked at NIST's GPS graphs.  Did you notice they
have no data for 11/18, 11/19, or 11/20?  I find that unnerving, do you
not?

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Tue, Nov 20, 2012 at 07:18:45AM -0800, Scott Voll wrote:
> Same thing happened to us yesterday.  Ended up having to reboot
> everything after we got time fixed.  Major outage.
>
> Scott
>
>
> On Mon, Nov 19, 2012 at 7:58 PM, Sid Rao <srao@ctigroup.com> wrote:
>
> > We had multiple servers synchronized with Windows/MS time change
> > their clock to the year 2000 today.  It broke many things, including
> > AD authentication.
> >
> > These servers had been properly synchronized for years.
> >
> > They were synchronized with Microsoft and NIST NTP servers.
> >
> > This may not be isolated.
> >
> > Sid Rao | CTI Group | +1 (317) 262-4677
> >
> > On Nov 19, 2012, at 10:29 PM, "George Herbert" <george.herbert@gmail.com> wrote:
> >
> > > crossreplying to outages list.
> > >
> > > Is anyone ELSE seeing GPS issues?  This could well have been an
> > > unrelated issue on that particular PBX.
> > >
> > > If this was real, then the mother of all infrastructure attacks
> > > might be underway...
> > >
> > > One glitch on tick and tock and one malfunctioning PBX is not
> > > sufficient evidence of pattern - much less hostile activity - to
> > > induce panic, but it would perhaps be a wise time to check
> > > time-related logs?
> > >
> > >
> > > -george
> > >
> > > On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith <kwallace@pcconnection.com> wrote:
> > >> Just got paged with a pbx alarm that had 1970 as the year. By the
> > >> time I logged in, it was showing 2012.  Using GPS for time and date.
> > >>
> > >> -----Original Message-----
> > >> From: Mark Andrews [mailto:marka@isc.org]
> > >> Sent: Monday, November 19, 2012 8:42 PM
> > >> To: Van Wolfe
> > >> Cc: nanog@nanog.org
> > >> Subject: Re: NTP Issues Today
> > >>
> > >>
> > >> In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com>,
> > >> Van Wolfe writes:
> > >>> Hello,
> > >>>
> > >>> Did anyone else experience issues with NTP today?  We had our
> > >>> server times update to the year 2000 at around 3:30 MT, then
> > >>> revert back to 2012.
> > >>>
> > >>> Thanks,
> > >>> Van
> > >>
> > >> NTP should be immune from this sort of behaviour unless you did a
> > >> ntpdate at the wrong moment.  The clocks should have been marked as
> > >> insane.
> > >>
> > >> Mark
> > >> --
> > >> Mark Andrews, ISC
> > >> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> > >> PHONE: +61 2 9871 4742                 INTERNET: marka@isc.org
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > -george william herbert
> > > george.herbert@gmail.com
> > >
> > >
> >
> >
> > _______________________________________________
> > Outages mailing list
> > Outages@outages.org
> > https://puck.nether.net/mailman/listinfo/outages
> >


