Frequent time reset messages

I'm running a moderate number (around 50) of dual-Opteron machines that
boot diskless into a Linux 2.6.12 SMP kernel and try to sync with a
Symmetricom XLI-GPS stratum-1 NTP server on an isolated network.

The problem is that when I run "ntpq -c peers" on a number of
these machines to check the status of the NTP synchronization, I see
offsets ranging over almost 1000 msec. If I grep through
/var/log/messages, I see messages like these roughly every 20
minutes:

Dec 1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
Dec 1 20:30:28 (none) ntpd[27203]: synchronisation lost
Dec 1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
Dec 1 20:50:45 (none) ntpd[27203]: synchronisation lost
Dec 1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
Dec 1 21:19:23 (none) ntpd[27203]: synchronisation lost
Dec 1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
Dec 1 21:36:24 (none) ntpd[27203]: synchronisation lost
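
As a rough sanity check on the log above, the sketch below estimates how
fast the clock would have to be drifting to need each step. It assumes
(a big simplification) that the whole step accumulated since the previous
reset and that ntpd did no slewing in between. The helper names
parse_resets and implied_drift_ppm are made up for illustration, and the
year is taken from the ntpq timestamps later in this post.

```python
import re
from datetime import datetime

# "time reset" lines copied from the /var/log/messages excerpt above.
LOG = """\
Dec 1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
Dec 1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
Dec 1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
Dec 1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
"""

RESET_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*ntpd\[\d+\]: time reset ([-+]?\d+\.\d+) s"
)


def parse_resets(text, year=2005):
    """Return a list of (timestamp, step_seconds) for each 'time reset' line.

    syslog lines carry no year, so one has to be supplied (assumption: 2005).
    """
    resets = []
    for line in text.splitlines():
        m = RESET_RE.match(line)
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            resets.append((when, float(m.group(2))))
    return resets


def implied_drift_ppm(resets):
    """Drift rate (ppm) implied by each step, relative to the previous reset.

    Assumes the entire step built up since the previous reset, which
    ignores any correction ntpd applied in the meantime.
    """
    rates = []
    for (t0, _), (t1, step) in zip(resets, resets[1:]):
        elapsed = (t1 - t0).total_seconds()
        rates.append(abs(step) / elapsed * 1e6)
    return rates


for rate in implied_drift_ppm(parse_resets(LOG)):
    print(f"{rate:.0f} ppm")
```

Under those assumptions the implied rates are a few hundred ppm, and the
first interval works out to roughly 765 ppm. For comparison, ntpd can only
trim the clock frequency by up to 500 ppm.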

These seem like large (and frequent) steps to be occurring. I have a
fairly simple ntp.conf file:
restrict default ignore
restrict mask nomodify notrap noquery

server iburst
server iburst # local clock
fudge stratum 5 # default was 10

driftfile /var/lib/ntp/drift

These machines each have a Gigabit network connection to a high-end
network switch. I believe the NTP server probably has only a 100 Mbit
link, and it sees all the traffic, but I don't think that is the
problem.

Probably the main issue is the CPU and I/O loading on these Opteron
machines. They are each handling streaming data from a FireWire card
(IEEE-1394a) and the CPUs stay fairly busy handling that data -- though
they are not pegged at 100% or anything.

Here is a typical ntpq output:
ntpq> as
ind assID status conf reach auth condition last_event cnt
1 48644 9634 yes yes none sys.peer reachable 3
2 48645 9034 yes yes none reject reachable 3
ntpq> rv 48644
status=9634 reach, conf, sel_sys.peer, 3 events, event_reach,
srcadr=ntpserv, srcport=123, dstadr=, dstport=123, leap=00,
stratum=1, precision=-9, rootdelay=0.000, rootdispersion=5.554,
refid=GPSM, reach=377, unreach=0, hmode=3, pmode=4, hpoll=7, ppoll=7,
flash=00 ok, keyid=0, offset=360.879, delay=2.544, dispersion=3.803,
jitter=6.636, reftime=c739efcd.cf993b0f Thu, Dec 1 2005 21:55:25.810,
org=c739efde.6ea22848 Thu, Dec 1 2005 21:55:42.432,
rec=c739efde.1292f6e8 Thu, Dec 1 2005 21:55:42.072,
xmt=c739efde.0c8ede54 Thu, Dec 1 2005 21:55:42.049,
filtdelay= 2.54 4.42 2.50 2.98 2.55 2.61 2.44
filtoffset= 360.88 354.24 412.02 412.20 464.11 -95.25
-78.39 -56.90,
filtdisp= 1.96 3.90 5.82 7.77 9.70
11.62 12.61 13.57
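
To put a number on how much the samples in that filter register disagree,
here is a small sketch that pulls the filtoffset values out of the "rv"
text and reports their spread. The function name parse_filter_values is
hypothetical, and the parsing is only an assumption about the layout
shown above, not a general ntpq parser.

```python
import re

# filtoffset lines copied from the ntpq "rv" output above (wrapped as shown).
RV_TEXT = """\
filtoffset= 360.88 354.24 412.02 412.20 464.11 -95.25
-78.39 -56.90,
"""


def parse_filter_values(rv_text, name):
    """Return the numeric values that follow 'name=' up to the next comma."""
    m = re.search(rf"{re.escape(name)}=([^,=]*)", rv_text)
    if m is None:
        return []
    return [float(tok) for tok in re.findall(r"[-+]?\d+(?:\.\d+)?", m.group(1))]


offsets = parse_filter_values(RV_TEXT, "filtoffset")
spread = max(offsets) - min(offsets)
# Largest change between two neighbouring samples in the register.
largest_jump = max(abs(a - b) for a, b in zip(offsets, offsets[1:]))

print(f"{len(offsets)} samples, spread {spread:.2f} ms, "
      f"largest jump {largest_jump:.2f} ms")
```

For the eight samples above, the spread between the largest and smallest
offset is about 559 ms, and it all comes from a single jump between two
neighbouring samples, which is consistent with a clock step landing in the
middle of the register.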

If anyone has any suggestions about what might be happening, or how to
keep these machines synced more tightly, I would certainly appreciate
it. I've dug through the FAQs, wikis, docs, etc., but I'm still not sure
exactly why my time is bouncing around so much.

thanks in advance,
Bob Robison bob.robison@xxxxxxxx
Staff Engineer 210-522-3935
Southwest Research Institute San Antonio, TX