Re: NTP clients not syncing up to servers?
- From: mayer@xxxxxxxxxxx (Danny Mayer)
- Date: Sun, 9 Oct 2005 18:23:17 GMT
Ted Beatie wrote:
What we're trying to do;
We deploy systems at customer facilities that have both "gateway" and "storage-node" machines; the gateways connect to the rest of customer site and the storage-nodes, and the storage-nodes connect only to one another and the gateways. We'd like the gateways to sync to either a customer-supplied NTP server or external NTP servers, and the other gateways, and the storage-nodes to sync to the gateways (and trust them completely).
The setup;
The gateways have 2 or 3 interfaces, one of which goes to the internal LAN, and the other one or two go to private back-end switches. The ntp.conf on the gateways looks like this;
driftfile /var/lib/ntp/ntp.drift
statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

server <one or more servers, external or internal>
server <one or more other gateways, using the back-end addresses>
Add iburst to the end of each server line. This speeds up synchronization.
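As a rough illustration of that change, here is a minimal Python sketch that appends iburst to every server line in an ntp.conf that lacks it. The function name add_iburst and the sample addresses are made up for this example, and the sketch assumes server lines carry no trailing comments.

```python
def add_iburst(conf_text):
    """Append 'iburst' to every 'server' line that does not already have it."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Only active 'server' directives are touched; comments, blank lines
        # and every other directive pass through unchanged.
        if fields and fields[0] == "server" and "iburst" not in fields[1:]:
            line = line.rstrip() + " iburst"
        out.append(line)
    return "\n".join(out)


# Hypothetical addresses, for illustration only.
sample = "\n".join([
    "driftfile /var/lib/ntp/ntp.drift",
    "server 10.16.4.1",
    "server 10.123.123.2 iburst",
])
print(add_iburst(sample))
```

Editing the two or three server lines by hand is just as good; the sketch only shows where the keyword goes.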
Three servers are the absolute minimum: with only two, ntpd has no way of knowing which one is providing better information. Let's leave aside the question of what "better" means; it's a very complicated subject.
The storage-nodes have 2 interfaces, each of which goes to back-end switches. The ntp.conf on the storage-nodes looks like this;
driftfile /var/lib/ntp/ntp.drift
statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

server <two or more gateways, using the back-end addresses>
furthermore, /etc/init.d/ntp has been modified on the storage-nodes to include the -g flag.
Good.
The problem;
It doesn't seem to work reliably;
gateway1:~# date;for i in 2 51 52 53 54; do ssh -1 10.123.123.$i date;done
Fri Oct 7 11:10:36 EDT 2005
Fri Oct 7 11:10:35 EDT 2005
Fri Oct 7 11:18:27 EDT 2005
Fri Oct 7 11:14:05 EDT 2005
Fri Oct 7 11:08:26 EDT 2005
Fri Oct 7 11:22:15 EDT 2005
I'm unsure what's going on, or how to diagnose. It looks like everything is communicating properly;
On the gateway (time 11:10 above);
gateway1:~# ntpq -c pe localhost
     remote              refid   st t when poll reach   delay   offset   jitter
===============================================================================
 <internal NTP server>   <>       2 u   38   64   377   0.278  -1565.0    4.599
 <other gateway>         <>       4 u  928 1024   377   0.114  -1135.6    2.005
 <other gateway>         <>      16 u  805 1024     0   0.000    0.000  4000.00
Based on the above, the internal NTP server has a stratum of 2 and will almost always be used over a stratum 4 server. Is that internal NTP server getting its data from a stratum 1 server, and is it internal or external?
Because the addresses are obfuscated, it's hard to know whether you've also removed the tally codes, which indicate what gateway1 thinks of each server. Since you are using private address space for this, it really doesn't matter if the addresses are seen. If you don't want to show the names, just add -n and ntpq won't translate the IP addresses into names.
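To make the tally codes concrete, here is a minimal Python sketch that reads an unedited peer billboard, as printed by ntpq -pn, and pulls out the tally character in the first column of each peer line. The sample billboard, its addresses and the function name parse_peers are invented for illustration, and the meanings in TALLY follow the usual ntpq documentation, so they may differ slightly between versions.

```python
# Tally characters as commonly documented for ntpq (assumed, version dependent).
TALLY = {
    " ": "reject", "x": "falseticker", ".": "excess", "-": "outlier",
    "+": "candidate", "#": "selected", "*": "sys.peer", "o": "pps.peer",
}

# Hypothetical billboard; these are not the poster's real values.
SAMPLE = """
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.16.4.1       192.0.2.10       2 u   38   64  377    0.278   -1.565   0.459
+10.123.123.2    10.16.4.1        3 u   35   64  377    0.123    0.891   0.406
 10.123.123.3    .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""


def parse_peers(billboard):
    """Return one dict per peer line of an unedited ntpq peer billboard."""
    peers = []
    for line in billboard.splitlines():
        fields = line[1:].split()  # column 0 is reserved for the tally code
        # Skip blank lines, the header row and the ===== separator.
        if len(fields) != 10 or line.lstrip().startswith(("remote", "=")):
            continue
        peers.append({
            "tally": line[0],
            "meaning": TALLY.get(line[0], "unknown"),
            "remote": fields[0],
            "stratum": int(fields[2]),
            "reach": int(fields[6], 8),  # reach is printed in octal
            "offset_ms": float(fields[8]),
        })
    return peers


for peer in parse_peers(SAMPLE):
    print(peer["remote"], repr(peer["tally"]), peer["meaning"], peer["stratum"])
```

If the first column has been edited away, as may have happened in the obfuscated output above, this kind of parsing cannot recover it, which is why posting the raw ntpq -pn output is more useful.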
On one of the storage-nodes (time 11:14 above);
storage-node2:~# ntpq -c pe localhost
     remote      refid   st t when poll reach   delay   offset  jitter
======================================================================
 <gateway 1>     <>       3 u   40   64   377   0.136  -207752   3.355
 <gateway 2>     <>       4 u   35   64   377   0.123  -208891   4.606
This only has two servers and you need at least 3. As it is, gateway1 and gateway2 are at two different stratum levels. However, you need to fix the problem on the gateways first.
Looking at the debugging techniques, and seeing that the tally code is a space, and delving deeper, I see;
gateway1:~# ntpq -c as localhost
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 47900  9014   yes   yes  none    reject   reachable  1
  2 47901  9014   yes   yes  none    reject   reachable  1
  3 47902  8000   yes   yes  none    reject

storage-node2:~# ntpq -c as localhost
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 16076  9064   yes   yes  none    reject   reachable  6
  2 16077  9064   yes   yes  none    reject   reachable  6
Usually you will see these kinds of results when the server you are looking at has just started. You really need to give it time to synchronize.
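For reference, the status column in that output is a 16-bit word that can be unpacked by hand. Here is a minimal Python sketch, assuming the peer status layout given in RFC 1305 Appendix B (5 status bits, 3 selection bits, a 4-bit event count and a 4-bit event code). The function name decode_peer_status is made up for this example, and the selection and event codes are left as numbers because their names vary between ntpd versions.

```python
def decode_peer_status(word):
    """Split a peer status word such as '9014' into its bit fields."""
    value = int(word, 16)
    status = value >> 11  # top five bits hold the status flags
    return {
        "configured": bool(status & 0x10),
        "auth_enabled": bool(status & 0x08),
        "auth_ok": bool(status & 0x04),
        "reachable": bool(status & 0x02),
        "select": (value >> 8) & 0x7,  # 0 is shown as "reject" by ntpq
        "event_count": (value >> 4) & 0xF,
        "event_code": value & 0xF,
    }


for word in ("9014", "9064"):
    print(word, decode_peer_status(word))
```

Under that layout, 9014 and 9064 both decode as configured and reachable with a selection value of 0, and differ only in the event count (1 versus 6), which matches the conf, reach, condition and cnt columns ntpq printed for those associations.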
So obviously the responses are getting rejected, but I'm not clear why. Looking at what should theoretically be the upstream internal NTP server from the gateway;
gateway1:~# ntpq -c 'rv 47900' localhost
status=9014 reach, conf, 1 event, event_reach,
srcadr=sfprinters.us.babcockbrown.com, srcport=123, dstadr=10.16.4.150,
dstport=123, leap=00, stratum=2, precision=-7, rootdelay=0.000,
rootdispersion=9733.109, refid=bigbird.babcockbrown.com, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
offset=-1562.833, delay=0.230, dispersion=11.665, jitter=7.127,
reftime=c6f0cea2.4526e978 Fri, Oct 7 2005 6:38:26.270,
org=c6f11314.3e76c8b4 Fri, Oct 7 2005 11:30:28.244,
rec=c6f11315.d120d130 Fri, Oct 7 2005 11:30:29.816,
xmt=c6f11315.d10cc35c Fri, Oct 7 2005 11:30:29.816,
filtdelay= 0.31 0.31 0.36 0.30 0.23 0.39 0.27 0.37,
filtoffset= -1572.7 -1562.2 -1567.0 -1572.8 -1562.8 -1567.7 -1573.4 -1563.1,
filtdisp= 7.83 8.82 9.78 10.74 11.68 12.64 13.60 14.58
This appears to indicate it received just one packet which is not enough to synchronize anything. How long after the server was started did you wait before interrogating it? You need to wait at least 15-20 minutes when you don't use iburst.
Be careful with flash codes, as their meanings change from time to time. I usually need to look at the code to figure out the current meaning.
And looking at one of the gateways from the storage-node;
storage-node2:~# ntpq -c 'rv 16076' localhost
status=9064 reach, conf, 6 events, event_reach,
srcadr=10.123.123.1, srcport=123, dstadr=10.123.123.52, dstport=123,
leap=00, stratum=3, precision=-16, rootdelay=0.244,
rootdispersion=9734.558, refid=10.16.4.1, reach=377, unreach=0,
hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
offset=-207768.629, delay=0.118, dispersion=3.779, jitter=3.407,
reftime=c6e82e68.10c63f14 Fri, Sep 30 2005 17:36:40.065,
org=c6f11375.feae6c8f Fri, Oct 7 2005 11:32:05.994,
rec=c6f11445.c40d2c38 Fri, Oct 7 2005 11:35:33.765,
xmt=c6f11445.c3fec13b Fri, Oct 7 2005 11:35:33.765,
filtdelay= 0.19 0.14 0.12 0.14 0.14 0.15 0.14 0.14,
filtoffset= -207770 -207769 -207768 -207767 -207766 -207765 -207763 -207762,
filtdisp= 0.03 1.01 1.97 2.91 3.90 4.88 5.85 6.83
I've also seen a flash value of 80. Now it appears to be 00.
I'm at a loss here. What we want is for the gateways to get their times from either upstream external NTP sources, or internal sources, or to just accept their own time, and the storage-nodes should get their times from the gateways and believe them no matter what the skew.
Well, the first step is to set up the gateway servers to point to NTP servers with the same stratum.
How can I go about figuring out where to go from here? Thanks in advance for any help..
See above.
Danny
-- Ted Beatie Permabit, Inc. ted@xxxxxxxxxxxx Sr. Systems Engineer One Kendall Sq, Cambridge, MA +1-617-995-9317
_______________________________________________ questions mailing list questions@xxxxxxxxxxxxxxxxx https://lists.ntp.isc.org/mailman/listinfo/questions