Re: Proposed NTP solution for a network
- From: "Richard B. Gilbert" <rgilbert88@xxxxxxxxxxx>
- Date: Tue, 03 Mar 2009 07:42:39 -0500
Jason wrote:
Below is a description of the environment and my thoughts on a resilient and precise NTP configuration. All comments, suggestions, etc. are welcome, indeed requested. I am not a software type, rather networks and hardware, so please consider that with comments and questions.
Here goes:
Three locations: A, B, & C. Locations A and B are datacenters, C is a business office with back-office processing and long-term storage.
A and B are within 10-15 miles of each other near NYC, and C is about 1200 miles from A and B.
All three sites are interconnected in a mesh IP network with dual OC-3 connections from each site -- the network is highly resilient although perhaps not as fast as we might like. A and B additionally have a GigE connection between them for host-host communication, database updates/backups, command & control, etc.
Locations A and B have a large community of SUSE 10.x Enterprise servers, each with very stringent requirements to keep time closely "in sync" with the other hosts at that site, as well as with those at the other site. Absolute accuracy (i.e. "true time") is not as important as "precision" (that is, all the hosts should be within a few tens of microseconds of each other, but they could be as much as a few hundred microseconds off UTC).
Stepping the clock during prime hours (0700 - 2000) would be very bad for the transaction record (transactions are very time sensitive). It is less sensitive between 2000 and 0700.
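A rough sketch of why stepping matters here, assuming ntpd's default step threshold of 128 ms and its maximum slew rate of 500 ppm. The function names are illustrative only, and the real clock discipline loop does not slew at the maximum rate all the time, so the figure is a lower bound on how long a correction takes without a step.

```python
# Assumptions: ntpd defaults of a 128 ms step threshold and a 500 ppm
# maximum slew rate. Function names are hypothetical, for illustration.
STEP_THRESHOLD_S = 0.128
MAX_SLEW_PPM = 500.0


def will_step(offset_s, threshold_s=STEP_THRESHOLD_S):
    """True if an offset of this size exceeds the step threshold."""
    return abs(offset_s) > threshold_s


def min_slew_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Shortest time needed to remove the offset by slewing alone."""
    return abs(offset_s) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    for offset in (0.0005, 0.050, 0.200):
        action = "step" if will_step(offset) else "slew"
        print(f"offset {offset * 1000:7.1f} ms: {action}, "
              f"at least {min_slew_seconds(offset):6.1f} s to slew out")
```

Under these assumptions, a host that is only a few hundred microseconds off never gets near the step threshold; stepping becomes a risk mainly at startup or after a long outage.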
Each client at A and B has multiple GigE connections to the LANs.
The timestamps on transactions should be traceable (i.e. we may need to provide to regulators information on the source, accuracy, and precision of the timestamp of any transaction).
Each of A, B, and C has a dedicated NTP appliance (same make and model, with differing manufacturing dates -- I have since learned that maybe we should mix up the make/model, but "one thing at a time"), with integrated GPS receiver and antenna on the roof. Each site also has access to the Internet.
Note that each NTP appliance can output PPS, but the hosts have no method to receive the PPS (blade servers in an enclosure, and all available expansion slots on each individual blade are in use). In addition, there is no provision on the enclosure to accept a PPS or other time source for distribution to the individual blades using a backplane mechanism.
A very poor configuration if accuracy is wanted. Typically, one "edge", leading or trailing, of the PPS output is within 50 to 100 nanoseconds of the "top of the second"! The serial output tells you the time value of the PPS "edge".
"Precision" tells you "how fine you can slice it"; e.g. tens of milliseconds, milliseconds, hundreds of microseconds, etc. Accuracy is the difference between your clock and the master clock at the National Institute of Standards and Technology.
Using the serial output alone introduces some uncertainty in the time value. Read the instructions for your appliance CAREFULLY.
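To put numbers on the distinction Jason is drawing, here is a small sketch with made-up offsets (in microseconds from UTC) for four hosts. The values and function names are hypothetical; the point is only that hosts can agree closely with each other while all being off from the reference by a similar amount.

```python
from statistics import mean


def accuracy_error_us(offsets_us):
    """Average offset of the group from the reference (UTC)."""
    return mean(offsets_us)


def mutual_spread_us(offsets_us):
    """Largest disagreement between any two hosts in the group."""
    return max(offsets_us) - min(offsets_us)


if __name__ == "__main__":
    # Hypothetical per-host offsets from UTC, in microseconds.
    hosts = [212.0, 205.0, 219.0, 208.0]
    print("mean offset from UTC:", accuracy_error_us(hosts), "us")
    print("spread between hosts:", mutual_spread_us(hosts), "us")
```

With these sample values the hosts sit about 211 microseconds from UTC but within 14 microseconds of each other, which is the kind of result the stated requirement would accept.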
Current configuration has all the A and B clients synchronizing with the NTP appliance at B. The NTP appliance at A has suffered an antenna fault, which is being repaired, but even after it is back on-line, the software group wants all hosts to sync to a single NTP appliance. The NTP appliance at C is new and not yet integrated to the solution -- part of the reason for this message.
A bad idea! When that appliance fails, as it inevitably will, you will be in a world of hurt!
Let's imagine ten or twenty years from now; your "appliance" has just emitted a cloud of evil smelling black smoke and ceased operation.
What time is it? You'd better hope your wristwatch is accurate!
From reading this newsgroup, the wiki (http://www.ntp.org/ntpfaq/NTP-a-faq.htm) and of course http://www.ntp.org/, this is what I think the hardware configuration should be:
1. Reference clocks: GPS receivers in the NTP appliance are Stratum 0.
2. Stratum 1 level: Each Appliance has an output at Stratum 1 via the Ethernet connection. Each appliance should be a peer to the other appliances (symmetric active/passive) as discussed at http://www.eecis.udel.edu/~mills/ntp/html/assoc.html#symact. This would enable the appliance to lose the reference source and still be useful to the Stratum 2 servers that are clients of these appliances.
Suppose you lose your "reference source" or your connection to it?
3. Stratum 2: One server at each of the three locations, each referencing each of the three NTP appliances. Each would also peer with the other two servers. This will enable the datacenters to keep the local hosts synchronized even if the other sites are unreachable (the servers at A can continue to process transactions even without connectivity to B and C, for example).
4. Clients: All clients at location A would sync to the local server (prefer) and to the server at location B. All clients at B would sync to the A server (prefer) and to the local server at B. All clients at location C would sync to their local server (prefer) and to the server at location A. Thus each client would have a choice of two Stratum 2 servers, each of which is trusted and peered with one another. In addition, this makes the clients at A and B likely, although not guaranteed, to use the same server for their time.
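The client plan in item 4 could be written down as a small script that emits the two `server` lines for each site's ntp.conf. This is only a sketch: the host names are placeholders (assumptions, not anything from the actual network), and a real configuration would carry more than these two lines.

```python
# Hypothetical Stratum 2 server names, one per site.
SERVERS = {
    "A": "ntp-a.example.com",
    "B": "ntp-b.example.com",
    "C": "ntp-c.example.com",
}

# Per site: (preferred server's site, second server's site), as in item 4.
PLAN = {
    "A": ("A", "B"),
    "B": ("A", "B"),
    "C": ("C", "A"),
}


def client_conf(site):
    """Return the ntp.conf 'server' lines for a client at the given site."""
    preferred, second = PLAN[site]
    return [
        f"server {SERVERS[preferred]} prefer",
        f"server {SERVERS[second]}",
    ]


if __name__ == "__main__":
    for site in sorted(PLAN):
        print(f"# clients at location {site}")
        print("\n".join(client_conf(site)))
```

Laid out this way, it is easy to see that clients at A and B both mark the server at A as preferred, which is what makes them likely to follow the same source.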
Several questions:
A. Is the above architecture consistent with best practices? Suggestions for improvement? It seems to fit with Section 6.2.1.3 at http://www.ntp.org/ntpfaq/NTP-s-config-adv.htm.
B. I'm unclear where, or if, "orphan" mode should be used on the servers. Should it be configured at all? What will be the advantage either way? Oh, some more research (http://support.ntp.org/bin/view/Support/OrphanMode) shows that orphan mode is not available in the version we are running.
(
ntpdc> version
ntpdc 4.2.0a@xxxxxxxx Thu Jun 29 17:48:04 UTC 2006 (1)
ntpdc>
)
Is the use of orphan mode advantageous enough to update the NTPd on 200+ hosts?
Orphan mode is for the situation where you lose your external source(s), or where you never had such a source. Some shops are not allowed to connect to the Internet and have no GPS, WWV, or WWVB receiver...
Outside time sources can use cryptographic authentication; the packet itself is not encrypted, but it carries a cryptographic signature (a message authentication code) that assures you that it could have been sent only by a holder of the keys.
C. This configuration cannot get past the "survivor" problem where, with three servers, if one fails then the other two cannot find a majority (see http://www.ntp.org/ntpfaq/NTP-s-algo-real.htm, section 5.3.2). So that leads to either trusting an Internet host, adding another receiver, or using a source at an interconnected sister-company in Europe. So, should the servers also have a trusted Internet-based time source? The nature of our business makes the Internet inherently un-trusted for a number of reasons, and having traceable time sources is one of them. Recommendations?
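The "survivor" problem in question C can be illustrated with a simplified majority check. This is not ntpd's actual selection algorithm, only a sketch in the same spirit: each reachable source reports an offset and an error bound, and a majority exists when more than half of the sources have overlapping intervals. All names and sample values are assumptions.

```python
def interval(offset_ms, error_ms):
    """Range of offsets this source considers possible."""
    return (offset_ms - error_ms, offset_ms + error_ms)


def largest_agreeing_group(intervals):
    """Largest number of intervals that share at least one common point."""
    events = []
    for low, high in intervals:
        events.append((low, 0))   # start; sorts before an end at the same value
        events.append((high, 1))  # end
    events.sort()
    best = count = 0
    for _, kind in events:
        if kind == 0:
            count += 1
            best = max(best, count)
        else:
            count -= 1
    return best


def has_majority(intervals):
    """True if more than half of the reachable sources agree."""
    return largest_agreeing_group(intervals) > len(intervals) / 2


if __name__ == "__main__":
    good_a = interval(0.0, 0.5)
    good_b = interval(0.1, 0.5)
    bad = interval(5.0, 0.5)  # a source that has gone wrong
    print("three sources, one wrong:", has_majority([good_a, good_b, bad]))
    print("two sources that disagree:", has_majority([good_a, bad]))
```

With three sources the two good ones outvote the bad one; with only two sources that disagree, neither side has a majority and there is no way to tell which one is right.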
D. Does it make sense, because the time precision is so important, to use servers for the Stratum 2 level that are unencumbered by other processes? Or should one of the existing 8-core blades be sufficient, perhaps using processor affinity for the NTP process?
NTP is not terribly demanding! You could run it perfectly well on an old 486/33 if the last one hadn't been consigned to a museum years ago!
E. The "precision" requirement leads me to think that I need all clients at a site to be receiving time from the _same_ server, whether that is the local server or not. How to ensure this requirement is met?
Accuracy!!! Things equal to the same thing are equal to each other. In principle, all the atomic clocks at NIST and at national standards laboratories around the world agree on the time to within a few nanoseconds or maybe better. You can't hope to get that accuracy over the internet. You can, however, keep a small herd of servers marching to that drumbeat even if it differs by five or ten milliseconds from the "One True Time"!
<snip>