Re: strange behaviour of ntp peerstats entries.
- From: Unruh <unruh-spam@xxxxxxxxxxxxxx>
- Date: Mon, 28 Jan 2008 02:19:44 GMT
mayer@xxxxxxxxxxx (Danny Mayer) writes:
Unruh wrote:
mayer@xxxxxxxxxxx (Danny Mayer) writes:
Unruh wrote:
Brian Utterback <brian.utterback@xxxxxxx> writes:
Unruh wrote:
"David L. Mills" <mills@xxxxxxxx> writes:
You might not have noticed a couple of crucial issues in the clock
filter code.
I did notice them all. Thus my caveat. However, throwing away 80% of the
precious data you have seems excessive.
Note that the situation can arise that one can wait many more than 8
samples for another one. Say sample i is a good one, and remains the best
for the next 7 tries. Sample i+7 is slightly worse than sample i and thus
it is not picked as it comes in. But the next i samples are all worse than
it. Thus it remains the filtered one, but is never used because it was not
the best when it came in. This situation could keep going for a long time,
meaning that ntp suddenly has no data to do anything with for many, many
poll intervals. Surely using sample i+7 is far better than not using any
data for that length of time.
On the contrary, it's better not to use the data at all if it's suspect.
ntpd is designed to continue to work well even in the event of losing
all access to external sources for extended periods.
And this could happen again. Now, since the
delays are presumably random variables, the chances of this happening are
not great (although under a condition of a gradually worsening network the
chances are not that small), but since one is running ntp for millions or
billions of samples, the chances of this happening sometime become large.
There are quite a few ntpd servers which are isolated and once an hour
use ACTS to fetch good time samples. This is not rare at all.
And then promptly throw them away because they do not satisfy the minimum
condition? No, it is not "best" to throw away data no matter how suspect.
Data is a precious commodity and should be thrown away only if you are damn
sure it cannot help you. For example, let's say that the change in delay is
.1 of the variance of the clock. The max extra noise that delay can cause
is about .01. Yet NTP will chuck it. Now if the delay is 100 times the
variance, sure, chuck it. It probably cannot help you. The delay is a random
process, non-gaussian admittedly, and its effect on the time is also a
random process, usually much closer to gaussian. And why was the figure of
8 chosen (the best of the last 8 tries)? Why not 10000? Or 3? I suspect it
came off the top of someone's head: let's not throw away too much stuff,
since it would make ntp unusable, but let's throw away some to feel
virtuous. Sorry for being sarcastic, but I would really like to know what
the justification was for throwing so much data away.
No, 8 was chosen after a lot of experimentation to ensure the best
results over a wide range of configurations. Dave has adjusted these
numbers over the years and he's the person to ask.
OK. The usual comment is that you throw away about 40% of the data using
the median filter (e.g. looking at the shm refclock program, where that
40% figure is attributed to him, and in ntp as well). But here one is throwing
away over 80% (i.e. keeping less than 1/6 of the data).
Running a very quick test on one system on my LAN, I find that this changes
the variance of the offsets by about 10%. I.e., it makes only a marginal
difference to the variance. (And yes, there is a fair amount of
correlation between the offset fluctuation and the delay fluctuation:
correlation coefficient .5.) Actually the main thing this seems to do is
to make the variance in the delay times small, not the variance in the
offset.
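The two statistics mentioned above (the relative change in offset variance after filtering, and the offset/delay correlation coefficient) could be computed along these lines. This is only a minimal sketch: the function names are hypothetical, and it assumes the raw and filtered samples have already been extracted into plain Python lists.

```python
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def variance_change(raw_offsets, kept_offsets):
    """Relative change in offset variance after filtering
    (e.g. -0.10 means the filtered variance is 10% smaller)."""
    return pvariance(kept_offsets) / pvariance(raw_offsets) - 1.0
```

Here `pearson(offsets, delays)` would give the correlation coefficient, and `variance_change(all_offsets, filtered_offsets)` the fractional variance change.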
I am also a little bit surprised that it is the delay that is used and not
the total roundtrip time. As I seem to read it, the delay is (t4-t3+t2-t1),
i.e., it does not take into account the delay within the far machine (e.g.
t4-t1), but only propagation delay. I would expect that the former might
even be more important than the latter, but that is a pure guess, i.e. no
measurements on even one system to back it up.
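To make the delay expression above concrete, here is a minimal sketch assuming the usual NTP timestamp convention (t1 = client transmit, t2 = server receive, t3 = server transmit, t4 = client receive). The function name and the sample numbers are hypothetical. Algebraically, t4-t3+t2-t1 is the total round trip (t4-t1) minus the time spent inside the far machine (t3-t2).

```python
def ntp_delay_offset(t1, t2, t3, t4):
    """Return (delay, offset, server_time, total_round_trip) in seconds."""
    total_round_trip = t4 - t1   # measured entirely on the client clock
    server_time = t3 - t2        # time the packet spent inside the far machine
    delay = total_round_trip - server_time   # identical to t4 - t3 + t2 - t1
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    return delay, offset, server_time, total_round_trip

# Hypothetical LAN exchange: 200 us each way, 50 us inside the server,
# and a server clock running 1 ms ahead of the client.
t1 = 100.000000
t2 = t1 + 0.000200 + 0.001
t3 = t2 + 0.000050
t4 = t1 + 0.000200 + 0.000050 + 0.000200
delay, offset, server_time, rtt = ntp_delay_offset(t1, t2, t3, t4)
```

In this made-up exchange the total round trip is 450 usec, of which 50 usec is server time, leaving a delay of 400 usec.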
Now it may be that on that rocky road to Manila, the propagation delay is
by far the most important, but on a modern LAN, especially with a low
propagation delay of hundreds of usec rather than 100s of msec, I wonder.
I munged ntp record_peer_stats to also print out the p_off and p_del (i.e.
the immediate offset and delay of the current packet) and counted up in the
output how often peer->off and p_off are different from each other,
indicating a thrown-away packet of data. They differed 83% of the time.
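As a rough cross-check of that figure, the following sketch simulates a simplified model of the selection rule as described in this thread: a sample counts as used only if it has the lowest delay among the last 8 samples at the moment it arrives. This is not ntpd's actual clock filter code, and the delay distribution (a fixed floor plus independent exponential jitter) is purely an assumption.

```python
import random
from collections import deque

def fraction_discarded(delays, window=8):
    """Fraction of samples that are not the minimum-delay entry of the
    last `window` samples at the moment they arrive."""
    recent = deque(maxlen=window)
    discarded = 0
    for d in delays:
        recent.append(d)
        if d > min(recent):   # an older sample still has a lower delay
            discarded += 1
    return discarded / len(delays)

rng = random.Random(2008)     # fixed seed so the run is repeatable
# Assumed delays: 200 us floor plus exponential jitter with a 100 us mean.
delays = [0.0002 + rng.expovariate(1 / 0.0001) for _ in range(100_000)]
frac = fraction_discarded(delays)
print(f"discarded {frac:.1%} of samples")
```

For independent, identically distributed delays the newest of 8 samples is the minimum with probability 1/8, so under this model about 7/8 of the samples are discarded, which is in the same ballpark as the measured 83%. Real network delays are correlated, so the match should not be read as more than suggestive.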