Re: feedback...
- From: Duncan Smith <buzzard@xxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 01 Jul 2005 15:47:35 +0100
Joe wrote:
> "Duncan Smith" <buzzard@xxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:d9vhrg$ros$1@xxxxxxxxxxxxxxxxxxxxxx
>
>>Joe wrote:
>>
>>>Hi Duncan,
>>>
>>>Thank you for the information. I can see how the means are converging.
>>>
>>>That's pretty amazing! I computed all the positional means for my samples of
>>>actual drawings (468 drawings on the 5/35 game) and came up with:
>>>
>>> 6.1599
>>> 12.6567
>>> 18.8166
>>> 24.2367
>>> 29.951
>>>
>>>
>>>
>>>But that is a small number of drawings compared to what you simulated.
>>>
>>>I am not sure what you mean when you say the 'prediction intervals' for the
>>>means are proportional to 1/sqrt(N) for N draws, and will be wider for the
>>>more central positions. I guess I don't know what you mean by prediction
>>>intervals. Is there an equation that describes them, or maybe a qualitative
>>>definition?
>>>
>>>Joe
>>>
>>>
>>
>>What I mean is an interval within which you would expect a positional
>>mean (from N sample draws) to fall with 95% probability. The variance
>>of each positional value can be calculated by iterating through all the
>>possible draws. The variance of the mean, after N draws, for a given
>>position is the relevant positional variance divided by N. For large
>>enough N ('large enough' depends on the actual positional distributions,
>>but is typically 30-50) the mean is approximately normally distributed.
>>
>>e.g. for a 6/49 lottery the mean for position 1 is 7.143 with
>>(population) variance 32.91. After two draws the mean is still 7.143
>>but its variance is 16.45; after 5 draws it is 7.143 with variance 6.58,
>>and so on. For N much greater than about 50 the mean will be
>>approximately normally distributed, so you can construct a 95% interval
>>as (mu-1.96*sqrt(var/N), mu+1.96*sqrt(var/N)), where 'mu' and 'var'
>>refer to the population mean and variance respectively (i.e. the mean
>>and variance for a single draw). So the width of this interval shrinks
>>in proportion to 1/sqrt(N). That's how I calculated the intervals I gave
>>in an earlier post.
>>
>>This is all based on independent draws. The convergence is not due to
>>'overdue numbers / combinations' coming up to balance things out. If
>>that actually were the case, it would converge more quickly than
>>1/sqrt(N).
>>
>>Duncan
>
>
> Hi Duncan,
> More good information, thank you.
> It's taken me a while to digest the above information. I hope you don't mind
> a few more questions. In the above equation for the 95% interval, in the term
> sqrt(var/N), I know that sqrt(var) is the 1 sigma (or standard deviation).
> The var is computed by var=(1/N)*Sum(xi-mu)^2 (summed over i=1 to N).
N in that case is the number of possible combinations, i.e. you work out
the average (xi-mu)^2. For example, generate all the combinations, total
the squared deviations from the mean, then divide by the total number of
combinations to get the mean squared deviation (the variance). That gives
the variance of the distribution from which a draw is made.
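A minimal Python sketch of that enumeration, assuming a game in which k_drawn distinct numbers are drawn from 1..n_balls and every combination is equally likely (`positional_moments` is a hypothetical helper name). Instead of a second pass over the combinations, it gets the mean squared deviation through the equivalent identity E[x^2] - mu^2.

```python
from itertools import combinations

def positional_moments(n_balls, k_drawn):
    """Exact mean and population variance of each sorted position.

    Every combination is equally likely, so the population is simply the
    full set of combinations. itertools.combinations yields each one
    already sorted, so combo[j] is the value in position j + 1.
    """
    count = 0
    sums = [0] * k_drawn
    sq_sums = [0] * k_drawn
    for combo in combinations(range(1, n_balls + 1), k_drawn):
        count += 1
        for j, x in enumerate(combo):
            sums[j] += x
            sq_sums[j] += x * x
    means = [s / count for s in sums]
    # Mean squared deviation, via the identity E[(x - mu)^2] = E[x^2] - mu^2
    variances = [sq / count - m * m for sq, m in zip(sq_sums, means)]
    return means, variances

means, variances = positional_moments(35, 5)  # the 5/35 game (324,632 combinations)
for pos, (m, v) in enumerate(zip(means, variances), start=1):
    print(f"position {pos}: mean {m:.4f}, variance {v:.4f}")
```

For the 5/35 game this should give means of 6, 12, 18, 24 and 30 and variances of about 21.43, 34.29, 38.57, 34.29 and 21.43. The central positions have the largest variances, which is why their intervals are wider.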
> It is counterintuitive to me why we have to divide by N again in the above
> equation.
This time N is the number of draws (I would not have used N for this if
I'd also been using it for the variance equation). Lower-case n would be
more usual for sample size, but I avoided that because I often refer to
n, k lotteries.
> Is there a name for the term sqrt(var/N)?
The standard error.
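For concreteness, a sketch of the standard error and the resulting interval, using the 6/49 position-1 figures quoted earlier (mu = 7.143, var = 32.91). `prediction_interval` is an illustrative name, and the 1.96 multiplier assumes N is large enough for the normal approximation to hold.

```python
import math

def prediction_interval(mu, var, n_draws, z=1.96):
    """Approximate 95% prediction interval for a positional mean after n_draws.

    mu and var are the single-draw (population) mean and variance of the
    position; the interval is only trustworthy once n_draws is large
    enough for the mean to be roughly normal.
    """
    se = math.sqrt(var / n_draws)  # the standard error
    return mu - z * se, mu + z * se

mu, var = 7.143, 32.91  # 6/49 lottery, position 1, values quoted earlier in the thread
for n in (100, 400):
    lo, hi = prediction_interval(mu, var, n)
    print(f"N = {n}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

Quadrupling the number of draws from 100 to 400 halves the width of the interval, which is the 1/sqrt(N) behaviour.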
> Also, I am wondering where the 1.96 comes from.
The 2.5 and 97.5 percentiles for the normal distribution are at -1.96
and 1.96 standard deviations from the mean of the normal distribution.
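The 1.96 can be checked with Python's standard library: `statistics.NormalDist` (Python 3.8+) provides the inverse of the normal CDF.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
z = std_normal.inv_cdf(0.975)  # 97.5th percentile of the standard normal
print(f"{z:.2f}")              # 1.96
# By symmetry the 2.5th percentile is -z, so 95% of the probability lies between them
print(std_normal.cdf(z) - std_normal.cdf(-z))
```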
> When you say for N much greater than 50, would 100 be considered much
> greater than 50 in this context?
This all depends on the distribution of the population, but with 100
you'll be pretty safe in most situations. Severe skew can make the
required sample size quite large, whereas with a symmetric distribution
30 is usually enough.
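One way to check whether a given N is "large enough" is to simulate it. The rough sketch below assumes fair, independent 6/49 draws and reuses the position-1 values quoted earlier. Position 1 is a convenient test case because its distribution is right-skewed: small numbers are the most likely lowest ball. The sketch estimates how often the simulated mean lands inside the normal-approximation 95% interval. The function name, seed and repetition count are arbitrary choices.

```python
import math
import random

MU, VAR = 7.143, 32.91  # 6/49 lottery, position 1 (single-draw mean and variance)

def coverage(n_draws, reps=2000, seed=1):
    """Fraction of simulated position-1 means landing inside MU +/- 1.96 * SE.

    Each simulated draw picks 6 distinct numbers from 1..49 uniformly at
    random; position 1 is the smallest of them.
    """
    rng = random.Random(seed)
    half_width = 1.96 * math.sqrt(VAR / n_draws)
    hits = 0
    for _ in range(reps):
        total = sum(min(rng.sample(range(1, 50), 6)) for _ in range(n_draws))
        if abs(total / n_draws - MU) <= half_width:
            hits += 1
    return hits / reps

for n in (5, 30, 100):
    print(n, coverage(n))
```

Comparing the printed coverage for small N against N = 100 shows how quickly the normal approximation settles down for this skewed position. A more skewed population would need a larger N.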
> Do you have a link to a site, or a suggestion for a book that might explain
> the above in more detail?
>
Try googling for Central Limit Theorem and standard error.
The point here is that the variance of the mean of a sample is less than
the population variance (by a factor equal to the sample size). e.g. Try
generating some random numbers between 0 and 1 and calculating their
mean. Because you'll usually get a reasonable spread of numbers, the mean
is very unlikely to be close to 0 or 1. The more numbers you generate, the
closer the sample mean will generally be to 0.5. Also, you'll find that if
you did this repeatedly for some given sample size, the distribution of
the means would look approximately normal (for large enough sample sizes).
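The random-number experiment above might look like this in Python. It assumes that "0-1 random numbers" means uniform values in [0, 1) from `random.random()`. A single such value has variance 1/12, so the standard deviation of the sample mean should be about sqrt(1/12)/sqrt(size). The last column tests the normal approximation by counting how many means fall within 1.96 standard errors of 0.5.

```python
import math
import random
import statistics

rng = random.Random(42)

def sample_means(sample_size, reps=5000):
    """Means of `reps` independent samples of uniform [0, 1) random numbers."""
    return [statistics.fmean(rng.random() for _ in range(sample_size))
            for _ in range(reps)]

for size in (1, 10, 100):
    means = sample_means(size)
    se = math.sqrt(1 / 12 / size)  # theory: one U(0,1) value has variance 1/12
    inside = sum(abs(m - 0.5) <= 1.96 * se for m in means) / len(means)
    print(f"size {size:3d}: mean of means {statistics.fmean(means):.3f}, "
          f"SD {statistics.stdev(means):.4f} (theory {se:.4f}), "
          f"within 1.96 SE: {inside:.3f}")
```

With a sample size of 1, every value lies within 1.96 standard errors of 0.5 because the largest possible deviation is 0.5, so that row shows the approximation failing for tiny samples. By size 100 the fraction should be close to 0.95.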
This behaviour of the sample mean is the basis of the z-test. Usually we
have a sample mean which we want to compare with some hypothesised
population mean. Here we know the (theoretical) population mean and want
to make predictions about sample means for a given sample size. But the
calculation of the intervals is the same, even if their interpretation is
a little different, i.e. they are prediction intervals rather than
confidence intervals.
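The following sketch, with illustrative function names, shows how the two views line up. A z-test standardises the distance between an observed mean and the population mean, and the mean falls inside the prediction interval exactly when |z| <= 1.96. The observed means below are hypothetical and do not come from real drawings.

```python
import math

def z_statistic(sample_mean, mu, var, n):
    """Standardised distance of a sample mean from the population mean mu."""
    return (sample_mean - mu) / math.sqrt(var / n)

def within_prediction_interval(sample_mean, mu, var, n, z_crit=1.96):
    """True if sample_mean lies inside mu +/- z_crit standard errors.

    This is the same arithmetic as a two-sided z-test at the 5% level:
    the mean is inside the interval exactly when |z| <= z_crit.
    """
    return abs(z_statistic(sample_mean, mu, var, n)) <= z_crit

# Hypothetical observed position-1 means after 100 draws of a 6/49 game
for observed in (7.5, 8.5):
    z = z_statistic(observed, 7.143, 32.91, 100)
    print(observed, round(z, 2), within_prediction_interval(observed, 7.143, 32.91, 100))
```

With these numbers, 7.5 gives a z of about 0.62 (inside the interval) and 8.5 gives about 2.37 (outside).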
Duncan