Re: clarify or clogify? re: Why the human race is growing apart



On Apr 10, 3:19 pm, Garamond Lethe <cartographi...@xxxxxxxxx> wrote:
On Fri, 10 Apr 2009 11:44:13 -0700, tgdenning wrote:
On Apr 10, 1:52 pm, Garamond Lethe <cartographi...@xxxxxxxxx> wrote:
On Fri, 10 Apr 2009 10:32:13 -0700, tgdenning wrote:

<snip>

Sigh. We're talking about IQ tests. Are you saying that the size of
the population that has taken IQ tests is 'small', and that half of
that number is also 'small', in a statistical sense?

It's not the number of people who have taken IQ tests, it's the number
of people per group included in this particular study.  I'm smart
enough to know that the question "is this sample big enough to make
these particular results significant" is best left to people who do
statistics for a living, and barring that, should be answered only by
people who are very familiar with the limitations of their data.

The question I was addressing, though, was much simpler:  given G
groups of N numbers each pulled from some distribution, is it likely
that the average for one group will be higher than the rest?  The
answer is "probably yes" for small values of N and "certainly not" for
infinite values of N.  Whether or not this largest value is significant
is a much harder question that I will sensibly leave to others more
skilled in the art.
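
The "G groups of N numbers" question can be tried directly. What follows is
a minimal sketch, not anything taken from the study: the helper name
group_mean_spread, the seed, the choice of 5 groups, and the normal
distribution with mean .65 and sd 1 are all assumptions made for
illustration. It reports the average gap between the highest and lowest
group mean for several values of N.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Hypothetical helper: gap between the highest and lowest group mean when
# G groups of N values are all drawn from the same normal distribution.
group_mean_spread <- function(G, N, mu = 0.65, sigma = 1) {
  draws <- matrix(rnorm(G * N, mean = mu, sd = sigma), nrow = N, ncol = G)
  means <- colMeans(draws)  # one mean per group (column)
  max(means) - min(means)
}

sizes <- c(10, 100, 1000, 10000)

# Average the gap over 200 repetitions for each group size.
avg_spread <- sapply(sizes, function(N) {
  mean(replicate(200, group_mean_spread(G = 5, N = N)))
})

print(data.frame(N = sizes, avg_spread = avg_spread))
```

Some group always comes out on top; what changes with N is how far ahead it
is. The sketch says nothing about whether a given gap is significant.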

I'm not completely clear what you mean by "largest value is
significant". But if the sample size is large enough---and that doesn't
have to be very large at all, relative to the population size (look at
political polls for example)---then the mean of the sample will be the
same as the mean of the population as a whole.

So let's say there is a population of 5000 taking a test. The mean and
median score is 65%. If we select at random 5 groups of 1000 each, I
would be willing to wager a few bucks that the mean and median score of
each group would be 65%.

Here's what I get.  R uses ">" as the input prompt; apologies if this
screws up the quoting.

# Start with a normal distribution of 5000 individuals where the
#  mean should be .65.  

population<-rnorm(5000, mean=.65)

# Median and mean of the population:  mean is tolerably close,
# median isn't.

median(population)
[1] 0.680727
mean(population)
[1] 0.6596989

# Median and mean of each 1000-individual subgroup.

median(population[1:1000])
[1] 0.6391295
mean(population[1:1000])
[1] 0.6206391
median(population[1001:2000])
[1] 0.7184306
mean(population[1001:2000])
[1] 0.6971344
median(population[2001:3000])
[1] 0.7047903
mean(population[2001:3000])
[1] 0.6709924
median(population[3001:4000])
[1] 0.7241578
mean(population[3001:4000])
[1] 0.6858281
median(population[4001:5000])
[1] 0.6308663
mean(population[4001:5000])
[1] 0.6239006

R is free software and can be grabbed here:

http://www.r-project.org/

Try playing with it and see what you think.  For example, cranking up the
population figures gives you something much closer to what you were
expecting:

population2<-rnorm(5000000, mean=.65)
median(population2)
[1] 0.6500446
mean(population2)
[1] 0.6498886
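
A rough way to see why the 1000-individual subgroups wander while the
5,000,000 run does not is the standard error of the mean, sigma / sqrt(n).
This is a sketch under stated assumptions: the rnorm calls above never set
sd, so R's default of sd = 1 applies, and the median line uses the textbook
large-sample approximation for normally distributed data.

```r
sigma <- 1                   # rnorm's default sd, used implicitly above
n <- c(1000, 5000, 5000000)  # subgroup size, first population, second population

# Standard error of the sample mean.
se_mean <- sigma / sqrt(n)

# Large-sample approximation to the standard error of the sample median,
# valid for normally distributed data.
se_median <- sqrt(pi / 2) * sigma / sqrt(n)

print(data.frame(n = n, se_mean = se_mean, se_median = se_median))
```

For n = 1000 the standard error of the mean is about 0.032, so subgroup
means between 0.62 and 0.70 all sit within roughly one and a half standard
errors of .65.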

I'll leave the proper explanation of this to an Actual Mathematician.



So I am disinclined to accept your comparison of 'small N' and
'approaching infinity'---you may be bordering on (unconscious) circular
reasoning, I think, where you define "small N" as "that which is small
enough not to be representative". Perhaps one of those real
statisticians will advise us on the real ranges we are talking about (or
take my money).

-tg

BTW, I give you credit for at least having a clue as to what I am
talking about, which the others obviously don't.

Thanks.

<snip>

OK, thanks, that's why I stay away from casinos.

Anyway, I think I can clarify this now. There are two different things
that we are talking about.

1) IQ. I think the first problem for the other guys is that they are
thinking of IQ as a score on a test, when it is actually membership in
a class interval. All the people who are reported as 'having an IQ of
123' did not score exactly the same on whichever battery of tests they
took. If 65 is the mean test score for the population, then
individuals with scores of 64.9 and 65.1, for example, would both
'have IQ of 100'.

All that's required for two groups within the population to have the
same IQ rank is that their mean scores fall within that class
interval. So if the class interval is equivalent to a few standard
deviations as calculated for a given sample (group) size, we can be
*very confident* that both groups have the same IQ. This condition
would seem to be highly likely for the numbers involved in the IQ
test.
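
The class-interval argument can be put into numbers. This is only a sketch:
it assumes the conventional IQ scale (mean 100, sd 15), that group means
are reported rounded to the nearest whole point, that the population mean
sits exactly in the middle of the "100" band, and that the two groups are
independent random samples. The function name same_band_prob is made up
for illustration.

```r
# All numbers here are illustrative assumptions, not figures from any study.
iq_sd <- 15        # conventional IQ scale: mean 100, sd 15
half_width <- 0.5  # assume means are reported to the nearest whole IQ point

# Probability that two independent groups of size n both report a mean IQ
# of 100, assuming the true population mean is exactly 100.
same_band_prob <- function(n) {
  se <- iq_sd / sqrt(n)  # standard error of one group's mean IQ
  p_one <- pnorm(half_width / se) - pnorm(-half_width / se)
  p_one^2
}

group_sizes <- c(100, 1000, 10000, 100000)
print(data.frame(n = group_sizes, p_same = sapply(group_sizes, same_band_prob)))
```

Under these assumptions the answer depends on how wide the interval is
relative to the standard error of the group mean, which is the "few
standard deviations" condition stated above.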

****

2) Direct test scores:

Assume that we begin with a range of integer scores, and there is no
rounding or classification. Then indeed it is *not* highly probable
(but not necessarily zero or very low probability) that groups of half
the population will have the identical mean score. Of course, this is
not a realistic scenario, but it seems to be the basis of (your)
argument.

However, in this scenario, I don't see that decreasing SD by
increasing the size of the two samples increases the probability of
the two samples being identical and equal to the population mean,
which you and the other guy suggest. You seem to be having it both
ways if you make that argument. I am open to correction on this, but
you have to make the case.
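
The two things being argued about here, "means get closer" and "means are
exactly identical", can be measured separately. A minimal sketch, with
assumptions flagged: integer scores drawn uniformly from 0 to 100 (an
arbitrary stand-in distribution), an arbitrary seed, and a made-up helper
name compare_halves.

```r
set.seed(2)  # arbitrary seed, only so the run is repeatable

# Draw 2 * n integer scores, split them into two halves of n, and record
# how far apart the two half means are and how often they tie exactly.
compare_halves <- function(n, reps = 2000) {
  sum_diffs <- replicate(reps, {
    scores <- sample(0:100, 2 * n, replace = TRUE)
    sum(scores[1:n]) - sum(scores[(n + 1):(2 * n)])
  })
  c(n = n,
    mean_abs_diff = mean(abs(sum_diffs)) / n,  # typical gap between the means
    frac_identical = mean(sum_diffs == 0))     # equal sums means equal means
}

results <- t(sapply(c(10, 100, 1000), compare_halves))
print(results)
```

The typical gap between the two means shrinks as n grows, while exact ties
remain rare at every n tried here. Both statements can hold at once: larger
samples pull the means closer together without making them exactly equal.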

-tg
