Re: OT: "Rabbit Hunting" in stat/handicapping
- From: Raider Fan <raidersgotscrewed1@xxxxxxxxxxx>
- Date: Thu, 15 Mar 2007 11:57:38 GMT
"eleaticus" <eleaticus@xxxxxxxxxxxxx> wrote in
news:Pw1Kh.7593$B7.1468@bigfe9:
Some academics, sports handicappers, and others tend to throw a bunch
of statistical tests into the hat and voilà! pull out a significant
relationship.
The best academics know that a significance result should be for a
relationship that was theorized beforehand, but even they do this
"rabbit hunting" and THEN theorize.
Let's take a look at what can be expected if, say, you throw a bunch
of random variables into the correlation hat.
Ten variables give you 45 correlations.
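The 45 is just the number of distinct pairs among ten variables, k(k-1)/2. A minimal Python sketch of that count (the function name `pair_count` is made up for illustration and is not from the post):

```python
from math import comb

def pair_count(k: int) -> int:
    """Number of distinct pairwise correlations among k variables."""
    return comb(k, 2)  # same as k * (k - 1) // 2

print(pair_count(10))  # 45
print(pair_count(15))  # 105
```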
Using the runs test to see what the expectation is in this situation,
we use the formula Nr = n(1-p)(p^r), where r is the 'run length' of
minimum interest and Nr the expected number of such runs (or longer
runs) in n 'trials' at the given probability, p.
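As a rough sketch, the formula can be written as a small Python function. The name `expected_runs` and its argument names are illustrative only; the formula itself is the one quoted above, taken as given.

```python
def expected_runs(n: int, p: float, r: int = 1) -> float:
    """Expected number of runs of length r or longer in n trials,
    using the post's formula Nr = n(1-p)(p^r)."""
    return n * (1 - p) * p ** r

# Example: 45 trials at p = .05 with minimum run length r = 1.
print(expected_runs(45, 0.05))
```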
Here, r = 1. Let's take p=.01 and find the number of trials, n, at
which the expected number of such runs reaches Nr=1.
1 = n(.99)(.01) = n(.0099), and n = 101, which says in effect that
with about 15 random variables we'd expect one or more highly
'significant' but meaningless correlations.
(a. We dropped the exponent of p because it is 1.00 here.)
(b. 15 variables gives us [15*14/2=105].)
At p=.05, 1 = n(.95)(.05), and n = 21, which is about what 7
variables give us [7*6/2=21].
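The same two steps (solve the formula for n, then find how many variables produce that many pairwise correlations) can be sketched in Python. The helper names `trials_needed` and `variables_needed` are hypothetical, and the second one rounds up strictly, so it can land one variable higher than the "about" figures in the text.

```python
from math import comb

def trials_needed(p: float, r: int = 1, target: float = 1.0) -> float:
    """Solve target = n(1-p)(p^r) for n."""
    return target / ((1 - p) * p ** r)

def variables_needed(p: float, r: int = 1, target: float = 1.0) -> int:
    """Smallest k whose k(k-1)/2 pairwise correlations reach trials_needed."""
    n = trials_needed(p, r, target)
    k = 2
    while comb(k, 2) < n:
        k += 1
    return k

for p in (0.01, 0.05):
    print(p, round(trials_needed(p), 2), variables_needed(p))
```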
So, with 10 random variables thrown into the correlation bonnet, we
expect a number of misleading bees amongst them.
Let's check to see how many .05 results we would expect in the 45
'trials' for 10 variables.
Nr = 45(.95)(.05^1)
Nr = 2.1375.
And in the 105 trials on 15 random variables?
Nr = 105(.95)(.05^1)
Nr = 4.9875.
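One way to sanity-check figures like these is a quick simulation: generate ten unrelated random variables, test all 45 correlations at the .05 level, and average the number of 'significant' results over many repetitions. The sketch below is only an illustration under assumptions not in the post: 50 observations per variable, normally distributed data, and a Fisher z approximation for the significance test. The average should land near 45 x .05 = 2.25, a shade above 2.1375, because the (1-p) factor in the formula counts runs of hits rather than individual hits.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist

def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_significant(k, n_obs, alpha, rng):
    """Count pairwise correlations among k random variables that pass
    a two-sided test at level alpha (Fisher z approximation)."""
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(k)]
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for x, y in combinations(data, 2):
        z = math.atanh(pearson_r(x, y)) * math.sqrt(n_obs - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits

rng = random.Random(2007)  # fixed seed so reruns give the same numbers
reps = 200
mean_hits = sum(count_significant(10, 50, 0.05, rng) for _ in range(reps)) / reps
print(mean_hits)
```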
The result is that when you include each team's full set of 4
variables instead of just the two apiece that are directly relevant,
you get a greater decrement in the 'adjusted' R-square than you would
without the extras. The adjustment acknowledges the additional noise
that comes along with the additional variables.
I have - on those occasions when I try the full set for curiosity's
sake - sometimes found the net adjusted R-square is no higher than
with just half of the variables, the directly relevant four.
Man, I need a drink. It's simple. An indoor or southern team playing at
Lambeau in December. Take the Packers and lay the points.