Re: Question about statistical significance
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Fri, 11 May 2007 21:44:02 -0400
On 10 May 2007 09:13:52 -0700, hgwelec <hgwelec@xxxxxxxxx> wrote:
> Dear All,
> I have used a C4.5 decision tree to make an analysis. The analysis
> (classification) is about finding the common characteristics of "good"
> clients.
> Say for example that out of the decision tree the following "rule" is
> shown:
> IF AGE > 32
> AND NUM_OF_CHILDREN > 2
> AND CLIENT_PROFESSION="DOCTOR"
> AND GENDER="MALE"
> THEN
> CLIENT="GOOD"
> Now, the above rule has 85% accuracy and a 25% coverage on the
> dataset.
What do those numbers mean? Are they specificity and sensitivity?
And is 85% very good?
> The dataset consists of 700 cases.
> What would I have to do in order to assess whether this fact is NOT
> attributable to pure chance?
Classification trees: Generally, you have to derive them
on one set of data, and confirm them on another set.
If you do that a lot of times, then you need another
level of replication, to show that a successful
replication is not just a result of trying too many times.
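The derive-then-confirm idea can be sketched in Python as follows. This is only an illustration under assumptions: the field names (age, num_children, profession, gender, good) are hypothetical, "coverage" is read as the share of cases the rule fires on, and "accuracy" as the share of those covered cases that are actually good. The tree itself would have to be grown on the training part only; the sketch just scores an already-derived rule on whichever part it is given.

```python
import random

def rule_fires(c):
    """The rule from the question: AGE > 32, NUM_OF_CHILDREN > 2, DOCTOR, MALE."""
    return (c["age"] > 32 and c["num_children"] > 2
            and c["profession"] == "DOCTOR" and c["gender"] == "MALE")

def evaluate_rule(clients):
    """Return (coverage, accuracy) of the rule on a list of client dicts.

    coverage = covered cases / all cases
    accuracy = good covered cases / covered cases (assumed reading)
    """
    covered = [c for c in clients if rule_fires(c)]
    if not clients or not covered:
        return 0.0, float("nan")
    coverage = len(covered) / len(clients)
    accuracy = sum(1 for c in covered if c["good"]) / len(covered)
    return coverage, accuracy

def split(clients, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the data and return (training, holdout) lists."""
    shuffled = list(clients)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With the 700 cases loaded as a list of dicts, one would call split once, grow the tree on the training part, and then report evaluate_rule on the holdout part. If the holdout accuracy falls well below the training accuracy, the rule is probably overfitted.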
> A chi-square test clearly cannot be used since AGE and NUM_OF_CHILDREN
> are not categorical variables.
You can make them categorical, just like the program
does. The other way to get around the 'too many tests'
problem (to some extent) is to ask for an exceedingly
tiny p-level and a large effect size.
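
Once the rule is treated as a yes/no variable (fires / does not fire), the test reduces to a 2x2 table against good / not good. A minimal Python sketch, using only the standard library, is below. The counts in the usage lines are partly invented: 175 covered cases and about 149 good among them follow from 25% coverage and 85% accuracy on 700 cases, but the split of the other 525 cases is NOT given in the question and is assumed here purely for illustration. The 0.001 threshold is likewise just an example of a "tiny" p-level.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table

                     good   not good
        rule fires     a       b
        rule silent    c       d

    Returns (chi2, p_value, phi), where phi = sqrt(chi2 / n) is an
    effect size between 0 and 1.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi

# Covered cases: 175, of which about 149 good (from the question).
# Uncovered cases: 262 good / 263 not good is an ASSUMED split.
chi2, p_value, phi = chi_square_2x2(149, 26, 262, 263)
strict = p_value < 0.001  # demand a tiny p-level, as suggested above
```

The test is only trustworthy on data that was not used to grow the tree; on the training data the p-value will look far better than it deserves.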
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html