Re: Question about statistical significance



On 10 May 2007 09:13:52 -0700, hgwelec <hgwelec@xxxxxxxxx> wrote:

Dear All,


I have used a C4.5 decision tree to make an analysis. The analysis
(classification) is about finding the common characteristics of "good"
clients.

Say for example that out of the decision tree the following "rule" is
shown:


IF AGE >32
AND NUM_OF_CHILDREN > 2
AND CLIENT_PROFESSION="DOCTOR"
AND GENDER="MALE"
THEN
CLIENT="GOOD"

Now, the above rule has 85% accuracy and a 25% coverage on the
dataset.

What does that mean? Specificity and sensitivity?
Is that very good?

The dataset consists of 700 cases
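
To make the numbers concrete, here is a small sketch of what 85% accuracy and 25% coverage work out to on 700 cases. It assumes "coverage" means the share of all cases that satisfy the rule's conditions and "accuracy" means the share of those covered cases that really are GOOD; the post does not define either term, so treat both readings as assumptions. The helper name `rule_counts` is made up for illustration.

```python
def rule_counts(n_cases, coverage, accuracy):
    """Turn the reported fractions into approximate case counts.

    Assumes coverage = covered / n_cases and
    accuracy = correctly labelled covered cases / covered.
    """
    covered = round(n_cases * coverage)
    correct = round(covered * accuracy)
    return covered, correct


covered, correct = rule_counts(700, 0.25, 0.85)
print("cases matching the rule:", covered)
print("of those, GOOD clients: ", correct)
print("of those, not GOOD:     ", covered - correct)
```

Sensitivity and specificity cannot be recovered from these two figures alone: they also need the number of GOOD and not-GOOD clients among the cases the rule does not cover, which the post does not give.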



What would I have to do in order to assess whether this fact is NOT
attributed to pure chance?

Classification trees: Generally, you have to derive them
on one set of data, and confirm them on another set.
If you do that a lot of times, then you need another
level of depth of replication, to show that replicating
is not a matter of trying too many times.
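
A minimal sketch of the derive-on-one-set, confirm-on-another idea, in plain Python. It only shows the checking half: the tree (and so the rule) would have to be grown on the training part alone, which is not done here. The column names come from the rule in the post; the functions `matches`, `rule_stats` and `split`, the 30% test fraction and the four demo rows are all made up for illustration.

```python
import random


def matches(row):
    """The rule from the post, applied to one case stored as a dict."""
    return (row["AGE"] > 32
            and row["NUM_OF_CHILDREN"] > 2
            and row["CLIENT_PROFESSION"] == "DOCTOR"
            and row["GENDER"] == "MALE")


def rule_stats(rows):
    """Return (coverage, accuracy) of the rule on a non-empty list of cases.

    Accuracy is None when no case matches the rule.
    """
    covered = [r for r in rows if matches(r)]
    coverage = len(covered) / len(rows)
    if not covered:
        return coverage, None
    good = sum(r["CLIENT"] == "GOOD" for r in covered)
    return coverage, good / len(covered)


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the cases and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical demo rows, only to show the shape of the data.
demo = [
    {"AGE": 40, "NUM_OF_CHILDREN": 3, "CLIENT_PROFESSION": "DOCTOR",
     "GENDER": "MALE", "CLIENT": "GOOD"},
    {"AGE": 45, "NUM_OF_CHILDREN": 4, "CLIENT_PROFESSION": "DOCTOR",
     "GENDER": "MALE", "CLIENT": "BAD"},
    {"AGE": 30, "NUM_OF_CHILDREN": 1, "CLIENT_PROFESSION": "LAWYER",
     "GENDER": "FEMALE", "CLIENT": "GOOD"},
    {"AGE": 50, "NUM_OF_CHILDREN": 0, "CLIENT_PROFESSION": "DOCTOR",
     "GENDER": "FEMALE", "CLIENT": "BAD"},
]
print(rule_stats(demo))
```

With real data you would call `split` first, grow the tree on the training part, and then run `rule_stats` on the test part only. If the accuracy drops sharply there, the rule was probably fitted to noise.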

A chi-square test clearly cannot be used since AGE and NUM_OF_CHILDREN
are not categorical variables.

You can make them categorical, just like the program
does. The other way around 'too many tests' (to some
extent) is to ask for an exceedingly tiny p-level and a
large effect size.
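
As a rough sketch of that: once the rule is treated as a yes/no category, rule membership against GOOD / not GOOD is a 2x2 table, and a Pearson chi-square (1 degree of freedom, no continuity correction) can be computed by hand. The function name `chi_square_2x2` is made up. The counts for the covered cases follow from 85% of 175, but the counts for the uncovered cases are NOT in the post and are pure placeholders; so is the 0.001 threshold.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table

                 GOOD   not GOOD
        rule       a       b
        no rule    c       d

    Returns (statistic, p_value, phi), where phi = sqrt(statistic / n)
    is an effect-size measure for a 2x2 table.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    phi = math.sqrt(stat / n)
    return stat, p_value, phi


# 175 covered cases, about 149 of them GOOD (from the post's figures).
# The split of the other 525 cases is a hypothetical placeholder.
stat, p_value, phi = chi_square_2x2(149, 26, 262, 263)
ALPHA = 0.001  # assumed "exceedingly tiny" level, not from the post
print(f"chi-square = {stat:.2f}, p = {p_value:.3g}, phi = {phi:.2f}")
print("passes the strict threshold:", p_value < ALPHA)
```

This test still ignores that the tree searched over many candidate splits before settling on this rule, which is why the p-level has to be far stricter than the usual 0.05 and why confirming on fresh data remains the safer check.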

--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html