Re: small data set



Corrected for the heinous sin of top-posting.

On May 5, 5:59 pm, "giannis " <fanzi...@xxxxxxxxxxx> wrote:

Greg Heath <he...@xxxxxxxxxxxxxxxx> wrote in message

<9b4c2a53-7f64-42a4-a546-5a8e0f9e2...@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>...
On May 1, 7:22 am, Greg Heath <he...@xxxxxxxxxxxxxxxx>
wrote:
On May 1, 6:30 am, "giannis " <fanzi...@xxxxxxxxxxx>
wrote:

Hello.

I am doing statistical research using KNN, neural nets and
SVM. The problem is the very small data set (25
specimens).

I am using cross validation to resample the data but I am
not sure if my results can be accurate with such a small
data set.

Can you please suggest a method that makes the best possible
use of such a small data set?
Thank you in advance.

Bootstrapping

Search the MathWorks website.
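
For readers who want to see the idea rather than search for it, here is a minimal sketch of bootstrapping in plain Python. It is not the MathWorks routine; the function names, the made-up `sample` values and the choice of the mean as the statistic are all assumptions for illustration.

```python
import random
import statistics

def bootstrap_statistic(values, statistic=statistics.mean, n_resamples=2000, seed=0):
    """Evaluate `statistic` on n_resamples bootstrap resamples of `values`."""
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return replicates

def percentile_interval(replicates, alpha=0.05):
    """Simple percentile interval taken from the sorted replicates."""
    ordered = sorted(replicates)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper

# Hypothetical 25 measurements, made up purely for illustration.
sample = [10 + 0.5 * i for i in range(25)]
reps = bootstrap_statistic(sample)
low, high = percentile_interval(reps)
print("bootstrap 95% interval for the mean:", low, high)
```

The width of the interval gives a rough feel for how much a statistic computed from only 25 observations can move around.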

If you have prior information on the form of the probability
distribution function, you can use the 25 observations to
estimate the parameters and then generate more "data".
The danger is that, even in one dimension, 25 observations
will not give you precise parameter estimates.
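
A rough sketch of that parametric route, assuming (purely for illustration) that the data are one-dimensional and normally distributed. The `observed` values are simulated stand-ins, and the helper names are hypothetical.

```python
import math
import random
import statistics

def fit_normal(values):
    """Estimate the mean and the sample standard deviation (n - 1 divisor)."""
    return statistics.mean(values), statistics.stdev(values)

def generate_synthetic(mu, sigma, n_new, seed=0):
    """Draw n_new values from the fitted normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n_new)]

# Stand-in for the 25 real observations (simulated, not real data).
source = random.Random(1)
observed = [source.gauss(50.0, 10.0) for _ in range(25)]

mu_hat, sigma_hat = fit_normal(observed)
# Standard error of the estimated mean: with n = 25 it is sigma_hat / 5,
# which is the imprecision the synthetic data inherits.
std_err = sigma_hat / math.sqrt(len(observed))
synthetic = generate_synthetic(mu_hat, sigma_hat, 200)
print("mean estimate:", mu_hat, "+/-", std_err)
```

Note that the synthetic points add no new information: any error in `mu_hat` and `sigma_hat` is copied into every generated value.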

If you don't have such prior information you can test
hypotheses as to which distribution the data might be
from. However, with only 25 observations the testing will
be far from definitive. You may test several distributions
and find that you can reject all except one. However, that does
not guarantee that it is the correct distribution.
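
One way to try this is a Kolmogorov-Smirnov comparison, sketched below. The two candidate distributions, the simulated `data` and the function names are assumptions chosen for illustration. The threshold 1.36/sqrt(n) is the usual large-sample 5% approximation for a fully specified candidate; it is not valid as-is when the parameters are estimated from the same data.

```python
import math
import random
import statistics

def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def logistic_cdf(x, scale=math.sqrt(3) / math.pi):
    """Logistic CDF; this scale gives unit variance (illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-x / scale))

# Simulated stand-in for 25 one-dimensional observations.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(25)]

d_normal = ks_statistic(data, statistics.NormalDist(0.0, 1.0).cdf)
d_logistic = ks_statistic(data, logistic_cdf)
critical = 1.36 / math.sqrt(len(data))  # about 0.272 for n = 25
print("D (normal):", d_normal)
print("D (logistic):", d_logistic)
print("approximate 5% critical value:", critical)
```

With n = 25 the threshold is large, so similar-looking candidates are hard to tell apart.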

...suddenly I have the feeling that the data is not
1-dimensional!

What are the dimensions of your input and output?
Exactly what type of problem do you have and what
exactly do you want the neural net to do?

Hello Greg,

thank you for all your help.

I have data from 25 people. 20 of them have lung cancer and
5 don't. I have 6 different characteristics for each person
(so the array is 25x6).

The tasks are to produce two classifiers:
1st: to classify between a constant value (2 outputs)
2nd: to classify the stage of cancer, 0, 1, 2, 3 or 4 (so 5
outputs)

I tried to use SVM, linear regression, backpropagation and
RBF neural nets, and KNN.

I resampled my data using leave-one-out cross-validation
(LOOCV), each time keeping one specimen for testing and
24 for training.
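
For reference, the LOOCV procedure described above can be sketched as follows, using a small hand-written k-nearest-neighbour classifier. The 25x6 `features` array is simulated and the function names are hypothetical; this is not the poster's actual code or data.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def loocv_error(x, y, k=3):
    """Leave-one-out error: hold out each row once, train on the rest."""
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) != y[i]:
            errors += 1
    return errors / len(x)

# Simulated stand-in for the 25x6 array: 20 cases (1) and 5 controls (0).
rng = random.Random(0)
labels = [1] * 20 + [0] * 5
features = [[rng.gauss(float(label), 1.0) for _ in range(6)] for label in labels]

error_rate = loocv_error(features, labels, k=3)
# Always predicting the majority class already gives 5/25 = 0.2 error,
# so a classifier is only useful if it clearly beats that figure.
baseline_error = min(Counter(labels).values()) / len(labels)
print("LOOCV error:", error_rate, "majority-class baseline:", baseline_error)
```

With a 20/5 class split, the overall error rate alone can look good while the 5 minority cases are mostly misclassified, so per-class errors are worth reporting too.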

Hope I gave you the picture?

What kind of error rates are you getting for each method?
What are the largest error rates that you would accept?

When you plot the desired {0,1} classification vs. each
of the inputs, does there appear to be predictive capability?
What are the corresponding correlation coefficients?
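
A small sketch of that last check: the Pearson correlation between the {0,1} target and each of the 6 inputs. The `inputs` array is simulated, with only the first column made informative, and the names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Simulated stand-in for the 25x6 inputs; only column 0 depends on the label.
rng = random.Random(0)
labels = [1] * 20 + [0] * 5
inputs = [[rng.gauss(float(label) if j == 0 else 0.0, 1.0) for j in range(6)]
          for label in labels]

correlations = [pearson([row[j] for row in inputs], labels) for j in range(6)]
for j, r in enumerate(correlations):
    print("input", j, "correlation with target:", r)
```

With only 25 rows, even an uninformative input can show a noticeable correlation by chance, so small coefficients should not be over-read.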

Hope this helps.

Greg