Re: PLEASE HELP ME , I am stuck!!!!
- From: "Zaraxustra" <spammustgo@xxxxxxxxx>
- Date: 12 Feb 2006 20:58:15 -0800
I have seen several replies to your message about tool selection.
While I believe that is an important factor, your question seems to be
more about the choice of algorithm than about any specific tool.
When selecting a model for your data, your best bet, especially if
you're working with a data set you haven't analyzed before, is to start
from the basics: run a multiple linear regression of the output on your
input variables and see what a least-squares linear fit gives you. If
you're not familiar with concepts like linear regression, least-squares
optimization and covariance matrices, you may want to look those up
first. Understanding linear regression (first with a single input, then
with several) is the foundation for understanding almost any other data
mining algorithm out there.
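A minimal sketch of that first step, assuming Python with NumPy (the post does not name a language or tool). It prepends a column of ones for the intercept and solves the least-squares problem with `numpy.linalg.lstsq`; the function name `fit_least_squares` and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    X: (n_samples, n_features) array of inputs
    y: (n_samples,) array of outputs (e.g. weight)
    Returns the coefficient vector [b0, b1, ..., bp].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))  # synthetic inputs, purely illustrative
    y = 2.0 + X @ np.array([1.5, -0.7, 0.0]) + rng.normal(scale=0.1, size=200)
    # Should come out close to the true values [2.0, 1.5, -0.7, 0.0].
    print(fit_least_squares(X, y))
```

The third synthetic input has a true coefficient of zero, which is exactly the kind of variable the next paragraph is about.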
That said, you will probably have one or more input variables that do
not really explain the output (weight). It is best to find the simplest
model that explains your data well, which usually means eliminating
irrelevant variables. If you don't eliminate them, the regression will
fit noise through every variable (overfitting) and the predictive power
of your model on new data will suffer. One of the easiest methods for
model reduction is to look at the Z score of each regression coefficient
(the coefficient divided by its standard error): inputs whose Z score is
small in magnitude are candidates for removal. Look that up and see if
it helps you.
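A rough sketch of Z-score-based model reduction, again assuming Python with NumPy. It assumes "Z score" means the usual coefficient Z score, z_j = b_j / (sigma_hat * sqrt(v_j)), where v_j is the j-th diagonal entry of (A^T A)^-1 and sigma_hat^2 = RSS / (n - p - 1). The backward-elimination loop and the cutoff of 2 (roughly the 5% level) are illustrative choices, not the only way to do this; `coefficient_z_scores` and `backward_eliminate` are hypothetical names.

```python
import numpy as np

def coefficient_z_scores(X, y):
    """Return (coefficients, Z scores) for a least-squares fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # Unbiased estimate of the noise variance: RSS / (n - p - 1).
    sigma2 = resid @ resid / (n - p - 1)
    # Standard error of each coefficient: sqrt of the diagonal of sigma2 * (A^T A)^-1.
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, coef / se

def backward_eliminate(X, y, threshold=2.0):
    """Drop the input with the smallest |Z| until every remaining |Z| >= threshold.

    Returns the indices of the input columns that were kept.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while keep:
        _, z = coefficient_z_scores(X[:, keep], y)
        z_inputs = np.abs(z[1:])  # skip the intercept
        worst = int(np.argmin(z_inputs))
        if z_inputs[worst] >= threshold:
            break
        del keep[worst]  # positions in z_inputs line up with positions in keep
    return keep
```

Refitting after each removal matters: the Z scores of the remaining inputs change once a correlated variable is dropped, so removing several at once based on a single fit can throw away useful inputs.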
Once you have a baseline from linear regression, you can start deciding
whether a more sophisticated model is worth pursuing. In many cases,
linear regression will be enough. In just as many it won't. Choosing a
different model requires you to know what each model has to offer. You
may be able to fit the data very well with a neural network or, if you
recast the problem as predicting weight classes rather than an exact
weight, with logistic regression or perhaps discriminant analysis.
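One way to make "is the fancier model better than the baseline?" concrete is to compare error on data the model was not fitted to. The sketch below, assuming Python with NumPy, holds out a random 25% of the rows and reports root-mean-square error. The quadratic-feature model is a hypothetical stand-in for "a more sophisticated model", not one of the models named above, and all function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def linear_features(X):
    # Baseline: intercept plus the raw inputs.
    return np.column_stack([np.ones(len(X)), X])

def quadratic_features(X):
    # Hypothetical richer candidate: intercept, raw inputs and their squares.
    return np.column_stack([np.ones(len(X)), X, X ** 2])

def holdout_rmse(make_features, X, y, test_fraction=0.25, seed=0):
    """Fit by least squares on a random training split; return RMSE on the held-out rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    coef, *_ = np.linalg.lstsq(make_features(X[train]), y[train], rcond=None)
    return rmse(y[test], make_features(X[test]) @ coef)
```

Usage: call `holdout_rmse(linear_features, X, y)` and `holdout_rmse(quadratic_features, X, y)` on the same data and seed; keep the richer model only if its held-out error is clearly lower. With small data sets, repeating this over several seeds (or using cross-validation) gives a less noisy comparison.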
Another thing to look out for is the content of your input variables.
If you have discrete variables, you need to be careful about modeling
them. Categorical values such as, for example, types of animal feed have
to be encoded as numbers, and it is usually safer to use one indicator
(0/1) variable per category than a single column of arbitrary codes,
which would imply an ordering that does not exist. Also, if you have
continuous variables with very different ranges, it never hurts to
standardize them (mean zero, standard deviation one) to avoid running
into scale issues.
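Both preprocessing steps are short to write by hand. A minimal sketch, assuming Python with NumPy; `one_hot` and `standardize` are hypothetical helper names, and the feed types in the usage note are made up for illustration.

```python
import numpy as np

def one_hot(values):
    """Encode a categorical column as indicator (dummy) columns.

    Returns (matrix, categories); column j is 1.0 where values == categories[j].
    """
    categories = sorted(set(values))
    matrix = np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])
    return matrix, categories

def standardize(X):
    """Rescale each column of a 2-D array to mean 0 and standard deviation 1.

    Returns (scaled, mean, std) so the same transform can be applied to new data.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centered but unscaled instead of dividing by zero
    return (X - mean) / std, mean, std
```

For example, `one_hot(["hay", "corn", "hay", "soy"])` yields three columns for `corn`, `hay` and `soy`. When the regression also has an intercept, drop one of the indicator columns: the full set always sums to one, which duplicates the intercept column and makes the least-squares problem rank-deficient. Reuse the returned `mean` and `std` to scale any new data, rather than recomputing them.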
I'm sorry for being so vague, but choosing a model is quite an involved
process of selection and experimentation. I hope this at least gives you
some pointers. Check this site:
http://www.autonlab.org/tutorials/
mbowen@xxxxxxxxx wrote:
The most straightforward and clear explanations I have heard about data
mining come from the folks at SPSS. They have a product called
Clementine, which makes it easy. They have years of experience and will
walk you through stuff even during a sales cycle. Check them out.