Re: Difference between Data Mining and Machine Learning
- From: "Phil Sherrod" <phil.sherrod@xxxxxxxxxxxxxxxxxxx>
- Date: Tue, 21 Feb 2006 21:40:23 GMT
On 21-Feb-2006, Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx> wrote:
> "Data mining" started out as a hostile barb for the habit
> of "looking at too many variables" -- before it was adopted
> as a commercial name. I was surprised, frankly, that someone
> asked for the difference between "data mining" and "machine
> learning". Is that the nature of the commercial data-miners?
"Machine learning" is a fairly general term for model building that
encompasses techniques such as neural networks, decision trees, support
vector machines, etc. It could be applied to regression, but it's usually
reserved for more complex techniques that develop more complicated models.
The process "learns" by analyzing training data. I prefer the term
"predictive modeling".
> I still use data-mining as a term for, mainly, "many variables
> without hypotheses that are necessarily clear." I've thought
> of it as creating linear regressions, for the most part.
You are correct about there not being a preexisting hypothesis, but
regression is rarely used for data mining. Usually there are too many
variables and a lot of categorical variables that make regression messy.
Also, regression is awkward at best when dealing with variable interactions.
Stepwise regression was an early approach to dealing with many variables.
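To put rough numbers on why categorical variables and interactions make
regression messy, here is a small sketch. The variable counts are made up
for illustration, and the function name is hypothetical. It assumes each
categorical variable is dummy-coded into (levels - 1) columns and that only
pairwise interactions between different variables are considered.

```python
from itertools import combinations

def regression_column_counts(numeric_count, category_levels):
    """Count design-matrix columns for main effects and pairwise interactions.

    Each numeric variable contributes 1 column; each categorical variable
    with k levels contributes k - 1 dummy columns (one level is the baseline).
    """
    widths = [1] * numeric_count + [k - 1 for k in category_levels]
    main_effects = sum(widths)
    # An interaction between two variables needs one column for every
    # pairing of their individual columns.
    interactions = sum(a * b for a, b in combinations(widths, 2))
    return main_effects, interactions

# Hypothetical data set: 10 numeric variables plus three categorical
# variables with 5, 12 and 50 levels.
main_effects, interactions = regression_column_counts(10, [5, 12, 50])
print("main-effect columns:", main_effects)          # 74
print("pairwise interaction columns:", interactions)  # 1464
```

Thirteen input variables already call for 74 main-effect columns and 1464
possible interaction columns, before any higher-order interactions.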
I understand that traditional statisticians cringe at the thought of
throwing hundreds or thousands of variables into the hopper and letting some
machine decide what is significant. But in the world of commercial
statistics, the goal is to develop _some_ model that will produce measurable
gain over using all data. Validation and pruning of these models are
essential to avoid overfitting and the loss of generalization.
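A minimal sketch of why validation matters when many variables go into the
hopper. Everything here is an assumption for illustration: the data are
pure random noise, the function names are hypothetical, and "the model" is
just the single variable with the strongest training correlation. The
variable picked on the training rows will usually look like a decent
predictor there, and will usually look much weaker on the holdout rows.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def pick_best_noise_variable(n_vars=200, n_train=30, n_holdout=30, seed=0):
    """Select the noise variable that best 'predicts' a noise target."""
    rng = random.Random(seed)
    n_rows = n_train + n_holdout
    # Target and candidate predictors are all independent random noise.
    target = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n_rows)]
               for _ in range(n_vars)]
    # Score every candidate on the training rows only.
    train_scores = [abs(pearson_r(col[:n_train], target[:n_train]))
                    for col in columns]
    best = max(range(n_vars), key=lambda i: train_scores[i])
    # Re-score the winner on rows it was not selected on.
    holdout_score = abs(pearson_r(columns[best][n_train:], target[n_train:]))
    return train_scores, best, holdout_score

train_scores, best, holdout_score = pick_best_noise_variable()
print("best training |r|:", round(train_scores[best], 3))
print("same variable on holdout |r|:", round(holdout_score, 3))
```

The holdout check is the simplest form of validation; pruning a decision
tree against held-out data works on the same principle.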
I believe you have a background in psychology: you could consider the
development of the MMPI psychological assessment as an example of data
mining. They gave hundreds (or maybe even thousands) of questions to
psychiatric patients and built regression models to predict various
conditions. They did not have a hypothesis in advance as to which questions
would be significant predictors. I don't know if they have switched to a
more sophisticated model, but I have no doubt that modern modeling methods
could do a better job than the original regression-based approach that they
used.
> Machine learning, in my impression, might be applied to
> some data-mining, but it includes 'artificial intelligence'
> and a whole lot that is different from data-mining, and
> that uses a variety of non-linear algorithms.
Yes, that is correct. That is why I said data mining was an application of
machine learning. Machine learning could be used for matching proteins,
predicting heart attacks, driving robots, etc.
> What makes these "machine learning"? Is that term regularly used?
Yes, the term "machine learning" is used regularly. It is equivalent to
"model building," but it sounds more sophisticated. It is probably most
appropriately used to describe the iterative process where weights and
thresholds are assigned to nodes in a neural network to optimize the
predicted value, but it also describes the process of building a decision
tree, support vector machine, etc. You might ask what the difference is
between an iterative process for weighting nodes in a neural network and
computing parameters for a (non)linear regression equation. The
real answer is that there is little or no difference, but the term
"learning" is virtually never applied to regression.
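A minimal sketch of that equivalence in the simplest case, with made-up
data and hypothetical function names. It assumes a single "node" with one
input, a linear activation and squared-error loss, trained by plain
gradient descent, and compares the result with the ordinary closed-form
least-squares line. Real neural networks add nonlinear activations and
hidden layers, which this sketch leaves out.

```python
def closed_form_fit(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def iterative_fit(xs, ys, rate=0.05, steps=5000):
    """'Learn' a bias and weight by gradient descent on mean squared error."""
    n = len(xs)
    bias, weight = 0.0, 0.0
    for _ in range(steps):
        errors = [bias + weight * x - y for x, y in zip(xs, ys)]
        grad_bias = 2.0 * sum(errors) / n
        grad_weight = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= rate * grad_bias
        weight -= rate * grad_weight
    return bias, weight

# Made-up training data, roughly y = 1 + 2x with some scatter.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

print("regression:", closed_form_fit(xs, ys))
print("'learned': ", iterative_fit(xs, ys))
```

Both routes minimize the same squared error, so they arrive at essentially
the same two numbers; one is called "estimating parameters" and the other
"learning weights".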
--
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com (decision tree and SVM predictive modeling)
http://www.nlreg.com (nonlinear regression)