Re: Difference between Data Mining and Machine Learning
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Tue, 21 Feb 2006 15:47:00 -0500
- I don't know much about either subject, so I have questions -
On Mon, 20 Feb 2006 15:56:47 GMT, "Phil Sherrod"
<phil.sherrod@xxxxxxxxxxxxxxxxxxx> wrote:
> On 20-Feb-2006, Nomen Nescio <nobody@xxxxxxxxx> wrote:
>> What is the difference between data mining and machine learning?
>
> Data mining is an application of machine learning.
> In data mining you use a machine learning technique such as decision trees,
> neural networks, support vector machines, etc. to attempt to build a
> predictive model for some target variable of interest using multiple
> potential predictor variables. The "mining" part comes from considering many
> potential predictors without necessarily knowing which ones may end up being
> significant.
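To make that description concrete, here is a minimal sketch (not taken from any program mentioned in this thread) of the simplest possible tree learner, a one-split "decision stump". It considers every candidate predictor, without knowing in advance which one matters, and keeps whichever split separates the target best. The function names, the 0/1 target coding and the toy data are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical example: a depth-1 decision tree ("decision stump").
# It scans every candidate predictor and keeps the single best split.

Stump = Tuple[int, float, bool, float]  # (feature, threshold, polarity, accuracy)


def best_stump(X: List[List[float]], y: List[int]) -> Stump:
    """Find the rule 'x[feature] > threshold' (or its reverse) that best fits y.

    polarity=True  means: predict 1 when x[feature] >  threshold.
    polarity=False means: predict 1 when x[feature] <= threshold.
    """
    n = len(y)
    best: Stump = (-1, 0.0, True, -1.0)
    for j in range(len(X[0])):                        # every potential predictor
        for t in sorted(set(row[j] for row in X)):    # observed values as thresholds
            hits = sum((row[j] > t) == bool(label) for row, label in zip(X, y))
            polarity = hits >= n - hits               # keep the direction that fits better
            acc = max(hits, n - hits) / n
            if acc > best[3]:
                best = (j, t, polarity, acc)
    return best


def predict(stump: Stump, row: List[float]) -> int:
    """Apply a fitted stump to one row of predictor values."""
    j, t, polarity, _ = stump
    above = row[j] > t
    return int(above if polarity else not above)


if __name__ == "__main__":
    # Toy data: only the second column (index 1) separates the two classes.
    X = [[0.5, 1.0, 3.0], [0.2, 2.0, 1.0], [0.9, 8.0, 2.0], [0.4, 9.0, 0.0]]
    y = [0, 0, 1, 1]
    stump = best_stump(X, y)
    print(stump)                               # (1, 2.0, True, 1.0)
    print([predict(stump, row) for row in X])  # [0, 0, 1, 1]
```

A full tree learner would repeat the same search recursively on each side of the chosen split; that recursion is left out to keep the sketch short.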
"Data mining" started out as a hostile barb for the habit
of "looking at too many variables" -- before it was adopted
as a commercial name. I was surprised, frankly, that someone
asked for the difference between "data mining" and "machine
learning". Is that the nature of the commercial data-miners?
I still use data-mining as a term for, mainly, "many variables
without hypotheses that are necessarily clear." I've thought
of it as creating linear regressions, for the most part.
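The "looking at too many variables" complaint is essentially the multiple-comparisons problem. The hypothetical simulation below assumes a target and predictors that are all independent Gaussian noise, and uses |r| > 1.96/sqrt(n) as a large-sample approximation to a two-sided 5% test of a correlation. Even though no predictor is related to the target, on average about 5% of them will still look "significant". The function names and parameters are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_obs=100, n_predictors=200, seed=0):
    """Count pure-noise predictors that pass an approximate 5% cutoff."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    r_crit = 1.96 / math.sqrt(n_obs)  # large-sample approximation, not an exact t-test
    hits = 0
    for _ in range(n_predictors):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # noise, unrelated to y by construction
        if abs(pearson(x, y)) > r_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    # With 200 noise predictors, roughly 10 "significant" ones are expected by chance.
    print(count_spurious_hits())
```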
Machine learning, in my impression, might be applied to
some data-mining, but it includes 'artificial intelligence'
and a whole lot that is different from data-mining, and
that uses a variety of non-linear algorithms.
> For example, I have customers doing commercial data mining for marketing
> analysis who build models with over 600 potential predictor variables. Out
> of the 600 potential predictors, only 50 may end up being significant, but it
> may be a different set of 50 for different analyses. Good commercial
> machine learning programs can accept hundreds, thousands (and in some cases
> even tens of thousands) of potential predictor variables, analyze them, and
> then develop a model using the significant predictors.
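One simple way to cut hundreds of candidate predictors down to a short list is univariate correlation screening (a "filter" method): rank each predictor by the absolute value of its correlation with the target and keep the top k. This is only a sketch of that one technique, not a description of how any particular commercial program chooses its predictors; the function names and the simulated data (600 candidates, of which only the first 5 actually drive the target) are hypothetical.

```python
import math
import random
from typing import List, Tuple


def abs_corr(x: List[float], y: List[float]) -> float:
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def top_k_predictors(columns: List[List[float]], y: List[float], k: int) -> List[int]:
    """Rank candidate predictors by |r| with the target and keep the k best."""
    ranked = sorted(range(len(columns)), key=lambda j: abs_corr(columns[j], y),
                    reverse=True)
    return sorted(ranked[:k])


def simulate(n_obs: int = 500, n_predictors: int = 600, n_real: int = 5,
             seed: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical data: only the first n_real columns influence the target."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_predictors)]
    y = [sum(columns[j][i] for j in range(n_real)) + rng.gauss(0, 0.5)
         for i in range(n_obs)]
    return columns, y


if __name__ == "__main__":
    columns, y = simulate()
    # Prints the indices of the 5 predictors most correlated with the target.
    print(top_k_predictors(columns, y, k=5))
```

Univariate screening is fast, but it can miss predictors that matter only in combination with others, and it can keep several near-duplicate predictors that carry the same information.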
What makes these "machine learning"? Is that term
regularly used?
> With a pool of potential predictors this large, it is essential to watch for
> overfitting and to validate the model, using either an independent test data
> set or more sophisticated methods such as cross-validation or out-of-bag (OOB)
> data.
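Here is a minimal sketch of k-fold cross-validation in plain Python, assuming a single numeric predictor and an ordinary least-squares line as the model. The function names are hypothetical, and any fit/predict pair could be passed in instead of the line-fitting helpers.

```python
import random
from typing import Callable, List, Sequence, Tuple

Model = Tuple[float, float]  # (slope, intercept) of a one-variable least-squares line


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model: Model, x: float) -> float:
    slope, intercept = model
    return slope * x + intercept


def cross_val_mse(xs: Sequence[float], ys: Sequence[float],
                  fit: Callable[[List[float], List[float]], Model],
                  predict: Callable[[Model, float], float],
                  k: int = 5, seed: int = 0) -> float:
    """Mean squared error on held-out folds: each fold is predicted by a
    model fitted without ever seeing that fold."""
    errors: List[float] = []
    for test_idx in k_fold_indices(len(ys), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]  # hypothetical noisy line
    print(cross_val_mse(xs, ys, fit_line, predict_line, k=5))
```

Any predictor screening step (such as picking 50 of 600 candidates) belongs inside this loop and should be redone on each training fold. Screening on the full data first lets the held-out rows influence which predictors are chosen, which makes the error estimate optimistic. Out-of-bag estimates for bagged ensembles such as random forests follow the same idea, scoring each bootstrap model on the rows it did not sample; that variant is not shown here.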
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html