Learning & Classification shareware for problem modeled as BAG of ATTRIBUTES with ZERO or MULTIPLE REAL VALUES?
- From: "Dr. Colombes" <DrColombes@xxxxxxxxx>
- Date: Sun, 23 Sep 2007 09:21:00 GMT
WEKA apparently does not have an algorithm for the following Learning
& Classification problem:
(1) Data set objects are modeled as a BAG (i.e., unordered
collection) of HUNDREDS of possible ATTRIBUTES (which, I think, will
need to be pruned to 10 - 30 attributes by a feature selection method
such as entropy reduction); and
(2) ZERO or MULTIPLE (dozens) possible REAL-VALUED observations for
each attribute.
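As a rough illustration of the "entropy reduction" pruning mentioned in (1), here is a minimal Python sketch that scores one attribute by information gain. It makes two assumptions that are not part of the problem statement: each object carries a class label, and the attribute is reduced to a yes/no flag for "has at least one observation". The function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(present, labels):
    """Entropy reduction from splitting on a yes/no attribute flag.

    present[i] is True when object i has at least one observation for
    the attribute (an assumed simplification); labels[i] is its class.
    """
    gain = entropy(labels)
    for flag in (True, False):
        subset = [y for p, y in zip(present, labels) if p == flag]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Toy data: the flag separates the two classes perfectly, so the gain
# equals the full label entropy.
labels = ["a", "a", "b", "b"]
print(information_gain([True, True, False, False], labels))
```

Scoring every attribute this way and keeping the 10 - 30 highest-scoring ones would be one way to do the pruning; using the real values themselves would need a discretization or threshold step that is not shown here.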
For example, one instance of our Learning & Classification problem may
have the following data set format:
{object 1}
[variable 1: (0.23, 1.49, 3.18, 0.79);
variable 2: (no observations);
variable 3: (1.77)
variable 4: (no observations)
....
variable V: (5.23, 2.94)]
{object 2}
[variable 1: (no observations);
variable 2: (3.25, 2.77);
variable 3: (1.77)
variable 4: (0.78, 4.35, 2.51)
....
variable V: (no observations)]
.....
and
{object O}
[variable 1: (2.74);
variable 2: (0.85, 0.74, 1.43, 1.13, 3.29);
variable 3: (no observations)
variable 4: (3.36, 2.91)
....
variable V: (no observations)]
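To make the format above concrete, here is a minimal Python sketch of one possible in-memory representation, plus one possible way to flatten each object into a fixed-length row that a standard learner could accept. The dictionary layout, the `summarize` helper, and the choice of summary statistics (count, mean, min, max) are all assumptions for illustration, not something the problem prescribes; only four variables are shown.

```python
from statistics import mean

# Assumed layout: each object maps a variable number to its list of
# real-valued observations; variables with no observations are absent.
object_2 = {2: [3.25, 2.77], 3: [1.77], 4: [0.78, 4.35, 2.51]}
object_O = {1: [2.74], 2: [0.85, 0.74, 1.43, 1.13, 3.29], 4: [3.36, 2.91]}

def summarize(obj, num_variables):
    """Flatten one object into (count, mean, min, max) per variable."""
    row = []
    for v in range(1, num_variables + 1):
        values = obj.get(v, [])
        if values:
            row.extend([len(values), mean(values), min(values), max(values)])
        else:
            # None marks "no observations"; the count of 0 keeps that
            # case distinct from an observed value of 0.0.
            row.extend([0, None, None, None])
    return row

rows = [summarize(obj, 4) for obj in (object_2, object_O)]
for row in rows:
    print(row)
```

Summarizing discards the shape of each attribute's value distribution, so this is only a baseline. The `None` entries would still need a learner that tolerates missing values.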
The number of variables V is approximately 100 - 200:
[100 < V < 200].
The number of data set objects O is approximately one million:
O ~ 1,000,000
Does anyone know of some shareware that can be applied to the above
Learning & Classification problem?
Thanks for your suggestions.
Note: The DISCRETE-valued version of this problem would be the BAG of
KEYWORDS model used by McCallum and Nigam for (Naive Bayes) Text
Classification ("A Comparison of Event Models [Multi-variate Bernoulli
and Multinomial] ... "). Possible KEYWORD ATTRIBUTE values are {0,
1} for the multi-variate Bernoulli model and {0, 1, 2, 3, ...} for
the multinomial model.
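For readers unfamiliar with the two event models, a small Python sketch of how the same bag of keywords yields the two kinds of attribute values. The vocabulary and the document are made up for illustration.

```python
from collections import Counter

# Hypothetical vocabulary and document (a bag of keywords).
vocabulary = ["learning", "classification", "weka"]
document = ["weka", "learning", "weka"]

counts = Counter(document)

# Multinomial model: each attribute is the keyword's occurrence count.
multinomial = [counts[word] for word in vocabulary]

# Multi-variate Bernoulli model: each attribute is presence (1) or absence (0).
bernoulli = [int(counts[word] > 0) for word in vocabulary]

print(multinomial)  # [1, 0, 2]
print(bernoulli)    # [1, 0, 1]
```

The real-valued problem described above differs in that each attribute carries a list of real observations rather than a count.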
[ comp.ai is moderated ... your article may take a while to appear. ]