Re: clustering criterion
- From: Ted Dunning <ted.dunning@xxxxxxxxx>
- Date: Wed, 17 Oct 2007 23:49:38 GMT
With web sessions, the best proximity data you can get (in my opinion)
is derived by building user or session models and then seeing how well
the model predicts other sessions. Depending on the cardinality of
your event set, you may need to use a latent variable model to deal
with sparsity. If you have only a few (hundreds, say) pages that all
get pretty good traffic, then you may be able to model their visits
explicitly. There are many forms of latent models possible, but the
latent Dirichlet work on hidden Markov models for text clustering
might be of particular interest to you.
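A minimal sketch of the "build a model, then see how well it predicts other sessions" idea, for the explicit case of a few hundred well-trafficked pages. It assumes each session is modeled as an add-alpha smoothed distribution over pages, ignoring visit order. The helper names and the smoothing choice are hypothetical, not from the post. With a large, sparse event set, a latent variable model would take the place of `fit_session_model`.

```python
import math
from collections import Counter


def fit_session_model(session, vocab, alpha=1.0):
    """Hypothetical helper: add-alpha smoothed page distribution for one session."""
    counts = Counter(session)
    total = len(session) + alpha * len(vocab)
    return {page: (counts[page] + alpha) / total for page in vocab}


def avg_log_likelihood(model, session):
    """Mean per-event log-probability of `session` under `model`."""
    return sum(math.log(model[page]) for page in session) / len(session)


def proximity_matrix(sessions, alpha=1.0):
    """Proximity of i and j = how well each session's model predicts the other."""
    vocab = sorted({page for s in sessions for page in s})
    models = [fit_session_model(s, vocab, alpha) for s in sessions]
    n = len(sessions)
    raw = [[avg_log_likelihood(models[i], sessions[j]) for j in range(n)]
           for i in range(n)]
    # Symmetrize: average "i predicts j" and "j predicts i".
    return [[(raw[i][j] + raw[j][i]) / 2 for j in range(n)] for i in range(n)]


sessions = [["home", "search", "product"],
            ["home", "search", "cart"],
            ["blog", "about", "blog"]]
sim = proximity_matrix(sessions)
```

Here the two shopping sessions score higher with each other than either does with the blog session, because each one's model assigns more probability to the other's pages. Larger values (closer to zero, since these are log-likelihoods) mean closer sessions.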
Thanks. Actually, my dataset consists of web sessions. I computed a similarity
matrix between sessions. Since I don't know the labels of the sessions
in advance, I'm somewhat confused about the criterion. By the way, does
anyone know a good algorithm for clustering this kind of matrix?
Thanks in advance.
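One common way to cluster a precomputed similarity matrix (an illustrative choice, not one proposed in the thread) is average-linkage agglomerative clustering: repeatedly merge the two clusters with the highest mean pairwise similarity until `k` remain. Because no labels are available, the sketch also uses a simple label-free criterion: mean within-cluster similarity minus mean between-cluster similarity. Function names, the target `k`, and the criterion are assumptions for illustration. The naive loop is cubic per merge and only suits small matrices.

```python
def average_linkage(sim, k):
    """Hypothetical helper: greedy average-linkage clustering on a similarity matrix."""
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def within_minus_between(sim, clusters):
    """Label-free criterion: higher means tighter, better-separated clusters."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    within, between = [], []
    n = len(sim)
    for i in range(n):
        for j in range(i + 1, n):
            (within if label[i] == label[j] else between).append(sim[i][j])
    if not within or not between:
        raise ValueError("need at least one within- and one between-cluster pair")
    return sum(within) / len(within) - sum(between) / len(between)


sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.15, 0.1],
       [0.1, 0.15, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
clusters = average_linkage(sim, 2)  # [[0, 1], [2, 3]]
score = within_minus_between(sim, clusters)
```

Comparing the criterion across several values of `k` gives one rough way to choose the number of clusters without labels.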
[ comp.ai is moderated ... your article may take a while to appear. ]