[R] model-based clustering

Murad Nayal mn216 at columbia.edu
Wed Jan 14 22:00:40 CET 2004


Hello Murray,

Thanks for the response. I would actually love to hear alternative
suggestions about the problem I am trying to solve. I just thought a
short question would be less of a burden on people's time and have a
higher chance of being answered.

Basically, the data sets I need to analyze contain 2000-10000 objects,
each characterized by 9-20 attributes, depending on the data set. All
attributes are integers greater than zero; the typical range is
[0,1000], with numbers < 5 particularly common. There is no a priori
reason why these objects should cluster into discrete groups, and in
fact when the data is explored graphically (xgobi) it doesn't show an
obvious clustering pattern. However, with 9-20 dimensions involved, it
is probably easy to miss subtle patterns. I have tried clustering the
data using a number of standard approaches, including hclust, kmeans,
fanny etc., but these methods didn't seem to be able to generate
convincingly distinct, homogeneous clusters. Of course, given the type
of data involved, Poisson mixtures seem like the natural choice.
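
To make the Poisson mixture idea concrete, here is a minimal sketch of an
EM fit in R. It assumes the attributes are independent Poisson counts
within each cluster, which is a simplifying assumption and not something
established for this data. The function name poisson_mix_em, its
arguments, and the simulated example data are all hypothetical.

```r
# EM for a K-component mixture of independent Poissons (illustrative sketch).
# x: n x d matrix of non-negative integer counts; K: number of components.
# ASSUMPTION: attributes are independent within a component.
poisson_mix_em <- function(x, K, max_iter = 200, tol = 1e-8) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  # Initialise from a random hard assignment with roughly equal group sizes.
  z <- sample(rep_len(seq_len(K), n))
  lambda <- matrix(0, K, d)
  for (k in seq_len(K)) {
    lambda[k, ] <- colMeans(x[z == k, , drop = FALSE])
  }
  lambda <- pmax(lambda, 1e-8)  # keep rates strictly positive
  prop <- rep(1 / K, K)         # mixing proportions
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: log joint density of each object under each component.
    logp <- matrix(0, n, K)
    for (k in seq_len(K)) {
      dens <- matrix(dpois(x, rep(lambda[k, ], each = n), log = TRUE), n, d)
      logp[, k] <- log(prop[k]) + rowSums(dens)
    }
    # Log-sum-exp for numerical stability.
    m <- apply(logp, 1, max)
    resp <- exp(logp - m)
    loglik <- sum(m + log(rowSums(resp)))
    resp <- resp / rowSums(resp)  # posterior membership probabilities
    if (abs(loglik - loglik_old) < tol * abs(loglik)) break
    loglik_old <- loglik
    # M-step: weighted proportions and weighted mean counts.
    nk <- colSums(resp) + 1e-10   # guard against an emptied component
    prop <- nk / sum(nk)
    lambda <- pmax(crossprod(resp, x) / nk, 1e-8)
  }
  npar <- (K - 1) + K * d
  list(prop = prop, lambda = lambda, posterior = resp,
       cluster = apply(resp, 1, which.max),
       loglik = loglik, bic = -2 * loglik + npar * log(n))
}

# Example on simulated counts (two groups with rates 2 and 20).
set.seed(1)
x <- rbind(matrix(rpois(300 * 3, 2), 300, 3),
           matrix(rpois(300 * 3, 20), 300, 3))
fit <- poisson_mix_em(x, K = 2)
print(round(fit$lambda, 2))
print(table(fit$cluster))
```

Fitting several values of K and comparing fit$bic is one common way to
pick the number of components, though EM only finds a local optimum, so
several random restarts per K would be advisable.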

I have experimented a bit with snob using contrived data sets (where you
know which class each object really belongs to) and it has been fairly
promising, except maybe for snob's tendency to break the known classes
into multiple subclasses.
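
A rough sketch of how such a contrived-data check could be set up in R:
simulate counts from known classes, cluster them, and cross-tabulate the
true class against the recovered cluster. The rates and class sizes are
made up for illustration, and kmeans is used here only as a stand-in
clusterer (it is not snob and not a Poisson mixture).

```r
set.seed(42)
n_per_class <- 500
# One row of (hypothetical) Poisson rates per known class.
rates <- rbind(c(1, 2, 30),
               c(1, 15, 3),
               c(8, 2, 3))
truth <- rep(seq_len(nrow(rates)), each = n_per_class)
# Draw each attribute of each object from its class-specific rate.
x <- matrix(rpois(length(truth) * ncol(rates), rates[truth, ]),
            ncol = ncol(rates))

# Stand-in clusterer; any method returning one label per object would do.
km <- kmeans(x, centers = 3, nstart = 20)

# Rows are the known classes, columns the recovered clusters.
confusion <- table(truth = truth, cluster = km$cluster)
print(confusion)

# Purity: share of objects in the majority true class of their cluster.
purity <- sum(apply(confusion, 2, max)) / length(truth)
# Number of clusters holding at least 10% of each known class; values
# above 1 indicate a known class that was broken into subclasses.
clusters_per_class <- rowSums(confusion >= 0.1 * n_per_class)
print(purity)
print(clusters_per_class)
```

Rerunning with more clusters than known classes (e.g. centers = 5) gives
a quick way to see what over-splitting looks like in the table.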

I would actually like to try to code this in R. It would be very helpful
to me if you could contribute any code, code fragments, or examples
from your earlier work on this, either to the list or privately.

Many thanks,
Murad



maj at stats.waikato.ac.nz wrote:
> 
> The list could probably be more useful if you gave more details about your
> data and the problem. I have written a bit of R code myself for fitting a
> finite mixture of univariate Poissons by EM and found it very simple to
> program in R. I suspect that your problem is multivariate, but that should
> not present any difficulties.
> 
> The Snob program employs a fairly sophisticated model search strategy
> based on the Minimum Message Length criterion. If you do not know much
> about the solution that you are seeking it might be a good way to go. I
> appreciate that Snob can be rather complex to set up and get going but I
> think that you should be able to get quite a bit of help from the Monash
> University people behind the program. They are usually quite keen to
> encourage new users of Snob.
> 
> Murray Jorgensen
> 
> >
> > Hello,
> >
> > I was wondering whether a Poisson mixture modeler/cluster analysis
> > package is available for R. I scanned CRAN packages and couldn't find
> > anything but I thought I'd ask. If not could anyone recommend a non-R
> > open source package. I have found 'snob' but this program seems a bit
> > hard to use in an automated, non interactive fashion.
> >
> > regards,
> > Murad

-- 
Murad Nayal M.D. Ph.D.
Department of Biochemistry and Molecular Biophysics
College of Physicians and Surgeons of Columbia University
630 West 168th Street. New York, NY 10032
Tel: 212-305-6884	Fax: 212-305-6926
