[R] clustering

WeiWei Shi helprhelp at gmail.com
Fri Jan 28 17:01:15 CET 2005


Hi,
I think "truncated" normality is what I meant in the last email. The
experiments (, which might not be representative) show k-mean and EM
gave me comparable results, while EM is a little bit better. (I used
Weka for this purpose). I will try them on the real data and also try
pam() and if I have clear conclusion, I will post my result as a
future suggestion.
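
Something like the following is the comparison I have in mind (an
untested sketch: it assumes the cluster and mclust packages are
installed, and uses simulated data standing in for my real data):

library(cluster)   # pam()
library(mclust)    # Mclust(): EM for normal mixtures

set.seed(1)
x <- c(rnorm(50, mean = 0, sd = 1), rnorm(20, mean = 1, sd = 1.5))

km <- kmeans(x, centers = 2)          # k-means
pm <- pam(as.matrix(x), k = 2)        # partitioning around medoids
em <- Mclust(x, G = 2)                # 2-component normal mixture via EM

# compare the induced groupings
table(km$cluster, em$classification)
table(pm$clustering, em$classification)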

Thanks to all of you for your help!

Ed


On Fri, 28 Jan 2005 11:51:44 +0100 (MET), Christian Hennig
<fm3a004 at math.uni-hamburg.de> wrote:
> Hi,
> 
> EMclust in package mclust fits normal mixtures.
> Note that if you split your data values into intervals, the resulting
> distributions conditional on the intervals are not normals, but truncated
> normals!
> This is important if you try to check within-group normality, unless you
> have strongly separated clusters (which does not seem to be the case here).
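> 
> For example (a rough, untested sketch; it assumes the Mclust() interface
> of the mclust package and a numeric vector x holding your response):
> 
> library(mclust)
> fit <- Mclust(x, G = 2)             # two-component univariate normal mixture
> fit$parameters$mean                 # estimated component means
> fit$parameters$variance$sigmasq     # estimated component variances
> fit$classification                  # hard assignment of each value
> # Checking normality *within* intervals of x instead amounts to testing
> # truncated normals, not the mixture components themselves.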
> 
> Christian
> 
> 
> On Fri, 28 Jan 2005, WeiWei Shi wrote:
> 
> > Actually, the problem I am trying to solve is to discretize a
> > continuous variable (the response/dependent variable in my project) so
> > that I can turn a regression problem into a classification one. (There
> > are many reasons for doing this.)
> >
> > Since there is no class label for this variable (it is itself my class
> > variable :), an unsupervised approach can be applied here. However,
> > checking the related papers shows there is little research in this area
> > (to my knowledge; I haven't checked the MCC yet). Checking normality
> > with qqnorm and a histogram suggests there might be a mixture of two
> > normal distributions.
> >
> > My approach is to split the values of this variable into 2 or 3
> > intervals and check each interval's normality again (rough sketch
> > below). If an approach like clustering, or the one Andy suggests, works
> > well, then I should see much better normality within each interval. I
> > will try that tomorrow.
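> >
> > Something like this (untested; my.response is a placeholder for my real
> > response variable, and the quantile cutpoints are arbitrary):
> >
> > y   <- my.response
> > grp <- cut(y, breaks = quantile(y, c(0, 0.5, 1)), include.lowest = TRUE)
> > par(mfrow = c(1, nlevels(grp)))
> > for (g in levels(grp)) {
> >   qqnorm(y[grp == g], main = g)   # normal Q-Q plot per interval
> >   qqline(y[grp == g])
> > }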
> >
> > I am not sure whether this idea will work; please advise!
> >
> > Thanks,
> >
> > Ed
> >
> >
> > On Thu, 27 Jan 2005 18:58:28 -0500, Liaw, Andy <andy_liaw at merck.com> wrote:
> > > It depends a lot on what you know or don't know about the data, and what
> > > problem you're trying to solve.
> > >
> > > If you know for sure it's a mixture of Gaussians, likelihood-based
> > > approaches might be better.  MASS (the book) has an example of fitting
> > > a univariate mixture of Gaussians using various optimizers.  The code
> > > is even in $R_HOME/library/MASS/scripts/ch16.R.
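> > >
> > > A crude version of that idea (a sketch, not the ch16.R code itself),
> > > maximizing the two-component log-likelihood with optim() on the group3
> > > data from the original post:
> > >
> > > nll <- function(par, x) {
> > >   p  <- plogis(par[1])                  # mixing proportion, kept in (0, 1)
> > >   m1 <- par[2]; m2 <- par[3]            # component means
> > >   s1 <- exp(par[4]); s2 <- exp(par[5])  # component sds, kept positive
> > >   -sum(log(p * dnorm(x, m1, s1) + (1 - p) * dnorm(x, m2, s2)))
> > > }
> > > fit <- optim(c(0, -0.5, 1.5, 0, 0), nll, x = group3)
> > > c(prop = plogis(fit$par[1]), mean = fit$par[2:3], sd = exp(fit$par[4:5]))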
> > >
> > > Andy
> > >
> > > > From: WeiWei Shi
> > > >
> > > > Hi,
> > > > thanks for the reply. In fact, I tried both of them, and I also
> > > > tried the other method, and all of them gave me different boundaries
> > > > on my real datasets. I am thinking about k-medians, but I am hoping
> > > > to get more suggestions from this forum.
> > > >
> > > > Cheers,
> > > >
> > > > Ed
> > > >
> > > >
> > > > On Thu, 27 Jan 2005 15:37:16 -0600, msck9 at mizzou.edu
> > > > <msck9 at mizzou.edu> wrote:
> > > > > Cluster analysis should be able to handle that. If you know how
> > > > > many clusters you have, kmeans() is fine, or the EM algorithm can
> > > > > also do it.
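> > > > >
> > > > > For example (untested sketch, using the group3 data defined below):
> > > > >
> > > > > cl <- kmeans(group3, centers = 2)
> > > > > cl$centers    # estimated cluster means
> > > > > cl$cluster    # cluster membership of each observation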
> > > > > On Thu, Jan 27, 2005 at 03:44:42PM -0500, WeiWei Shi wrote:
> > > > > > Hi,
> > > > > > I just have a question (sorry if it is a dumb one), which I will
> > > > > > phrase in the following R code:
> > > > > >
> > > > > > group1 <- rnorm(n = 50, mean = 0, sd = 1)    # 50 values from N(0, 1)
> > > > > > group2 <- rnorm(n = 20, mean = 1, sd = 1.5)  # 20 values from N(1, 1.5^2)
> > > > > > group3 <- c(group1, group2)                  # combined, unlabelled sample
> > > > > >
> > > > > >
> > > > > > Now, if I am given a dataset like group3, what method
> > > > > > (discriminant analysis, clustering, maybe) is best for clustering
> > > > > > it in R? The known information is: 2 clusters, normal
> > > > > > distributions (but the parameters are unknown).
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Ed
> > > > > >
> 
> ***********************************************************************
> Christian Hennig
> Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
> hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
> #######################################################################
> I recommend www.boag-online.de
> 
>



