[R] cluster analysis in R

Hennig, Christian c.hennig at ucl.ac.uk
Fri Nov 16 12:03:53 CET 2012


Dear Katherine,

The function flexmixedruns in package fpc may do what you want: it fits mixtures to data with both continuous and categorical variables, can use the BIC to choose the number of mixture components, and also gives you posterior probabilities for each case to belong to each component.
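
A minimal sketch of how such a call might look is given here. The data are simulated and the variable names (len, width, spot, colour) are made up for illustration, so this is not Katherine's dataset. The small values for simruns and n.cluster are assumptions chosen only to keep the run short.

```r
# Sketch only: simulated data stand in for real morphological measurements.
library(fpc)  # provides flexmixedruns(), which relies on package flexmix

set.seed(1)
n <- 100

# Two continuous and two categorical variables (hypothetical names).
# By default (xvarsorted = TRUE) the continuous columns must come first.
len    <- c(rnorm(n / 2, 10, 1), rnorm(n / 2, 13, 1))
width  <- c(rnorm(n / 2, 5, 1),  rnorm(n / 2, 7, 1))
spot   <- c(rbinom(n / 2, 1, 0.2), rbinom(n / 2, 1, 0.8)) + 1  # coded 1/2
colour <- sample(1:3, n, replace = TRUE)                       # coded 1..3
x <- cbind(len, width, spot, colour)

# Fit mixtures with 1 to 4 components; allout = FALSE keeps only the
# flexmix fit for the number of components with the best BIC.
fit <- flexmixedruns(x, continuous = 2, discrete = 2,
                     simruns = 3, n.cluster = 1:4, allout = FALSE)

fit$optimalk  # number of components selected by the BIC
fit$bicvals   # BIC values for the numbers of components tried

# Posterior membership probabilities: one row per case,
# one column per mixture component.
post <- fit$flexout@posterior$scaled
head(round(post, 2))
```

Each row of post sums to 1, so a value such as 0.70 in the second column would be read as a posterior probability of 0.70 that the case belongs to component 2. In a real analysis one would probably use more random starts (simruns) than in this sketch.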

Note that finding the right cluster analysis method is generally a complicated task. It depends crucially on your application, on what use you want to make of the clusters, etc., so what is best cannot be said conclusively on a mailing list. The same holds for whether and how to select variables. It is certainly not wrong in general to use all the variables that you have, but whether doing otherwise is better depends on what your variables mean, how this relates to the aim of the clustering, what you want to do with the variables afterwards, etc.

You may have a look at 
http://www.rss.org.uk/site/cms/contentviewarticle.asp?article=866#Link%20to%20Nov.%202012%20paper
where I discuss a number of related issues.

Best regards,
Christian


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hennig at ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of KitKat [katherinewright at trentu.ca]
Sent: 15 November 2012 18:14
To: r-help at r-project.org
Subject: [R] cluster analysis in R

I have two issues.

1-I am trying to use morphology to identify gender. I have 9 variables, both
continuous and categorical. I was using two-step cluster analysis in SPSS
because two-step could deal with different types of variables. But the
output only tells me that an animal is in cluster 1 or 2; it does not give me a
probability (e.g. 0.70 for cluster 2). I also did not want to specify that I
want two clusters; I wanted to see if the analysis would naturally give me two
clusters. These were all advantages to using SPSS but now I'm having
trouble.

Does cluster analysis in R give probabilities?
Which type of cluster analysis in R is best to use? I did not think
hierarchical analysis was a great choice, but maybe I'm wrong. I don't want
to create the average variable, I want the analysis to do it on its own.
I'm also new to R so would have to figure out the right codes to enter, etc.

2-I was also told to analyze each variable on its own before including it in
cluster analysis. I had first included them all then teased out which ones
were not important, but now have been asked to do the reverse. I cannot do
cluster analysis on one variable: for example, one variable is either
present or absent on an individual, so of course cluster analysis gives me
two clusters, one representing present and one representing absent. I was
told to use regression, but how would regression not give the same
result? I feel like it would give me a line connecting a bunch of 0s to 1s.
I don't know what to use, or if I can analyze each variable like this before
putting them into cluster analysis. I ultimately want to only use the
smallest number of variables necessary to identify gender.

I have tried reading manuals etc. and talking to people at my school, but
nothing has helped. If anyone has any insight, that would be much
appreciated.
Thank you!



--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635.html
Sent from the R help mailing list archive at Nabble.com.
