[R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject
Ranjan Maitra
maitra.mbox.ignored at inbox.com
Tue Oct 2 19:59:51 CEST 2012
John,
On Tue, 2 Oct 2012 11:35:12 -0400 John Sorkin
<jsorkin at grecc.umaryland.edu> wrote:
> Windows XP
> R 2.15
>
> I am running a cluster analysis in which I ask for three clusters (see code below). The analysis nicely tells me which cluster each of the subjects in my input dataset belongs to. I would like two pieces of information:
> (1) for every subject in my input data set, what is the probability of the subject belonging to each of the three clusters
K-means provides hard clustering: each subject is assigned to whichever
cluster has the closest mean, so there are no membership probabilities.
> (2) given a new subject, someone who was not in my original dataset, how can I determine their cluster assignment?
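Compute the distance between the new subject and each of the cluster
means (fit$centers): the subject is assigned to the cluster whose mean
is closest. If datascaled was standardized, standardize the new subject
with the same centers and scales first.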
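A minimal base-R sketch of this nearest-mean rule. It assumes datascaled was created with scale(), so the new subject can be standardized with the stored "scaled:center" and "scaled:scale" attributes before being compared with fit$centers. assign_cluster() is a hypothetical helper, and the simulated data only stand in for John's.

```r
# Hypothetical helper (not part of base R): nearest-mean assignment.
assign_cluster <- function(fit, newdata, center = FALSE, scale = FALSE) {
  # Standardize new rows with the training center/scale (no-op when FALSE)
  x <- scale(as.matrix(newdata), center = center, scale = scale)
  centers <- fit$centers
  # n x K matrix of squared Euclidean distances to each cluster mean
  d2 <- matrix(sapply(seq_len(nrow(centers)), function(k)
    rowSums(sweep(x, 2, centers[k, ])^2)), nrow = nrow(x))
  apply(d2, 1, which.min)  # index of the closest mean for each row
}

# Simulated stand-in for John's data (three well-separated groups)
set.seed(1)
raw <- rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
             matrix(rnorm(60, mean = 5),  ncol = 2),
             matrix(rnorm(60, mean = 10), ncol = 2))
datascaled <- scale(raw)
fit <- kmeans(datascaled, 3, nstart = 25)

# A new subject measured on the original (unscaled) scale
new_subject <- matrix(c(5, 5), nrow = 1)
assign_cluster(fit, new_subject,
               center = attr(datascaled, "scaled:center"),
               scale  = attr(datascaled, "scaled:scale"))
```

On the training rows this rule should usually reproduce fit$cluster. With kmeans's default Hartigan-Wong algorithm, a point almost equidistant from two means may occasionally be labeled differently.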
If you are looking for probabilistic clustering (under Gaussian
mixture model assumptions), you could use model-based clustering: one R
package is mclust.
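A short sketch of the mclust route. It assumes the package is installed from CRAN and uses simulated data in place of datascaled. Mclust() fits a Gaussian mixture, here fixed at G = 3 components, with the covariance structure chosen by BIC. Its z component holds each subject's membership probabilities. predict() returns probabilities and a hard classification for new subjects, who must be on the same scale as the training data.

```r
library(mclust)

# Simulated stand-in for John's datascaled
set.seed(1)
datascaled <- scale(rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
                          matrix(rnorm(60, mean = 5),  ncol = 2),
                          matrix(rnorm(60, mean = 10), ncol = 2)))

# 3-component Gaussian mixture; covariance model selected by BIC
mfit <- Mclust(datascaled, G = 3)

# n x 3 matrix: probability of each subject belonging to each cluster
probs <- mfit$z
round(probs[1, ], 3)        # probabilities for datascaled[1, ]
mfit$classification[1]      # most probable cluster for subject 1

# A new subject, already on the scaled scale (illustrative values)
new_subject <- matrix(c(0.1, -0.2), nrow = 1)
new_pred <- predict(mfit, newdata = new_subject)
new_pred$z                  # membership probabilities for the new subject
new_pred$classification     # cluster with the highest probability
```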
Btw, note that kmeans is very sensitive to initialization (as is
mclust), so at the very least you may want to try several random
starts for kmeans: set the argument "nstart" to a large number.
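A sketch of the nstart advice on simulated data. The data and object names are illustrative only. With nstart > 1, kmeans() tries that many random sets of initial centers and keeps the solution with the smallest total within-cluster sum of squares (tot.withinss).

```r
set.seed(42)
x <- scale(rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                 matrix(rnorm(100, mean = 4), ncol = 2),
                 matrix(rnorm(100, mean = 8), ncol = 2)))

fit1  <- kmeans(x, centers = 3)               # a single random start (the default)
fit50 <- kmeans(x, centers = 3, nstart = 50)  # best of 50 random starts

# Lower tot.withinss means a tighter solution
c(single = fit1$tot.withinss, best_of_50 = fit50$tot.withinss)
```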
HTH,
Ranjan
> Thanks,
> John
>
> # K-Means Cluster Analysis
> jclusters <- 3
> fit <- kmeans(datascaled, jclusters) # 3 cluster solution
>
> and fit$cluster tells me what cluster each observation in my input dataset belongs to (output truncated for brevity):
>
> > fit$cluster
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 . . . .
>  1  1  1  1  3  1  1  1  1  2  1  2  1  1  1  1  1 . . . .
>
> How do I get the probability of being in cluster 1, cluster 2, and cluster 3 for a given subject, e.g. datascaled[1,]? How do I get the cluster assignment for a new subject?
> Thanks,
> John
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
More information about the R-help mailing list