[R] algorithm for clustering categorical data
Li, Yan
Yan_Li at ibi.com
Thu Aug 1 18:11:41 CEST 2013
Great! Thanks!
Yes, I just used the usual approach, as.numeric(..), for the numeric transformation. It seems that standardization is needed. Thank you.
-----Original Message-----
From: David Carlson [mailto:dcarlson at tamu.edu]
Sent: Thursday, August 01, 2013 12:08 PM
To: Li, Yan; r-help at r-project.org
Subject: RE: [R] algorithm for clustering categorical data
Read up on Gower's distance measures (available in the ecodist package), which can combine numeric and categorical data. You didn't give us any information about how you numerically transformed the categorical variables, but the usual approach is to create indicator variables that code presence/absence for each category within a categorical variable. Differences in variance between variables can be reduced by standardizing the variables.
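The two suggestions above can be sketched roughly as follows. This is only an illustration under assumptions: the data frame and its column names (income, colour) are made up, and the Gower dissimilarity is computed with daisy() from the cluster package, which is swapped in here instead of ecodist because it accepts factor columns directly.

```r
# Hypothetical toy data: one numeric field on a large scale, one categorical field.
df <- data.frame(
  income = c(12000, 15000, 52000, 50000, 13000, 51000),
  colour = factor(c("red", "red", "blue", "green", "red", "blue"))
)

# Option 1: Gower dissimilarity, which handles mixed column types.
# daisy() comes from the 'cluster' package (an assumption, not ecodist).
library(cluster)
d_gower <- daisy(df, metric = "gower")

# Option 2: indicator (presence/absence) columns, then standardize.
# The '- 1' drops the intercept so every level gets its own 0/1 column.
ind <- model.matrix(~ colour - 1, data = df)
x <- scale(cbind(income = df$income, ind))
d_ind <- dist(x)

# Either dissimilarity can be passed on to hierarchical clustering.
hc <- hclust(d_gower, method = "average")
groups <- cutree(hc, k = 2)
```

Which option works better depends on the data; the choice of average linkage and k = 2 is arbitrary and only there to make the sketch complete.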
-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Li, Yan
Sent: Thursday, August 1, 2013 11:00 AM
To: r-help at r-project.org
Subject: [R] algorithm for clustering categorical data
Hi All,
Does anyone know of an algorithm for clustering categorical variables? Are there R packages for it? Which is the best?
If a data set has both numeric and categorical variables, what is the best clustering algorithm and R package to use?
I tried a numeric transformation of all the categorical fields and then clustered the result. But the transformed fields have values from 1...10, while my other fields are on a much bigger scale: 10000-... This gives the categorical fields less effect on the distance calculation...
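The scale problem described here can be reproduced with a small sketch. The values below are invented for illustration: a category stored as integer codes next to a numeric field in the tens of thousands, compared with plain Euclidean distance before and after scale().

```r
# Hypothetical toy data: a category stored as integer codes (1..10)
# and a numeric field in the tens of thousands.
x <- cbind(code = c(1, 10, 1), amount = c(10000, 10000, 10500))

d_raw <- as.matrix(dist(x))         # Euclidean distance on the raw values
d_std <- as.matrix(dist(scale(x)))  # after centering and scaling each column

# Rows 1 and 2 differ only in the code (by the largest possible gap, 9);
# rows 1 and 3 differ only in the amount (by 500).
d_raw[1, 2]  # 9
d_raw[1, 3]  # 500

# After standardizing, the two differences carry comparable weight.
d_std[1, 2]
d_std[1, 3]
```

Note that standardizing only fixes the scale. Integer codes still impose an arbitrary ordering on the categories, so codes 1 and 10 are treated as further apart than codes 1 and 2.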
Thank you!
Yan