[R] Cluster analysis, factor variables, large data set

Hans Ekbrand hans at sociologi.cjb.net
Thu Mar 31 20:48:02 CEST 2011


On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> Dear Hans,
> 
> clara doesn't require a distance matrix as input (and therefore
> doesn't require you to run daisy), it will work with the raw data
> matrix using Euclidean distances implicitly.
> I can't tell you whether Euclidean distances are appropriate in this
> situation (this depends on the interpretation and variables and
> particularly on how they are scaled), but they may be fine at least
> after some transformation and standardisation of your variables.

The variables are unordered factors, stored as integers 1:9, where 

1 means "Full-time employment"
2 means "Part-time employment"
3 means "Student"
4 means "Full-time self-employee"
...

Do Euclidean distances make sense on unordered factors coded as
integers?
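
To make the concern concrete, here is a minimal sketch, written in
Python only to keep it free of dependencies. The constant and helper
names are hypothetical and not part of clara or daisy. It compares the
Euclidean distance on the raw integer codes with the distance after
each level gets its own 0/1 indicator column, and with a simple
matching dissimilarity.

```python
import math

# Integer codes for an unordered factor, following the coding in the post.
FULL_TIME, PART_TIME, STUDENT, SELF_EMPLOYED = 1, 2, 3, 4
N_LEVELS = 9  # the post says the codes run from 1 to 9


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_hot(code, n_levels=N_LEVELS):
    """Recode one factor level as a 0/1 indicator vector (hypothetical helper)."""
    return [1.0 if level == code else 0.0 for level in range(1, n_levels + 1)]


def simple_matching(a, b):
    """Share of variables on which two observations differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# On the raw codes, the distance depends on the arbitrary numbering:
d_codes_near = euclidean([FULL_TIME], [PART_TIME])      # 1.0
d_codes_far = euclidean([FULL_TIME], [SELF_EMPLOYED])   # 3.0

# With indicator columns, any two different levels are equally far apart:
d_hot_near = euclidean(one_hot(FULL_TIME), one_hot(PART_TIME))
d_hot_far = euclidean(one_hot(FULL_TIME), one_hot(SELF_EMPLOYED))

# Simple matching only asks whether the levels are equal:
d_match = simple_matching([FULL_TIME], [SELF_EMPLOYED])  # 1.0

print(d_codes_near, d_codes_far, d_hot_near, d_hot_far, d_match)
```

On the raw codes, "Full-time self-employee" ends up three times as far
from "Full-time employment" as "Part-time employment" does, purely
because of how the levels happen to be numbered. The indicator coding
and the matching dissimilarity do not have that property.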


