[R] Clustering Categorical and Continuous Variables

Thu Jun 10 13:26:36 CEST 2004

Hi!

You need an appropriate dissimilarity measure.
Look at daisy in the package cluster:
help("daisy", package="cluster")

x: numeric matrix or data frame.  Dissimilarities will be
          computed between the rows of 'x'.  Columns of mode 'numeric'
          (i.e. all columns when 'x' is a matrix) will be recognized as
          interval scaled variables, columns of class 'factor' will be
          recognized as nominal variables, and columns of class
          'ordered' will be recognized as ordinal variables.  Other
          variable types should be specified with the 'type' argument. 
          Missing values ('NA's) are allowed. 
...

For example, Gower (1971) proposed a coefficient for variables of different types (categorical, continuous, binary).
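
A minimal sketch of how this could look in practice. The data frame and its column names are made up for illustration, and k = 2 is an arbitrary choice. It assumes a version of cluster in which daisy() accepts metric = "gower"; with mixed column types daisy() falls back to Gower's coefficient anyway. pam() is just one option for clustering the resulting dissimilarities.

```r
library(cluster)

# Hypothetical toy data: one continuous, one nominal, one ordinal column
df <- data.frame(
  height = c(1.62, 1.75, 1.80, 1.68, 1.91, 1.55),
  colour = factor(c("red", "blue", "blue", "red", "green", "green")),
  grade  = factor(c("low", "mid", "high", "low", "high", "mid"),
                  levels = c("low", "mid", "high"), ordered = TRUE)
)

# Dissimilarities between the rows of df; factor columns are treated as
# nominal and ordered factors as ordinal, as described in the help page
d <- daisy(df, metric = "gower")

# Partition around medoids, working directly on the dissimilarity object
fit <- pam(d, k = 2)

# Cluster membership for each row of df
fit$clustering
```

Any method that accepts a dissimilarity object instead of the raw data should work in place of pam().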

Sincerely,
Eryk

*********** REPLY SEPARATOR  ***********

On 6/10/2004 at 11:52 AM Wayne Jones wrote:

>>>Hi there fellow R users, 
>>>
>>>R has many different clustering packages (e.g. mclust,cluster,e1071).
>>>
>>>However, can anyone recommend a method to deal with data sets that
>>>contain categorical and continuous variables?
>>>
>>>Regards
>>>
>>>Wayne
>>>

Dipl. bio-chem. Eryk Witold Wolski    @    MPI-Moleculare Genetic   
Ihnestrasse 63-73 14195 Berlin       'v'    
tel: 0049-30-83875219               /   \    
mail: wolski at molgen.mpg.de        ---W-W----    http://www.molgen.mpg.de/~wolski