[R] calculating dissimilarities in R

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 26 09:55:50 CEST 2006

Hi Elvina,

>>>>> "Elvina" == Elvina Payet <virgin at seychelles.sc>
>>>>>     on Tue, 26 Sep 2006 05:48:01 GMT writes:

    Elvina> Dear All,
    Elvina> I’ve got a statistical question on calculating
    Elvina> dissimilarities in R.
    Elvina> I want to calculate the different types of dissimilarities
    Elvina> on the ‘flower’ dataset found in the package
    Elvina> ‘cluster’. Flower is a data frame with 18 observations
    Elvina> on 8 variables. Variables 1 and 2 are binary, variable 3 is
    Elvina> asymmetric binary, variable 4 is nominal, variables 5 and 6
    Elvina> are ordered, and variables 7 and 8 are interval-scaled.

    Elvina> Commands to load the dataset in R.

      > library(cluster)
      > data(flower)

or  data(flower, package = "cluster")

    Elvina> What are the different types of dissimilarities that can be
    Elvina> calculated on such a dataset?
    Elvina> Do I need to group the variables by type first, i.e. all
    Elvina> binary ones together, and then run the calculation? Should I
    Elvina> use dissimilarity indices such as Jaccard, or a function
    Elvina> such as ‘daisy’?

Yes, you should use  daisy() to calculate dissimilarities,
particularly when you are interested in the difference between
symmetric and asymmetric binary variables.
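A minimal sketch along the lines of the examples in help(daisy), assuming a reasonably recent version of the cluster package: daisy() handles all the variable types of 'flower' in a single call (using Gower's coefficient), so there is no need to group the variables by type first. Only the asymmetric binary column has to be declared explicitly; the object name `d` is just for illustration.

```r
## Gower dissimilarities for the mixed-type 'flower' data (sketch)
library(cluster)                    # provides daisy() and the 'flower' data
data(flower, package = "cluster")
str(flower)                         # V1-V4 factors, V5-V6 ordered factors, V7-V8 numeric

## Because not all columns are numeric, daisy() uses Gower's coefficient.
## - two-level factors V1, V2 are treated as nominal, which for a binary
##   variable gives the same 0/1 mismatch as symmetric binary
## - type = list(asymm = 3) declares V3 asymmetric binary, so pairs where
##   both values are 0 ("absent") are left out of that variable's comparison
## - ordered factors V5, V6 are treated as ordinal; numeric V7, V8 as
##   interval-scaled, each scaled by its range
d <- daisy(flower, type = list(asymm = 3))
summary(d)                          # reports the metric and per-variable types

## the result is a "dissimilarity" (and "dist") object
as.matrix(d)[1:4, 1:4]
```

The resulting object can be passed directly to clustering functions from the same package, such as pam(), agnes() or diana().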

Do read  help(daisy)  and look at its examples.

Maybe this will answer all your questions; if not, it will help
you to ask a much more specific question, as suggested by the
posting guide (see link below!)


    virgin> ______________________________________________

    virgin> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    virgin> and provide commented, minimal, self-contained, reproducible code.

Martin Maechler, ETH Zurich
