[R] Help with Mahalanobis

Christian Hennig chrish at stats.ucl.ac.uk
Fri Jul 8 20:14:03 CEST 2005


Dear Jose,

normal mixture clustering (mclust) operates on a points-by-variables data
matrix, not on a distance matrix. Therefore it doesn't make sense to compute
Mahalanobis distances before using mclust.
Furthermore, cluster analysis based on distance matrices (hclust or pam,
say) operates on a point-by-point distance matrix (be it Mahalanobis or
something else). You show a group-by-group matrix below, for which I don't
see any purpose in cluster analysis.
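
To illustrate what a point-by-point matrix looks like here, a minimal sketch
follows. It assumes the overall covariance matrix of the four iris
measurements as the scale (other choices are possible), and the names S, R, z,
d, hc and cl are only illustrative. The data are transformed so that ordinary
Euclidean distances between the transformed points equal Mahalanobis
distances between the original ones, and the result is passed to hclust.

```r
# Point-by-point Mahalanobis distances for the 150 iris flowers
x <- as.matrix(iris[, 1:4])
S <- cov(x)                 # assumption: overall covariance as the scale

# chol() returns upper-triangular R with t(R) %*% R == solve(S), so
# |R (xi - xj)|^2 == t(xi - xj) %*% solve(S) %*% (xi - xj)
R <- chol(solve(S))
z <- x %*% t(R)             # transformed points, one row per flower

d  <- dist(z)               # 150 x 150 distances, stored as a 'dist' object
hc <- hclust(d, method = "complete")
cl <- cutree(hc, k = 3)     # cut the tree into three clusters
print(table(cl, iris$Species))
```

The same 'dist' object could also be handed to pam from package cluster.
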
Have you looked at the function mahalanobis?
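
If the group-by-group matrix is still wanted, it could be built with
mahalanobis along the following lines. This is only a sketch: it assumes the
pooled within-group covariance matrix as the scale (the same quantity as S in
the quoted script), and the names grp, ms, res and D2 are illustrative.

```r
# Squared Mahalanobis distances between the species means of iris
x   <- as.matrix(iris[, 1:4])
grp <- iris$Species

# Group means, one row per species (rows follow the factor levels)
ms <- rowsum(x, grp) / as.vector(table(grp))

# Pooled within-group covariance: residual cross-products / (n - groups)
res <- x - ms[as.character(grp), ]
S   <- crossprod(res) / (nrow(x) - nlevels(grp))

# mahalanobis(ms, m, S) gives the squared distance of every row of ms from m
D2 <- apply(ms, 1, function(m) mahalanobis(ms, center = m, cov = S))
dimnames(D2) <- list(rownames(ms), rownames(ms))
print(round(D2, 5))
```

Nothing in the sketch depends on there being three groups or four variables,
so it should carry over to other data with a grouping factor.
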

Christian


On Fri, 8 Jul 2005, Jose Claudio Faria wrote:

> Dear R list,
>
> I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data
> as obtained below:
>
> Squared Distance to Species From Species:
>
>               Setosa  Versicolor  Virginica
> Setosa             0    89.86419  179.38471
> Versicolor  89.86419           0   17.20107
> Virginica  179.38471    17.20107          0
>
> These distances were obtained with the SAS procedure 'CANDISC'; please
> see Output 21.1.2: Iris Data: Squared Mahalanobis Distances at
> http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm
>
> From these distances I intend to run a cluster analysis as below, using
> the package 'mclust'.
>
> In my previous mail, my basic question was: how to obtain this matrix with R
> from the 'iris' data?
>
> Well, I think that the basic solution for calculating these distances is:
>
> #
> # --- Begin R script 1 ---
> #
> x   = as.matrix(iris[,1:4])
> tra = iris[,5]
>
> man = manova(x ~ tra)
>
> # Mahalanobis
> E    = summary(man)$SS[2] #Matrix E
> S    = as.matrix(E$Residuals)/man$df.residual
> InvS = solve(S)
> # group means; colMeans is used because mean() no longer works on data frames
> ms = matrix(unlist(by(x, tra, colMeans)), byrow=TRUE, ncol=ncol(x))
> colnames(ms) = names(iris[1:4])
> rownames(ms) = c('Set', 'Ver', 'Vir')
> D2.12 = (ms[1,] - ms[2,])%*%InvS%*%(ms[1,] - ms[2,])
> print(D2.12)
> D2.13 = (ms[1,] - ms[3,])%*%InvS%*%(ms[1,] - ms[3,])
> print(D2.13)
> D2.23 = (ms[2,] - ms[3,])%*%InvS%*%(ms[2,] - ms[3,])
> print(D2.23)
> #
> # --- End R script 1 ---
> #
>
> Well, I would like to generalize this solution to obtain
> matrices like 'Mah' (below) or a complete matrix like the one in
> Output 21.1.2. Could somebody help me?
>
> #
> # --- Begin R script 2 ---
> #
>
> Mah = c(        0,
>           89.86419,        0,
>          179.38471, 17.20107, 0)
>
> n = 3
> D = matrix(0, n, n)
>
> nam = c('Set', 'Ver', 'Vir')
> rownames(D) = nam
> colnames(D) = nam
>
> k = 0
> for (i in 1:n) {
>     for (j in 1:i) {
>        k      = k+1
>        D[i,j] = Mah[k]
>        D[j,i] = Mah[k]
>     }
> }
>
> D=sqrt(D) #D2 -> D
>
> library(mclust)
> dendroS = hclust(as.dist(D), method='single')
> dendroC = hclust(as.dist(D), method='complete')
>
> dev.new(width = 3.5, height = 6) # portable; win.graph() is Windows-only
> split.screen(c(2, 1))
> screen(1)
> plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')
>
> screen(2)
> plot(dendroC, main='Complete', sub='', xlab='', col='red')
> #
> # --- End R script 2 ---
> #
>
> I often need this type of analysis and I can't find how to do it in
> the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva).
>
> Regards,
> --
> Jose Claudio Faria
> Brasil/Bahia/UESC/DCET
> Estatistica Experimental/Prof. Adjunto
> mails:
>   joseclaudio.faria at terra.com.br
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

*** NEW ADDRESS! ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
