[R] Help with Mahalanobis
Jose Claudio Faria
joseclaudio.faria at terra.com.br
Fri Jul 8 19:57:15 CEST 2005
Dear R list,
I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data
as obtained below:
Squared Distance to Species From Species:
Setosa Versicolor Virginica
Setosa 0 89.86419 179.38471
Versicolor 89.86419 0 17.20107
Virginica 179.38471 17.20107 0
These distances were obtained with proc 'CANDISC' of SAS, please,
see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from
http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm
From these distances my intention is to make a cluster analysis as below, using
the package 'mclust':
In prior mail, my basic question was: how to obtain this matrix with R
from 'iris' data?
Well, I think that the basic soluction to calculate this distances is:
#
# --- Begin R script 1 ---
#
x = as.matrix(iris[,1:4])
tra = iris[,5]
man = manova(x ~ tra)
# Mahalanobis
E = summary(man)$SS[2] #Matrix E
S = as.matrix(E$Residuals)/man$df.residual
InvS = solve(S)
ms = matrix(unlist(by(x, tra, mean)), byrow=T, ncol=ncol(x))
colnames(ms) = names(iris[1:4])
rownames(ms) = c('Set', 'Ver', 'Vir')
D2.12 = (ms[1,] - ms[2,])%*%InvS%*%(ms[1,] - ms[2,])
print(D2.12)
D2.13 = (ms[1,] - ms[3,])%*%InvS%*%(ms[1,] - ms[3,])
print(D2.13)
D2.23 = (ms[2,] - ms[3,])%*%InvS%*%(ms[2,] - ms[3,])
print(D2.23)
#
# --- End R script 1 ---
#
Well, I would like to generalize a soluction to obtain
the matrices like 'Mah' (below) or a complete matrix like in the
Output 21.1.2. Somebody could help me?
#
# --- Begin R script 2 ---
#
Mah = c( 0,
89.86419, 0,
179.38471, 17.20107, 0)
n = 3
D = matrix(0, n, n)
nam = c('Set', 'Ver', 'Vir')
rownames(D) = nam
colnames(D) = nam
k = 0
for (i in 1:n) {
for (j in 1:i) {
k = k+1
D[i,j] = Mah[k]
D[j,i] = Mah[k]
}
}
D=sqrt(D) #D2 -> D
library(mclust)
dendroS = hclust(as.dist(D), method='single')
dendroC = hclust(as.dist(D), method='complete')
win.graph(w = 3.5, h = 6)
split.screen(c(2, 1))
screen(1)
plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')
screen(2)
plot(dendroC, main='Complete', sub='', xlab='', col='red')
#
# --- End R script 2 ---
#
I always need of this type of analysis and I'm not founding how to make it in
the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva).
Regards,
--
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
joseclaudio.faria at terra.com.br
More information about the R-help
mailing list