[R] Help with Mahalanobis

Sat Jul 9 12:46:39 CEST 2005

Christian Hennig wrote:

> Dear Jose,
> 
> normal mixture clustering (mclust) operates on points-by-variables data,
> not on a distance matrix. Therefore it doesn't make sense to compute
> Mahalanobis distances before using mclust.
> Furthermore, cluster analysis based on distance matrices (hclust or pam,
> say) operates on a point-by-point distance matrix (be it Mahalanobis or
> something else). You show a group-by-group matrix below, for which I don't
> see any purpose in cluster analysis.
> Have you looked at the function mahalanobis?
> 
> Christian
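To illustrate the point-by-point distance matrix and the function `mahalanobis()` mentioned above, here is a minimal sketch. It assumes the overall sample covariance `cov(x)` as the metric and average linkage; both are illustrative choices, not something prescribed in this thread. The trick is whitening: if `S = t(R) %*% R` is the Cholesky factorization, Euclidean distances between the rows of `x %*% solve(R)` equal Mahalanobis distances between the rows of `x`.

```r
# Sketch: point-by-point Mahalanobis distance matrix for hclust().
# Assumption: the overall covariance cov(x) is the metric (illustrative only).
x <- as.matrix(iris[, 1:4])
S <- cov(x)

# Cholesky: S = t(R) %*% R, so solve(S) = solve(R) %*% t(solve(R)).
# Hence for a row difference d, d %*% solve(S) %*% d = |d %*% solve(R)|^2.
R <- chol(S)
y <- x %*% solve(R)

# Euclidean distances on the whitened data are Mahalanobis distances on x.
Dp <- dist(y)

# Any distance-based method can now be used; average linkage is one choice.
hc     <- hclust(Dp, method = "average")
groups <- cutree(hc, k = 3)
table(groups, iris$Species)
```

The same whitening works with any positive-definite covariance, for example a pooled within-group one.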

Dear Christian,

First of all, thanks for the reply!

Multivariate analysis is not my field; I am studying it because I need it for
my work.

I'm using 'iris' only as an example of my real problem: I normally work with
many response variables (5 or more), with replicates (10 or more) of many
groups (20 or more). In these cases, I think, the final dendrogram obtained with
the 'mclust' package is not very clear.

I have learned that, in these cases, the generalized Mahalanobis distance
between groups, obtained as in the previous example (see script), is one of the
best choices for studying the similarity between groups. Do you agree?

If so, I need to cluster the groups from this matrix of between-group
distances. I chose the 'mclust' package only because I am also studying it, and
I think it works well for this purpose.

Could you suggest another (simpler) way to carry out this analysis?

JCFaria

> On Fri, 8 Jul 2005, Jose Claudio Faria wrote:
> 
> 
>>Dear R list,
>>
>>I'm trying to calculate the Mahalanobis distances between the 'Species' of
>>the 'iris' data, as obtained below:
>>
>>Squared Distance to Species From Species:
>>
>>               Setosa Versicolor  Virginica
>>Setosa              0   89.86419  179.38471
>>Versicolor   89.86419          0   17.20107
>>Virginica   179.38471   17.20107          0
>>
>>These distances were obtained with SAS PROC CANDISC; please see
>>Output 21.1.2: Iris Data: Squared Mahalanobis Distances at
>>http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm
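For reference, the entries of the table are squared distances between species mean vectors. Assuming CANDISC uses the pooled within-group covariance (which is also what script 1 computes as `S`), the squared distance between the means of species i and j is:

```latex
D^2_{ij} = (\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_j)^{\top} \, S^{-1} \, (\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_j),
\qquad
S = \frac{E}{N - g}
```

Here E is the residual (within-group) sums-of-squares-and-cross-products matrix, N = 150 observations and g = 3 species, so N - g = 147, the value of `man$df.residual` in script 1.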
>>
>>From these distances, I intend to carry out a cluster analysis, as shown
>>further below, using the 'mclust' package.
>>
>>In my previous mail, my basic question was: how can I obtain this matrix
>>with R from the 'iris' data?
>>
>>I think the basic solution for calculating these distances is:
>>
>>#
>># --- Begin R script 1 ---
>>#
>>x   = as.matrix(iris[,1:4])
>>tra = iris[,5]
>>
>>man = manova(x ~ tra)
>>
>># Mahalanobis
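>>E    = summary(man)$SS$Residuals  # residual SSCP matrix (matrix E)
>>S    = E/man$df.residual          # pooled within-group covariance
>>InvS = solve(S)
>># Group mean vectors, one row per species; colMeans() is used because
>># mean() on a data frame does not return column means in current R
>>ms = t(sapply(split(as.data.frame(x), tra), colMeans))
>>rownames(ms) = c('Set', 'Ver', 'Vir')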
>>D2.12 = (ms[1,] - ms[2,])%*%InvS%*%(ms[1,] - ms[2,])
>>print(D2.12)
>>D2.13 = (ms[1,] - ms[3,])%*%InvS%*%(ms[1,] - ms[3,])
>>print(D2.13)
>>D2.23 = (ms[2,] - ms[3,])%*%InvS%*%(ms[2,] - ms[3,])
>>print(D2.23)
>>#
>># --- End R script 1 ---
>>#
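Script 1 divides the residual SSCP matrix from `manova()` by `man$df.residual`. A quick sketch to check that this really is the pooled within-group covariance, i.e. the sum over groups of (n_g - 1) times each group's covariance, divided by N - g. The names `S_man` and `S_dir` are just illustrative.

```r
x   <- as.matrix(iris[, 1:4])
tra <- iris$Species

# Pooled covariance from the MANOVA residual SSCP matrix E (as in script 1).
man   <- manova(x ~ tra)
E     <- summary(man)$SS$Residuals
S_man <- E / man$df.residual

# The same matrix assembled group by group: sum of (n_g - 1) * cov_g,
# divided by N - g (here 150 - 3 = 147).
grp   <- split(as.data.frame(x), tra)
S_dir <- Reduce(`+`, lapply(grp, function(d) (nrow(d) - 1) * cov(d))) /
         (nrow(x) - length(grp))

all.equal(S_man, S_dir, check.attributes = FALSE)
```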
>>
>>Now I would like to generalize this solution to obtain a matrix like 'Mah'
>>(below) or a complete matrix like the one in Output 21.1.2. Could somebody
>>help me?
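One way to get the complete matrix in a single step is the `mahalanobis()` function from the stats package, applied to the matrix of group means with the pooled covariance as the metric. A minimal sketch; `group_mahalanobis` is a hypothetical helper name, not part of any package, and it assumes the same pooled within-group covariance as script 1. It works for any number of groups, so it should scale to the 20 or more groups described above.

```r
# Hypothetical helper: squared Mahalanobis distances between group means,
# using the pooled within-group covariance (as in script 1).
group_mahalanobis <- function(x, g) {
  x  <- as.matrix(x)
  g  <- factor(g)
  # Group mean vectors, one row per group.
  ms <- t(sapply(split(as.data.frame(x), g), colMeans))
  # Pooled within-group covariance: residual SSCP / (N - number of groups).
  fit <- manova(x ~ g)
  S   <- summary(fit)$SS$Residuals / fit$df.residual
  # Column j: squared distances from every group mean to the mean of group j.
  D2 <- sapply(seq_len(nrow(ms)), function(j) mahalanobis(ms, ms[j, ], S))
  dimnames(D2) <- list(rownames(ms), rownames(ms))
  D2
}

D2 <- group_mahalanobis(iris[, 1:4], iris$Species)
round(D2, 5)
```

The result can be passed straight to clustering with `hclust(as.dist(sqrt(D2)), method = "single")`, as in script 2.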
>>
>>#
>># --- Begin R script 2 ---
>>#
>>
>>Mah = c(        0,
>>          89.86419,        0,
>>         179.38471, 17.20107, 0)
>>
>>n = 3
>>D = matrix(0, n, n)
>>
>>nam = c('Set', 'Ver', 'Vir')
>>rownames(D) = nam
>>colnames(D) = nam
>>
>># Fill D from the lower triangle stored row-wise in 'Mah'
>>k = 0
>>for (i in 1:n) {
>>    for (j in 1:i) {
>>       k      = k+1
>>       D[i,j] = Mah[k]
>>       D[j,i] = Mah[k]
>>    }
>>}
>>
>>D = sqrt(D)   # squared distances (D2) -> distances (D)
>>
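>># hclust() is in the base 'stats' package; no extra package is needed
>>dendroS = hclust(as.dist(D), method='single')
>>dendroC = hclust(as.dist(D), method='complete')
>>
>># Open a 3.5 x 6 inch graphics device (dev.new() works on any platform)
>>dev.new(width = 3.5, height = 6)
>>split.screen(c(2, 1))
>>screen(1)
>>plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')
>>
>>screen(2)
>>plot(dendroC, main='Complete', sub='', xlab='', col='red')
>>close.screen(all.screens = TRUE)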
>>#
>># --- End R script 2 ---
>>#
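The double loop in script 2 unpacks the lower triangle of 'Mah', which is stored row by row. A loop-free sketch relies on R filling matrices column by column: the upper triangle read column-wise visits (1,1), (1,2), (2,2), (1,3), ..., which is exactly the lower triangle read row-wise, so the vector can be assigned in one step and then mirrored.

```r
Mah <- c(        0,
          89.86419,        0,
         179.38471, 17.20107, 0)
nam <- c("Set", "Ver", "Vir")
n   <- length(nam)

D <- matrix(0, n, n, dimnames = list(nam, nam))
# Column-major fill of the upper triangle (with diagonal) matches the
# row-wise order of the lower triangle stored in 'Mah'.
D[upper.tri(D, diag = TRUE)] <- Mah
# Mirror the upper triangle into the lower one to make D symmetric.
D[lower.tri(D)] <- t(D)[lower.tri(D)]
D <- sqrt(D)   # squared distances -> distances
D
```

The resulting `D` can replace the one built by the loop before the `hclust()` calls.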
>>
>>I regularly need this type of analysis, and I haven't found how to do it in
>>the CRAN documentation (archives; packages mclust, cluster, fpc and mva).
>>
>>Regards,
>>--
>>Jose Claudio Faria
>>Brasil/Bahia/UESC/DCET
>>Estatistica Experimental/Prof. Adjunto
>>mails:
>>  joseclaudio.faria at terra.com.br
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
> 
> 
> *** NEW ADDRESS! ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> 
> 

-- 
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
  joseclaudio.faria at terra.com.br
  jc_faria at uesc.br
  jc_faria at uol.com.br
tel: 73-3634.2779