[R] Mahalanobis distance and probability of group membership using Hotelling's T2 distribution

Mike White mikewhite.diu at btconnect.com
Tue Feb 20 12:18:21 CET 2007


I want to calculate the probability that a group will include a particular
point, using the point's squared Mahalanobis distance to the group centroid.
I understand that the squared Mahalanobis distance is distributed as
chi-squared, but that for a small number of random samples from a
multivariate normal population Hotelling's T2 (T-squared) distribution
should be used instead.
I cannot find a function for Hotelling's T2 distribution in R (although, in
reply to a previous post, I was given functions for the Hotelling test).
My understanding is that Hotelling's T2 distribution is related to the F
distribution by the equation:
                             T2(u,v) = F(u, v-u+1)*v*u/(v-u+1)
where u is the number of variables and v is the number of group members.
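
To make the relation concrete, here is a minimal sketch of the matching
quantile function. The name qh is my own (it is not a function from R or any
package), and it takes the equation above at face value, with v treated as
the number of group members; that is an assumption I have not confirmed.

```r
# Hypothetical helper (the name 'qh' is illustrative, not from any package):
# quantile function implied by T2(u, v) = F(u, v-u+1) * v*u/(v-u+1)
qh <- function(p, u, v, ...) {
  if (v <= u + 1) stop("v must be greater than u+1")
  df2 <- v - u + 1
  qf(p, u, df2, ...) * v * u / df2
}

u <- 3; v <- 10
crit <- qh(0.95, u, v)   # 95% point on the T2 scale
# mapping the quantile back to the F scale should recover the probability
back <- pf(crit * (v - u + 1) / (v * u), u, v - u + 1)
```

For small v the T2 critical value is well above the chi-squared one,
qchisq(0.95, u), which is why the choice of distribution matters here.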

I have written the R code below to compare the chi-squared distribution
with Hotelling's T2 distribution for the probability that a member is
included within a group.
Can anyone please confirm whether or not this is the correct way to use
Hotelling's T2 distribution for the probability of group membership? Also,
when testing a particular group member, is it preferable to leave that
member out when calculating the centre and covariance of the group for the
Mahalanobis distances?
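
For reference, this is roughly what I mean by leaving the member out. It is
only a sketch on simulated data; the helper name loo_md2 is made up for
illustration, and the sketch does not settle which reference distribution is
appropriate for either version of the distance.

```r
# Hypothetical helper: squared Mahalanobis distance of each row to the
# centre of the remaining rows, using the covariance of the remaining rows
loo_md2 <- function(x) {
  sapply(seq_len(nrow(x)), function(i) {
    rest <- x[-i, , drop = FALSE]
    mahalanobis(x[i, ], center = colMeans(rest), cov = cov(rest))
  })
}

set.seed(1)
x <- matrix(rnorm(10 * 3), nrow = 10, ncol = 3)  # 10 members, 3 variables
md2_in  <- mahalanobis(x, center = colMeans(x), cov = cov(x))
md2_loo <- loo_md2(x)
cbind(md2_in, md2_loo)  # side-by-side comparison
```

The leave-one-out distances come out larger than the in-sample ones, because
the tested point no longer pulls the centre and covariance towards itself.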

Thanks
Mike White

################################################################################
## Hotelling T^2 distribution function
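ph <- function(q, u, v, ...) {
  # q   vector of quantiles, as in function pf
  # u   number of variables
  # v   number of observations (group members)
  # ... further arguments passed on to pf
  if (v <= u + 1) stop("v must be greater than u+1")
  df1 <- u
  df2 <- v - u + 1
  # rescale q from the T^2 scale to the F scale before calling pf
  pf(q * df2 / (v * u), df1, df2, ...)
}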

# compare Chi-squared and Hotelling T^2 distributions for a group member
u <- 3    # number of variables
v <- 10   # number of group members
set.seed(1)
mat <- matrix(rnorm(v * u), nrow = v, ncol = u)
MD2 <- mahalanobis(mat, center = colMeans(mat), cov = cov(mat))
d <- sort(MD2)
# select the point of middle rank (5th of the 10 sorted distances)
dm <- d[length(d) %/% 2]
1 - ph(dm, u, v)   # probability using Hotelling T^2 distribution
# [1] 0.6577069
1 - pchisq(dm, u)  # probability using Chi-squared distribution
# [1] 0.5538466
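
As a rough sanity check on ph (repeated here so that the snippet runs on its
own), the two tail probabilities should move closer together as the group
size v grows, since the scaled F form tends to chi-squared for large v. The
group sizes and the test distance below are arbitrary choices for
illustration only.

```r
# same scaling as ph above, without the argument check
ph <- function(q, u, v, ...) {
  df2 <- v - u + 1
  pf(q * df2 / (v * u), u, df2, ...)
}

u <- 3
q <- 2.5                          # arbitrary squared distance
vs <- c(10, 30, 100, 1000, 1e5)   # arbitrary group sizes
p_t2 <- sapply(vs, function(v) 1 - ph(q, u, v))
p_chi <- 1 - pchisq(q, u)
gap <- abs(p_t2 - p_chi)          # expected to shrink as v increases
data.frame(v = vs, p_t2 = p_t2, p_chi = p_chi, gap = gap)
```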


