[R] Pooled Covariance Matrix

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Sep 20 08:36:33 CEST 2006

On Wed, 20 Sep 2006, Murray Jorgensen wrote:

> I am in a discriminant analysis situation with a frame containing
> several variables and a grouping factor, if you like:
> set.seed(200906)
> exampledf <- as.data.frame(matrix(rnorm(50,5,2),nrow=10,ncol=5))
> exampledf$Group <- factor(rep(c(1,2,3),c(3,3,4)))
> exampledf
> I'm sure there must be a simple way to get the within group pooled
> covariance matrix but I haven't found it yet.

There are two versions of this, weighted and unweighted, and the 
difference caused confusion in the early discriminant analysis literature. 
(See MASS4 p.333.)  The weighted version is conventional.
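
As a rough sketch of the distinction, assuming "weighted" means weighting each group's covariance matrix by its degrees of freedom (n_i - 1) and "unweighted" means a plain average of the per-group matrices. The data and the names covs, n.g, pooled.weighted and pooled.unweighted are made up for illustration and do not come from the thread.

```r
# Hypothetical example data: 10 rows, 5 variables, groups of size 3, 3 and 4
set.seed(1)
X <- matrix(rnorm(50, 5, 2), nrow = 10, ncol = 5)
g <- factor(rep(c(1, 2, 3), c(3, 3, 4)))

# Per-group covariance matrices and group sizes
covs <- lapply(split(seq_len(nrow(X)), g),
               function(i) cov(X[i, , drop = FALSE]))
n.g <- as.vector(table(g))

# Weighted (assumed meaning): weight each matrix by n_i - 1,
# then divide by the total within-group degrees of freedom
pooled.weighted <- Reduce(`+`, Map(`*`, covs, n.g - 1)) /
    (sum(n.g) - nlevels(g))

# Unweighted (assumed meaning): plain average of the per-group matrices
pooled.unweighted <- Reduce(`+`, covs) / nlevels(g)
```

With equal group sizes the two coincide; they differ here only because the groups are unbalanced.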

Suppose you have a matrix X and a grouping factor g.  Then either of

    group.means <- rowsum(X, g)/as.vector(table(g))
    group.means <- tapply(X, list(rep(g, ncol(X)), col(X)), mean)

gives the group means, and var(X - group.means[g,]) seems to be what you 
want.
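
A minimal sketch of that recipe on made-up data (X and g below are hypothetical, built like the example frame in the question). One point worth noting: var() divides by n - 1, whereas the conventional pooled within-group estimate divides by n minus the number of groups, so the sketch computes both. The names centred, S.var and S.pooled are illustrative only.

```r
# Hypothetical data: numeric matrix X and grouping factor g
set.seed(1)
X <- matrix(rnorm(50, 5, 2), nrow = 10, ncol = 5)
g <- factor(rep(c(1, 2, 3), c(3, 3, 4)))

# Group means, one row per level of g (assumes every level occurs in g)
group.means <- rowsum(X, g) / as.vector(table(g))

# Subtract each row's own group mean; indexing by the factor g
# uses its integer codes, which match the row order of group.means
centred <- X - group.means[g, ]

n <- nrow(X)
k <- nlevels(g)

S.var    <- var(centred)                    # divisor n - 1
S.pooled <- crossprod(centred) / (n - k)    # divisor n - k

# The two differ only by the factor (n - 1)/(n - k)
all.equal(S.pooled, S.var * (n - 1) / (n - k), check.attributes = FALSE)
```

The rescaling works because the centred columns already have mean zero, so var(centred) is just crossprod(centred)/(n - 1).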

> I started thinking that one might begin by forming a frame with the same
>  dimensions but containing the group means. But then I found a thread
> from two years back called "Getting the groupmean for each person" which
> seemed to imply that doing this was a bit subtle even for ncol=1. Hence
> I will risk a question to the list.

That thread seems to be about efficiency for very large matrices on the R 
of two years ago.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
