[R] Pooled Covariance Matrix

Murray Jorgensen maj at waikato.ac.nz
Wed Sep 20 23:13:38 CEST 2006

```
Thank you, Professor Ripley.  Murray Jorgensen

Prof Brian Ripley wrote:
> On Wed, 20 Sep 2006, Murray Jorgensen wrote:
>
>> I am in a discriminant analysis situation with a frame containing
>> several variables and a grouping factor, if you like:
>>
>> set.seed(200906)
>> exampledf <- as.data.frame(matrix(rnorm(50,5,2),nrow=10,ncol=5))
>> exampledf$Group <- factor(rep(c(1,2,3),c(3,3,4)))
>> exampledf
>>
>> I'm sure there must be a simple way to get the within group pooled
>> covariance matrix but I haven't found it yet.
>
> There are two versions of this, weighted and unweighted, and the
> difference caused confusion in the early discriminant analysis
> literature. (See MASS4 p.333.)  The weighted version is conventional.
>
> Suppose you have a matrix X and a grouping factor g.  Then either of
>
>    group.means <- rowsum(X, g)/as.vector(table(g))
>    group.means <- tapply(X, list(rep(g, ncol(X)), col(X)), mean)
>
> gives the group means, and var(X - group.means[g,]) seems to be what you
> want.
>
>> I started thinking that one might begin by forming a frame with the same
>>  dimensions but containing the group means. But then I found a thread
>> from two years back called "Getting the groupmean for each person" which
>> seemed to imply that doing this was a bit subtle even for ncol=1. Hence
>> I will risk a question to the list.
>
> That thread seems to be about efficiency for very large matrices in the R
> of two years ago.
>

--
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441    Mobile 021 1395 862

```
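For readers following the thread, here is a minimal sketch that applies the suggestion above to the example frame from the question. The names `X`, `g` and `group.means` come from the reply; `n`, `k`, `centred`, `pooled`, `per.group`, `weights` and `pooled.check` are illustrative names introduced here. One point to keep in mind: `var()` divides by n - 1, so if the conventional weighted pooled estimate with divisor n - k (k being the number of groups) is wanted, the result needs rescaling. The sketch assumes every level of the factor actually occurs in the data.

```r
# Rebuild the example data from the original question.
set.seed(200906)
exampledf <- as.data.frame(matrix(rnorm(50, 5, 2), nrow = 10, ncol = 5))
exampledf$Group <- factor(rep(c(1, 2, 3), c(3, 3, 4)))

X <- as.matrix(exampledf[, 1:5])   # numeric variables only
g <- exampledf$Group               # grouping factor
n <- nrow(X)
k <- nlevels(g)                    # assumption: no empty levels in g

# Group means, one row per level of g (rows are returned in level order).
group.means <- rowsum(X, g) / as.vector(table(g))

# Centre each observation on its own group mean.
# Indexing by the factor g uses its integer codes, i.e. the level order.
centred <- X - group.means[g, ]

# var() uses the divisor n - 1; rescale to obtain the n - k divisor.
pooled <- var(centred) * (n - 1) / (n - k)

# Cross-check: weighted sum of the per-group covariance matrices,
# each weighted by (group size - 1), divided by n - k.
per.group <- lapply(split(as.data.frame(X), g), var)
weights <- as.vector(table(g)) - 1
pooled.check <- Reduce(`+`, Map(`*`, per.group, weights)) / (n - k)

# Compare the two computations up to numerical tolerance.
all.equal(pooled, pooled.check, check.attributes = FALSE)
```

The cross-check at the end is only there to show that centring on group means and averaging the per-group covariance matrices lead to the same weighted estimate; in practice either route alone would do.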