[R] Manova

Thu Dec 18 15:05:49 CET 2003

On Thu, Dec 18, 2003 at 12:04:31PM +0100, Peter Dalgaard wrote:
> Göran Broström <gb at stat.umu.se> writes:
> 
> > Dear R-helpers,
> > 
> > In a data set I got from a medical doctor there are six treatment groups
> > and (about) 5 bivariate responses in each group. Using 'manova', it is
> > easy to see significant differences in treatment effects, but the doctor
> > is more interested in the correlation between the two responses (within
> > groups). I'm willing to assume a common value over groups, and one way
> > of estimating and testing the common correlation would be to use
> > 'cor.test' on the residuals from 'manova', but I guess that the
> > resulting p-value (from testing zero correlation) will be far too
> > optimistic (it is in fact 4.5e-5).
> 
> I think you're getting correlation right, but the DF wrong since 5 DF
> are lost to treatment contrasts. Hence the t statistic is wrong too
> (wrong DF *and* inflated by about 20%). 
> > What is the 'right' way of doing this in R?
> 
> Is there one? 
> 
> One expedient way is to look for a zero regression coef in
> 
> summary(lm(v1~treat+v2)) # or vice versa
> 
> or you could clone the calculation from getS3method("cor.test","default")
> 
>         r <- cor(x, y)
>         df <- ??? # insert properly calculated DF here.
>         STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))
>         p <- pt(STATISTIC, df)

Thanks Peter, that seems to be correct; zero correlation should be
equivalent to zero regression. I wrote a function, 'grouped.cor', based
on your second suggestion, and I get:

> grouped.cor(x, y, group)
$p.value
[1] 0.0002546125
...
compared to

> summary(lm(y ~ x + group))
...
x            0.51495    0.11725   4.392 0.000255 ***

quite close, right? Compare with the 'naive' value 0.000045.

Göran

grouped.cor <- function(x, y, group){
    ## Assumes a common correlation over groups, and that
    ## x and y have constant variance, but
    ## their means may differ between groups.

    n <- length(group)
    if ( (length(x) != n) || (length(y) != n)) stop("Length mismatch")
    n.grp <- length(unique(group))
    df <- n - n.grp - 1
    if (df < 1) stop("Too many groups or too few observations")

    subtract.mean <- function(x) x - mean(x)

    x <- unlist(tapply(x, group, subtract.mean))
    y <- unlist(tapply(y, group, subtract.mean))
    # NOTE: This will (eventually) reorder  x  and  y,
    # but in the same way: doesn't affect cor(x, y)!

    res <- cor.test(x, y)
    statistic <- res$statistic * sqrt(df / (n - 2))
    p <- pt(statistic, df)
    p <- 2 * min(p, 1 - p)

    list(p.value = p, statistic = statistic, df = df)
}