[R] Manova
Göran Broström
gb at stat.umu.se
Thu Dec 18 15:05:49 CET 2003
On Thu, Dec 18, 2003 at 12:04:31PM +0100, Peter Dalgaard wrote:
> Göran Broström <gb at stat.umu.se> writes:
>
> > Dear R-helpers,
> >
> > In a data set I got from a medical doctor there are six treatment groups
> > and (about) 5 bivariate responses in each group. Using 'manova', it is
> > easy to see significant differences in treatment effects, but the doctor
> > is more interested in the correlation between the two responses (within
> > groups). I'm willing to assume a common value over groups, and one way
> > of estimating and testing the common correlation would be to use
> > 'cor.test' on the residuals from 'manova', but I guess that the
> > resulting p-value (from testing zero correlation) will be far too
> > optimistic (it is in fact 4.5e-5).
>
> I think you're getting correlation right, but the DF wrong since 5 DF
> are lost to treatment contrasts. Hence the t statistic is wrong too
> (wrong DF *and* inflated by about 20%).
> > What is the 'right' way of doing this in R?
>
> Is there one?
>
> One expedient way is to look for a zero regression coef in
>
> summary(lm(v1~treat+v2)) # or vice versa
>
> or you could clone the calculation from getS3method("cor.test","default")
>
> r <- cor(x, y)
> df <- ??? # insert properly calculated DF here.
> STATISTIC <- c(t = sqrt(df) * r/sqrt(1 - r^2))
> p <- pt(STATISTIC, df)
Thanks Peter, that seems to be correct; zero correlation should be
equivalent to zero regression. I wrote a function, 'grouped.cor', based
on your second suggestion, and I get:
> grouped.cor(x, y, group)
$p.value
[1] 0.0002546125
...
compared to
> summary(lm(y ~ x + group))
...
x 0.51495 0.11725 4.392 0.000255 ***
quite close, right? Compare with the 'naive' value 0.000045.
Göran
grouped.cor <- function(x, y, group){
## Assumes a common correlation over groups, and that
## x and y have constant variance, but
## their means may differ between groups.
n <- length(group)
if ( (length(x) != n) || (length(y) != n)) stop("Length mismatch")
n.grp <- length(unique(group))
df <- n - n.grp - 1
if (df < 1) stop("Too many groups or too few observations")
subtract.mean <- function(x) x - mean(x)
x <- unlist(tapply(x, group, subtract.mean))
y <- unlist(tapply(y, group, subtract.mean))
# NOTE: This will (eventually) reorder x and y,
# but in the same way: doesn't affect cor(x, y)!
res <- cor.test(x, y)
statistic <- res$statistic * sqrt(df / (n - 2))
p <- pt(statistic, df)
p <- 2 * min(p, 1 - p)
list(p.value = p, statistic = statistic, df = df)
}
More information about the R-help
mailing list