[R-sig-ME] Origins of difference in significance between lmer/pvals.fnc and aov?

Mon Sep 17 17:19:57 CEST 2012

Hi everyone,

I have a data set which seems amenable to analysis either through lmer
or aov, so I am comparing their results. I'm finding small p-values for
the fixed effect of interest in either case, but they differ quite a bit
(4E-5 vs 1E-3). Since I plan to perform the test across many hypotheses,
this difference will matter once multiple testing correction is performed.
I'd like to try to understand the reason for this difference and whether I
am applying these methods appropriately.

The response data are measurements of subjects who are either treated
(n=5) or untreated (n=4), with measurements taken at 3 time points in
each subject. The fixed effects are 'day' (3 levels) and 'treated' (2 
levels).
The random effect is 'subject'.

A. From lmer and pvals.fnc ('languageR' package) I get the following:

 > model.lmer <- lmer ( value ~ day + treated + (1|subject), data )
 > pvals.fnc ( model.lmer, nsim=100000, addPlot=F, ndigits=5 )
$fixed
Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
(Intercept) 24.9542 24.958 10.6752 38.985 0.00126 0.00100
dayday.14 24.4787 24.511 8.4761 40.253 0.00352 0.00492
dayday.21 -2.6075 -2.634 -19.0851 12.744 0.73022 0.74335
treatedTRUE 38.4092 38.409 23.9338 52.987 0.00004 0.00000

$random
Groups Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1 subject (Intercept) 0.00014 2.40028 3.27886 0.00000 10.02878
2 Residual 16.69129 16.47799 16.79166 12.04746 22.19050

Note pMCMC for treatedTRUE is 4E-5.

B. Using aov with an Error term by subject the result is:

 > summary(aov ( value ~ treated + day + Error(subject), data))

Error: subject
Df Sum Sq Mean Sq F value Pr(>F)
treated 1 9835 9835 29.73 0.000953 ***
Residuals 7 2316 331
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
day 2 4019 2009.5 7.857 0.0042 **
Residuals 16 4092 255.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this case 'treated' has a p-value of 0.000953, which is quite
a bit larger than we had in lmer. I believe these two methods
have the same set of assumptions behind them. If anyone can
provide guidance as to how to interpret this difference I would
be very appreciative. A dput() of the data frame is below.

Regards,
Dan

 > dput(data)
structure(list(subject = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L,
8L, 8L, 9L, 9L, 9L), .Label = c("Subject.1", "Subject.2", "Subject.3",
"Subject.4", "Subject.5", "Subject.6", "Subject.7", "Subject.8",
"Subject.9"), class = "factor"), day = structure(c(2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L), .Label = c("day.09", "day.14",
"day.21"), class = "factor"), treated = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("FALSE", "TRUE"), class = 
"factor"),
value = c(45.0156, 30.3417, 11.294, 40.822, 17.4264, 10.0395,
55.6568, 61.0276, 43.5105, 21.7286, 12.9841, 37.0885, 115.7431,
43.5571, 63.5267, 56.357, 61.7074, 62.3348, 106.6633, 51.7871,
58.4828, 106.9284, 54.7985, 65.9771, 88.0273, 59.5362, 64.3797
)), .Names = c("subject", "day", "treated", "value"), row.names = c(NA,
-27L), class = "data.frame")

-- 
Dan Ruderman, Ph.D.
Assistant Professor of Research Medicine
Center for Applied Molecular Medicine
Keck School of Medicine of USC