[R-sig-ME] Make a 'between-and-within-factors' ANOVA with lmer function

peter dalgaard pdalgd at gmail.com
Mon Apr 29 01:47:14 CEST 2013


On Mar 24, 2013, at 23:02 , Ben Bolker wrote:

> ian m s white <i.m.s.white at ...> writes:
> 
>> 
>> I reckon lmer can figure out for itself what is between and what is within
>> subjects, so
>> 
>> lmer(DV ~ IV1*IV2*IV3*IV4 + (1|Subject))
>> 
>> should fit the same model as your ANOVA.
> 
>  If you want to allow for variation in the IV3 and IV4 effects among
> subjects you might want
> 
> lmer(DV ~ IV1*IV2*IV3*IV4 + (IV3*IV4|Subject))
> 
>  You might want to use lme rather than lme4 for the purposes of
> getting calculated denominator df and p-values, which lmer won't
> give you ...

[Sorry, I realize this is a month old, but I didn't see the thread until now.]

Be careful! There are cases where "calculated" is about the most positive thing you can say about the df output from lme(), and I suspect this could be one of them. I'd go for lme4 plus the pbkrtest package.
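Applied to a between/within factorial like the one in the original question, that route might look as follows (a hedged sketch: the names IV1, IV3, Subject are from the thread, but the simulated data and the particular model reduction are my own illustration):

```r
library(lme4)
library(pbkrtest)

set.seed(1)
## Toy between/within design: IV1 varies between subjects, IV3 within
d <- expand.grid(Subject = factor(1:12), IV3 = factor(c("a", "b")))
d$IV1 <- factor(as.integer(d$Subject) %% 2)  # constant within each subject
d$DV  <- rnorm(nrow(d))                      # pure-noise response

full <- lmer(DV ~ IV1 * IV3 + (1 | Subject), data = d)
red  <- update(full, . ~ . - IV1:IV3)
KRmodcomp(full, red)  # Kenward-Roger F test for the interaction
```

The Kenward-Roger machinery then supplies the denominator df that lmer() itself declines to print.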

Here's a simple example (you may need to re-simulate a couple of times to get both error terms with positive estimated variance; or be a little smarter.)

> library(nlme)
> y <- rnorm(99)
> s <- factor(rep(1:4,c(23,23,23,30)))
> x <- rep(0:1,c(46,53))
> summary(lme(y~x,random=~1|s))
Linear mixed-effects model fit by REML
...
Fixed effects: y ~ x 
                 Value Std.Error DF    t-value p-value
(Intercept)  0.1933857 0.2454292 95  0.7879488  0.4327
x           -0.3227735 0.3437836  2 -0.9388859  0.4469
...

Notice that the intercept is very close to the average of the first two levels of s, so it is mostly influenced by the between-subject variation. You'd therefore expect a DF of around 2, not 95.

The lme() function makes this sort of mistake when it cannot detect that a regression variable is a coarsening of a random-effect term. It does so successfully for x, but not for the intercept. In more complex models, main effects can be affected as well. If you try to fool lme() into fitting crossed random effects, it will get the wrong degrees of freedom even in a perfectly balanced design.
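For reference, lmer() handles crossed random effects directly, with no need for the pdBlocked-style workarounds that lme() requires (a minimal sketch; the subj/item setup is my own made-up example, not from the thread):

```r
library(lme4)

set.seed(42)
## Fully crossed design: every subject sees every item
d <- expand.grid(subj = factor(1:10), item = factor(1:10))
d$y <- rnorm(nrow(d))

## Two crossed random intercepts -- one formula term per grouping factor
fit <- lmer(y ~ 1 + (1 | subj) + (1 | item), data = d)
summary(fit)
```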

That's basically the reason Doug Bates removed DF from the lmer() output: the heuristics weren't good enough, and when they got it wrong, they could turn single-digit df into something on the order of the total number of observations.

Compare

> library(lme4)
> summary(fit1 <- lmer(y~x+(1|s)))
Linear mixed model fit by REML 
Formula: y ~ x + (1 | s) 
   AIC   BIC logLik deviance REMLdev
 279.7 290.1 -135.9    269.7   271.7
Random effects:
 Groups   Name        Variance Std.Dev.
 s        (Intercept) 0.082749 0.28766 
 Residual             0.867608 0.93145 
Number of obs: 99, groups: s, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)   0.1934     0.2454   0.788
x            -0.3228     0.3437  -0.939

Correlation of Fixed Effects:
  (Intr)
x -0.714

> library(pbkrtest)
> fit2 <- lmer(y~x-1+(1|s))
> KRmodcomp(fit1, fit2)
F-test with Kenward-Roger approximation; computing time: 0.13 sec.
large : y ~ x + (1 | s)
small : y ~ x - 1 + (1 | s)
        stat    ndf    ddf F.scaling p.value
Ftest 0.6207 1.0000 2.0783         1  0.5106
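If you'd rather not rely on the Kenward-Roger approximation at all, pbkrtest also provides a parametric-bootstrap comparison of the same nested fits (a sketch assuming fit1 and fit2 from above; the nsim value is my choice):

```r
library(pbkrtest)

## Parametric bootstrap: refits the small model to data simulated from it
## and compares the observed LR statistic to the simulated distribution.
## Slower than KRmodcomp, but avoids any df approximation.
PBmodcomp(fit1, fit2, nsim = 500)
```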

  


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


