[R-sig-ME] lmer and classical Expected Mean Squares

Thu Sep 12 06:25:19 CEST 2013

Thank you very much Prof. Lenth,

Clear understanding about the models being fitted under the hood
is essential, in addition to the expected mean squares to assess
the proper F-ratio statistics to use in hypothesis testing.

There are some tools in R packages to allow assessment of
expected mean squares:

  > require("varcompci")
  > ward.X <- ward[, c("Fac1", "Fac2")]
  > ward.totvar <- c("Fac1", "Fac2")
  > ward.dsn <- "ward.X"
  > ward.Matrix <- matrix(cbind(c(0,0),c(0,1)),ncol=2)
  > ward.ems <- prettyEMSf(ward.totvar, ward.Matrix,ward.dsn)
  > show(ward.ems)
  Expected Mean Square in nice format
            EMS                                     
  Fac1      "var(Resid) + 3var(Fac1:Fac2) + Q(Fac1)"
  Fac2      "var(Resid) + 9var(Fac2)"               
  Fac1:Fac2 "var(Resid) + 3var(Fac1:Fac2)"          
  resid     "var(Resid)"                            

and note that specifying an aov() call as follows

  > ward.aov <- aov(Y ~ Fac1 + Error(Fac2 + Fac1:Fac2), data = ward)
  > summary(ward.aov)

  Error: Fac2
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals  1  533.6   533.6               

  Error: Fac2:Fac1
            Df Sum Sq Mean Sq F value Pr(>F)  
  Fac1       2  49.78  24.889      28 0.0345 *
  Residuals  2   1.78   0.889                 
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

  Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 12  33.33   2.778               

yields the proper F ratio for Fac1 though nothing for Fac2.

Knowing the appropriate expected mean squares allows selection of
the proper F-ratio for Fac2 from the lm() call (or calculation
by hand from the aov() output)

  > anova(ward.lm)
  Analysis of Variance Table

  Response: Y
            Df Sum Sq Mean Sq F value    Pr(>F)    
  Fac1       2  49.78   24.89    8.96  0.004162 ** 
  Fac2       1 533.56  533.56  192.08 9.568e-09 ***
  Fac1:Fac2  2   1.78    0.89    0.32  0.732158    
  Residuals 12  33.33    2.78                      
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
  > 

I remain ever hopeful that someday someone will put all these pieces together
to allow ready assessment of such classical mixed effects experiments (point
me to it if someone has).  Beyond that, a clear understanding of model structure
under the hood is essential, and I thank Prof. Lenth for clarifying the
lmer() issue with respect to the relative sizes of the various error estimates
and mean squares.

The tweak that Prof. Lenth introduced can be visualized using e.g. 

require("trellis")
bwplot(Y ~ Fac1 | Fac2, data = ward)
bwplot(Y ~ Fac1 | Fac2, data = ward2)

Steven McKinney, Ph.D.

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

________________________________________
From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Lenth, Russell V [russell-lenth at uiowa.edu]
Sent: September 11, 2013 7:01 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] lmer and classical Expected Mean Squares

A good share of the confusion with this, and the first answer to it, is that different models
have been fitted with lmer (or aov) than the one used for the expected mean squares. So it is
like comparing apples and oranges. The model needed for comparison has random intercepts
for Fac2 and Fac1:Fac2:
> ward.lmer = lmer(Y ~ Fac1 + (1|Fac2) + (1|Fac1:Fac2), data = ward)
Here is the summary:
> ward.lmer
Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ Fac1 + (1 | Fac2) + (1 | Fac1:Fac2)
   Data: ward
REML criterion at convergence: 67.0954
Random effects:
 Groups    Name        Std.Dev.
 Fac1:Fac2 (Intercept) 0.000
 Fac2      (Intercept) 7.681
 Residual              1.584
Number of obs: 18, groups: Fac1:Fac2, 6; Fac2, 2
Fixed Effects:
(Intercept)        Fac1B        Fac1C
  4.179e-14    2.667e+00   -1.333e+00
..  and the ANOVA:
> anova(ward.lmer)
Analysis of Variance Table
     Df Sum Sq Mean Sq F value
Fac1  2 49.778  24.889  9.9241
We see from this that the denominator of the F ratio is 24.89 / 9.9241 = 2.51.
That is not the same as the F ratio from the EMS approach BUT there is a reason:
The variance component estimate for Fac1:Fac2 is zero (because its MS is less
than the MSE). This is tantamount to deleting the interaction from the model.
Observe that if we pool interaction into error, we get
MSE = (1.78 + 33.33) / (2 + 12) = 2.51, i.e. the denominator actually used for F.

To see how this would work out if we don't have a zero variance estimate, I'll
create a new dataset by shifting the data in the first cell so that the interaction is greater:
> ward2 = ward
> ward2$Y[1:3] = ward2$Y[1:3] - 2
> ward2.lm = lm(Y ~ Fac1*Fac2, data = ward2)
> anova(ward2.lm)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)
Fac1       2  59.11   29.56   10.64  0.002198
Fac2       1 470.22  470.22  169.28 1.954e-08
Fac1:Fac2  2  11.11    5.56    2.00  0.177979
Residuals 12  33.33    2.78
Thus, using the EMS approach, the correct F ratio for Fac1 is
F = 10.64 / 2.00 = 5.32. The MS for interaction is now larger than MSE.
That will make all the difference.

Here is the model fitted using lmer:
> ward2.lmer = lmer(Y ~ Fac1 + (1|Fac2) + (1|Fac1:Fac2), data = ward2)
> ward2.lmer
Linear mixed model fit by REML ['lmerMod']
Formula: Y ~ Fac1 + (1 | Fac2) + (1 | Fac1:Fac2)
   Data: ward2
REML criterion at convergence: 69.7861
Random effects:
 Groups    Name        Std.Dev.
 Fac1:Fac2 (Intercept) 0.9622
 Fac2      (Intercept) 7.1854
 Residual              1.6667
Number of obs: 18, groups: Fac1:Fac2, 6; Fac2, 2
Fixed Effects:
(Intercept)        Fac1B        Fac1C
    -1.0000       3.6667      -0.3333

> anova(ward2.lmer)
Analysis of Variance Table
     Df Sum Sq Mean Sq F value
Fac1  2 29.556  14.778    5.32
So now we obtain exactly the F ratio as the EMS method. In general, IF NO VARIANCES ARE
ESTIMATED TO BE ZERO in a balanced experiment, the F ratios for fixed effects will be identical
to those based on expected mean squares for the "unrestricted model." The reason for this is
that in this case, the REML estimates of the variance components are the same as the
method-of-moments estimates obtained from the expected mean squares.

Hope this helps.

Russell V. Lenth  -  Professor Emeritus
Department of Statistics and Actuarial Science
The University of Iowa  -  Iowa City, IA 52242  USA
Voice (319)335-0712 (Dept. office)  -  FAX (319)335-3017

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models