[R-sig-ME] sums of squares and F values in anova

Thu Sep 25 00:03:43 CEST 2014

Dear Ben,

anova() computes sequential ("type I") tests, so one wouldn't for correlated Xs expect the result to be the same as the "type II" or "type III" tests computed by Anova(), which, however, for an additive model should be the same as each other. You *would* expect the last test in the sequence produced by anova() to be the same as the corresponding type-II or -III test. 

In the case of LMMs fit by lmer(), Anova() computes Wald F-tests using the Kenward-Roger coefficient covariance matrix and Satterthwaite df; for the data/model in your example, that should produce a small difference in the F-statistics.

Here's what I get for your example:

------------- snip ----------

> anova(fit)
Analysis of Variance Table

Response: sr
          Df Sum Sq Mean Sq F value    Pr(>F)    
pop15      1 204.12 204.118 14.1157 0.0004922 ***
pop75      1  53.34  53.343  3.6889 0.0611255 .  
dpi        1  12.40  12.401  0.8576 0.3593551    
ddpi       1  63.05  63.054  4.3605 0.0424711 *  
Residuals 45 650.71  14.460                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> anova(fit2) ## practically equal to anova(fit) above
Analysis of Variance Table
      Df  Sum Sq Mean Sq F value
pop15  1 204.118 204.118 14.1157
pop75  1  53.343  53.343  3.6889
dpi    1  12.401  12.401  0.8576
ddpi   1  63.054  63.054  4.3605

> Anova(fit)
Anova Table (Type II tests)

Response: sr
          Sum Sq Df F value   Pr(>F)   
pop15     147.01  1 10.1666 0.002603 **
pop75      35.24  1  2.4367 0.125530   
dpi         1.89  1  0.1309 0.719173   
ddpi       63.05  1  4.3605 0.042471 * 
Residuals 650.71 45 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1  

> Anova(fit, type=3)
Anova Table (Type III tests)

Response: sr
            Sum Sq Df F value    Pr(>F)    
(Intercept) 218.16  1 15.0867 0.0003338 ***
pop15       147.01  1 10.1666 0.0026030 ** 
pop75        35.24  1  2.4367 0.1255298    
dpi           1.89  1  0.1309 0.7191732    
ddpi         63.05  1  4.3605 0.0424711 *  
Residuals   650.71 45                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> Anova(fit2)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: sr
        Chisq Df Pr(>Chisq)   
pop15 10.1666  1    0.00143 **
pop75  2.4367  1    0.11852   
dpi    0.1309  1    0.71748   
ddpi   4.3605  1    0.03678 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> Anova(fit2, test="F")
Loading required package: pbkrtest
Loading required package: MASS
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)

Response: sr
            F Df Df.res  Pr(>F)   
pop15 10.1519  1 44.031 0.00265 **
pop75  2.4326  1 44.036 0.12599   
dpi    0.1245  1 44.811 0.72588   
ddpi   4.2179  1 44.600 0.04589 * 
---                  
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> Anova(fit2, test="F", type=3)
Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)

Response: sr
                  F Df Df.res    Pr(>F)    
(Intercept) 14.9643  1 44.560 0.0003538 ***
pop15       10.1519  1 44.031 0.0026503 ** 
pop75        2.4326  1 44.036 0.1259919    
dpi          0.1245  1 44.811 0.7258840    
ddpi         4.2179  1 44.600 0.0458898 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

----------------- snip ----------

These results are as expected.

Best,
 John

On Wed, 24 Sep 2014 17:14:31 -0400
 Ben Bolker <bbolker at gmail.com> wrote:
> On 14-09-24 04:21 PM, Alexandra Kuznetsova wrote:
> > Dear lme4 authors,
> > 
> > I have a question regarding the calculation of sums of squares in
> > anova for lmerMod object. I know that there were some discussions
> > regarding how they are calculated (but still remains unclear to
> > me..).
> > 
> > As far as I understand the way they are calculated is similar to the
> > way they are calculated in lm objects, that is transforming Y into
> > orthogonal Q space, and then computing sums of squares for the
> > independent effects.
> > 
> > I have found in your  JSS paper "Fitting linear mixed effects models
> > using lme4" some explanations (in equations 67 and 68). Would it be
> > possible to give some more comments on these equations? And what
> > about the partial (type 3) sums of squares -  is there a way to
> > calculate them using the same way?
> > 
> > Hope my questions were clear! Thank you in advance!
> 
>   I'm not sure exactly what your questions are.  Could you please
> clarify (sorry!) what you mean about partial sums of squares?  Here's an
> example showing that the results of lme4's anova.merMod and base R's:
> anova.lm do in fact agree for a model with the random effects variance
> forced to (almost) zero.  At the risk of further muddying the water,
> I'll point out that car::Anova(fit,type="II") and
> car::Anova(fit,type="III") give two different answers for this problem,
> neither of which matches the computation below ...
> 
> ===================
> fit <- lm(sr ~ ., data = LifeCycleSavings)
> anova(fit)
> 
> 
> ## construct lmer model with near-zero variance
> LC2 <- transform(LifeCycleSavings,f=factor(1:2)) ## bogus
> library("lme4")
> form <- sr ~ pop15 + pop75 + dpi + ddpi + (1|f)  ## hack
> ## to avoid (Error in terms.formula(formula(x, fixed.only = TRUE)) :
> ##   '.' in formula and no 'data' argument)
> 
> lmod <- lFormula(form, data=LC2)
> d2 <- lmer(form, data=LC2,devFunOnly=TRUE)
> llik <- d2(1e-5)
> fit2 <- mkMerMod(environment(d2),opt=list(par=1e-5,
>                            fval=llik,
>                            feval=1,
>                            conv=0,
>                            message=NULL),
>                          lmod$reTrms, fr = lmod$fr)
> all.equal(coef(fit),fixef(fit2))
> anova(fit2)   ## practically equal to anova(fit) above
> ==================
> 
> For those following along, the paper is at
> http://arxiv.org/abs/1406.5823 (or http://arxiv.org/pdf/1406.5823v1 for
> a direct link to the PDF), and the raw LaTeX for the specific section is:
> 
> ===============
> To understand how these quantities are computed, let $\bm R_i$ contain
> the rows of $\bm R_X$ (Equation~\ref{eq:blockCholeskyDecomp}) associated
> with the $i$th fixed-effects term.  Then the sum of squares for term
> $i$ is,
> \begin{equation}
>   \label{eq:SS}
>   SS_i = \widehat{\bm\beta}\trans\bm R_i\trans \bm R_i \widehat{\bm\beta}
> \end{equation}
> If $DF_i$ is the number of columns in $\bm R_i$, then the
> $F$~statistic for term $i$ is,
> \begin{equation}
>   \label{eq:Fstat}
>   F_i = \frac{SS_i}{\widehat{\sigma}^2 DF_i}
> \end{equation}
> 
> 
> > 
> > Alexandra Kuznetsova _______________________________________________ 
> > R-sig-mixed-models at r-project.org mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models