[R-sig-ME] car::Anova type III for glmer: are very high chi square values a sign of overfitting?

Fox, John j|ox @end|ng |rom mcm@@ter@c@
Tue Apr 16 14:52:39 CEST 2019


Dear Guillaume,

I wasn't aware that the model included a covariate involved in interactions. Centering a covariate in this context is analogous to using contrasts for factors that are orthogonal in the row basis of the model matrix. Otherwise, you're in effect testing differences where the covariate is 0, which may extrapolate far beyond the range of the data. 

The general point is that you should pay attention to the hypotheses that are tested -- or just use type-II tests, which test generally sensible hypotheses.

Best,
 John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http::/socserv.mcmaster.ca/jfox

> On Apr 16, 2019, at 4:03 AM, Guillaume Adeux <guillaumesimon.a2 using gmail.com> wrote:
> 
> Hi John,
> 
> Thank you very much for your answer. I am very grateful.
> 
> I had indeed set sum contrasts before running car::Anova(mod,type="III") (i.e. options(contrasts = c("contr.sum", "contr.poly")) ).
> 
> However, I think the high X² values were due to the unscaled variable (dry_bio_cover_m2) which has a very important effect. After centering and scaling this variable, the output seemed much more reasonable:
> 
> Analysis of Deviance Table (Type III Wald chisquare tests)
> 
> Response: dry_bio_weeds_m2 + 0.001
>                                                                    Chisq Df Pr(>Chisq)    
> (Intercept)                                             426.0559  1  < 2.2e-16 ***
> block                                                     13.4253  3  0.0038016 ** 
> year                                                       0.5495  1  0.4585110    
> scale(dry_bio_cover_m2)                      82.2881  1  < 2.2e-16 ***
> tillage                                                     0.0068  1  0.9343231    
> N                                                            38.9274  3  1.798e-08 ***
> CC                                                         0.6841  2  0.7103247    
> scale(dry_bio_cover_m2):tillage            7.2857  1  0.0069507 ** 
> scale(dry_bio_cover_m2):N                  47.9310  3  2.203e-10 ***
> tillage:N                                                 1.5697  3  0.6662878    
> scale(dry_bio_cover_m2):CC               34.2547  2  3.645e-08 ***
> tillage:CC                                              20.2074  2  4.093e-05 ***
> N:CC                                                     17.4129  6  0.0078797 ** 
> scale(dry_bio_cover_m2):tillage:N        5.1705  3  0.1597323    
> scale(dry_bio_cover_m2):tillage:CC     4.3092  2  0.1159493    
> scale(dry_bio_cover_m2):N:CC            31.7859  6  1.793e-05 ***
> tillage:N:CC                                           13.3195  6  0.0382343 *  
> scale(dry_bio_cover_m2):tillage:N:CC  23.3366  6  0.0006912 ***
> 
> Thank you again.
> 
> Sincerely,
> 
> Guillaume ADEUX
> 
> 
> 	Virus-free. www.avast.com
> 
> Le lun. 15 avr. 2019 à 16:14, Fox, John <jfox using mcmaster.ca> a écrit :
> Dear Guillaume,
> 
> I don't have anything to say about the magnitude of the Wald statistics, which are calculated reasonably straightforwardly from the covariance matrix of the fixed effects, but, AFAICS, you're using the default contr.treatment to generate contrasts for the factors. With that choice, Anova() produces tests for lower-order terms (such as main effects marginal to an interaction) that are essentially nonsense.
> 
> So, if for some reason, you really want type-III tests, and if you really did use the default contrasts, then instead use contrasts that are orthogonal in the row-basis of the model matrix, such as contr.sum, contr.helmert, or contr.poly. OTOH, why not use type-II tests, which are the default in Anova()?
> 
> I hope that this helps,
>  John
> 
>   -------------------------------------------------
>   John Fox, Professor Emeritus
>   McMaster University
>   Hamilton, Ontario, Canada
>   Web: http::/socserv.mcmaster.ca/jfox
> 
> > On Apr 15, 2019, at 9:35 AM, Guillaume Adeux <guillaumesimon.a2 using gmail.com> wrote:
> > 
> > Hello everyone,
> > 
> > My experimental lay out is a split split plot experiment
> > (block/tillage/nitrogen/cover crop type) replicated on 4 blocks. 2
> > pseudoreplications were carried out within the experimental units (hence
> > the last level of nesting) each of the two years.
> > 
> > I am analyzing the effect of these factors (tillage*nitrogen*cover crop
> > type) on weed biomass.
> > 
> > Up until now, the following model was working just fine:
> > mod=glmer(dry_bio_weeds_m2+0.001~*block+year+tillage*nitrogen*cover crop*
> > +(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC)
> > 
> > However, I also wanted to take into account the variability of cover crop
> > biomass production because I expect that the relationship between weed and
> > cover crop biomass is not the same depending on cover crop type (Brassica
> > vs. Legume) :
> > mod1=glmer(dry_bio_weeds_m2+0.001~*block+year+dry_bio_cover_m2*tillage*nitrogen*cover
> > crop*+(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC_wo_C)
> > # the control = baresoil was taken out
> > 
> > For each combination of cover crop type, nitrogen level and tillage, 16
> > observations of cover crop biomass (i.e. dry_bio_cover_m2) are available (4
> > blocks x 2 points per experimental unit x 2 years). It seems reasonable (at
> > least to me) to test these slopes. I usually obtain p. values with
> > monet::test_terms or afex::mixed() but it produces non sensical denominator
> > d.f. with this model (first sign of overfitting?). However, mod1 shows a 10
> > point AIC drop compared to a model that would not include
> > "dry_bio_cover_m2".
> > 
> > To investigate further, I headed toward car::Anova(model, type="III") and
> > obtained the following table:
> > 
> > Analysis of Deviance Table (Type III Wald chisquare tests)
> > 
> > Response: dry_bio_weeds_m2 + 0.001
> >                                                              Chisq Df
> > Pr(>Chisq)
> > (Intercept)                                   5.3317e+02  1  < 2.2e-16 ***
> > block                                           8.0032e+00  3  0.0459463 *
> > year                                             2.7720e-01  1  0.5985096
> > 
> > dry_bio_cover_m2                      *7.8745e+04*  1  < 2.2e-16 ***
> > tillage                                          1.3815e+01  1  0.0002017
> > ***
> > N                                                 8.4024e+01  3  < 2.2e-16
> > ***
> > CC                                              2.7821e+01  2  9.095e-07 ***
> > dry_bio_cover_m2:tillage           2.6228e+01  1  3.034e-07 ***
> > dry_bio_cover_m2:N                  *1.3953e+05*  3  < 2.2e-16 ***
> > tillage:N                                      1.3281e+01  3  0.0040657 **
> > dry_bio_cover_m2:CC               4.2261e+01  2  6.654e-10 ***
> > tillage:CC                                   1.3697e+01  2  0.0010613 **
> > N:CC                                          4.1353e+01  6  2.467e-07 ***
> > dry_bio_cover_m2:tillage:N       1.7634e+01  3  0.0005234 ***
> > dry_bio_cover_m2:tillage:CC    7.5090e-01  2  0.6869748
> > dry_bio_cover_m2:N:CC           4.7310e+01  6  1.623e-08 ***
> > tillage:N:CC                               1.7857e+01  6  0.0065986 **
> > dry_bio_cover_m2:tillage:N:CC 3.7262e+01  6  1.565e-06 ***
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > 
> > I am no statistician but some of the Chi square values seem particularly
> > huge (*7.8745e+04 and **1.3953e+05)*. The output plots however seem to back
> > this up....
> > 
> > Could anyone give me their feedback?
> > 
> > Thank you very much.
> > 
> > Guillaume ADEUX
> > 
> > PS: don't hesitate to ask for complementary information
> > 
> > <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
> > Virus-free.
> > www.avast.com
> > <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
> > <#m_8799212940511435848_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> > 
> >       [[alternative HTML version deleted]]
> > 
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
> 	Virus-free. www.avast.com



More information about the R-sig-mixed-models mailing list