[R-sig-ME] Fwd: car::Anova type III for glmer: are very high chi square values a sign of overfitting?

Mon Apr 15 16:14:47 CEST 2019

Dear Guillaume,

I don't have anything to say about the magnitude of the Wald statistics, which are calculated reasonably straightforwardly from the covariance matrix of the fixed effects, but, AFAICS, you're using the default contr.treatment to generate contrasts for the factors. With that choice, Anova() produces tests for lower-order terms (such as main effects marginal to an interaction) that are essentially nonsense.

So, if for some reason, you really want type-III tests, and if you really did use the default contrasts, then instead use contrasts that are orthogonal in the row-basis of the model matrix, such as contr.sum, contr.helmert, or contr.poly. OTOH, why not use type-II tests, which are the default in Anova()?

I hope that this helps,
 John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http::/socserv.mcmaster.ca/jfox

> On Apr 15, 2019, at 9:35 AM, Guillaume Adeux <guillaumesimon.a2 using gmail.com> wrote:
> 
> Hello everyone,
> 
> My experimental lay out is a split split plot experiment
> (block/tillage/nitrogen/cover crop type) replicated on 4 blocks. 2
> pseudoreplications were carried out within the experimental units (hence
> the last level of nesting) each of the two years.
> 
> I am analyzing the effect of these factors (tillage*nitrogen*cover crop
> type) on weed biomass.
> 
> Up until now, the following model was working just fine:
> mod=glmer(dry_bio_weeds_m2+0.001~*block+year+tillage*nitrogen*cover crop*
> +(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC)
> 
> However, I also wanted to take into account the variability of cover crop
> biomass production because I expect that the relationship between weed and
> cover crop biomass is not the same depending on cover crop type (Brassica
> vs. Legume) :
> mod1=glmer(dry_bio_weeds_m2+0.001~*block+year+dry_bio_cover_m2*tillage*nitrogen*cover
> crop*+(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC_wo_C)
> # the control = baresoil was taken out
> 
> For each combination of cover crop type, nitrogen level and tillage, 16
> observations of cover crop biomass (i.e. dry_bio_cover_m2) are available (4
> blocks x 2 points per experimental unit x 2 years). It seems reasonable (at
> least to me) to test these slopes. I usually obtain p. values with
> monet::test_terms or afex::mixed() but it produces non sensical denominator
> d.f. with this model (first sign of overfitting?). However, mod1 shows a 10
> point AIC drop compared to a model that would not include
> "dry_bio_cover_m2".
> 
> To investigate further, I headed toward car::Anova(model, type="III") and
> obtained the following table:
> 
> Analysis of Deviance Table (Type III Wald chisquare tests)
> 
> Response: dry_bio_weeds_m2 + 0.001
>                                                              Chisq Df
> Pr(>Chisq)
> (Intercept)                                   5.3317e+02  1  < 2.2e-16 ***
> block                                           8.0032e+00  3  0.0459463 *
> year                                             2.7720e-01  1  0.5985096
> 
> dry_bio_cover_m2                      *7.8745e+04*  1  < 2.2e-16 ***
> tillage                                          1.3815e+01  1  0.0002017
> ***
> N                                                 8.4024e+01  3  < 2.2e-16
> ***
> CC                                              2.7821e+01  2  9.095e-07 ***
> dry_bio_cover_m2:tillage           2.6228e+01  1  3.034e-07 ***
> dry_bio_cover_m2:N                  *1.3953e+05*  3  < 2.2e-16 ***
> tillage:N                                      1.3281e+01  3  0.0040657 **
> dry_bio_cover_m2:CC               4.2261e+01  2  6.654e-10 ***
> tillage:CC                                   1.3697e+01  2  0.0010613 **
> N:CC                                          4.1353e+01  6  2.467e-07 ***
> dry_bio_cover_m2:tillage:N       1.7634e+01  3  0.0005234 ***
> dry_bio_cover_m2:tillage:CC    7.5090e-01  2  0.6869748
> dry_bio_cover_m2:N:CC           4.7310e+01  6  1.623e-08 ***
> tillage:N:CC                               1.7857e+01  6  0.0065986 **
> dry_bio_cover_m2:tillage:N:CC 3.7262e+01  6  1.565e-06 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> I am no statistician but some of the Chi square values seem particularly
> huge (*7.8745e+04 and **1.3953e+05)*. The output plots however seem to back
> this up....
> 
> Could anyone give me their feedback?
> 
> Thank you very much.
> 
> Guillaume ADEUX
> 
> PS: don't hesitate to ask for complementary information
> 
> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
> Virus-free.
> www.avast.com
> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
> <#m_8799212940511435848_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models