[R-sig-ME] Fwd: car::Anova type III for glmer: are very high chi square values a sign of overfitting?

Tue Apr 16 10:03:02 CEST 2019

Hi John,

Thank you very much for your answer. I am very grateful.

I had indeed set sum contrasts before running car::Anova(mod,type="III")
(i.e. *options(contrasts = c("contr.sum", "contr.poly"))* ).

However, I think the high X² values were due to the unscaled variable
(dry_bio_cover_m2)
which has a very important effect. After centering and scaling this
variable, the output seemed much more reasonable:

Analysis of Deviance Table (Type III Wald chisquare tests)

Response: dry_bio_weeds_m2 + 0.001
                                                                   Chisq Df
Pr(>Chisq)
(Intercept)                                             426.0559  1  <
2.2e-16 ***
block                                                     13.4253  3
0.0038016 **
year                                                       0.5495  1
0.4585110
scale(dry_bio_cover_m2)                  *    82.2881  1  < 2.2e-16 ****
tillage                                                     0.0068  1
0.9343231
N                                                            38.9274  3
1.798e-08 ***
CC                                                         0.6841  2
0.7103247
scale(dry_bio_cover_m2):tillage            7.2857  1  0.0069507 **
scale(dry_bio_cover_m2):N                 * 47.9310  3  2.203e-10 ****
tillage:N                                                 1.5697  3
0.6662878
scale(dry_bio_cover_m2):CC               34.2547  2  3.645e-08 ***
tillage:CC                                              20.2074  2
4.093e-05 ***
N:CC                                                     17.4129  6
0.0078797 **
scale(dry_bio_cover_m2):tillage:N        5.1705  3  0.1597323
scale(dry_bio_cover_m2):tillage:CC     4.3092  2  0.1159493
scale(dry_bio_cover_m2):N:CC            31.7859  6  1.793e-05 ***
tillage:N:CC                                           13.3195  6
0.0382343 *
scale(dry_bio_cover_m2):tillage:N:CC  23.3366  6  0.0006912 ***

Thank you again.

Sincerely,

Guillaume ADEUX

<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virus-free.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

Le lun. 15 avr. 2019 à 16:14, Fox, John <jfox using mcmaster.ca> a écrit :

> Dear Guillaume,
>
> I don't have anything to say about the magnitude of the Wald statistics,
> which are calculated reasonably straightforwardly from the covariance
> matrix of the fixed effects, but, AFAICS, you're using the default
> contr.treatment to generate contrasts for the factors. With that choice,
> Anova() produces tests for lower-order terms (such as main effects marginal
> to an interaction) that are essentially nonsense.
>
> So, if for some reason, you really want type-III tests, and if you really
> did use the default contrasts, then instead use contrasts that are
> orthogonal in the row-basis of the model matrix, such as contr.sum,
> contr.helmert, or contr.poly. OTOH, why not use type-II tests, which are
> the default in Anova()?
>
> I hope that this helps,
>  John
>
>   -------------------------------------------------
>   John Fox, Professor Emeritus
>   McMaster University
>   Hamilton, Ontario, Canada
>   Web: http::/socserv.mcmaster.ca/jfox
>
> > On Apr 15, 2019, at 9:35 AM, Guillaume Adeux <
> guillaumesimon.a2 using gmail.com> wrote:
> >
> > Hello everyone,
> >
> > My experimental lay out is a split split plot experiment
> > (block/tillage/nitrogen/cover crop type) replicated on 4 blocks. 2
> > pseudoreplications were carried out within the experimental units (hence
> > the last level of nesting) each of the two years.
> >
> > I am analyzing the effect of these factors (tillage*nitrogen*cover crop
> > type) on weed biomass.
> >
> > Up until now, the following model was working just fine:
> > mod=glmer(dry_bio_weeds_m2+0.001~*block+year+tillage*nitrogen*cover crop*
> >
> +(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC)
> >
> > However, I also wanted to take into account the variability of cover crop
> > biomass production because I expect that the relationship between weed
> and
> > cover crop biomass is not the same depending on cover crop type (Brassica
> > vs. Legume) :
> >
> mod1=glmer(dry_bio_weeds_m2+0.001~*block+year+dry_bio_cover_m2*tillage*nitrogen*cover
> >
> crop*+(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC_wo_C)
> > # the control = baresoil was taken out
> >
> > For each combination of cover crop type, nitrogen level and tillage, 16
> > observations of cover crop biomass (i.e. dry_bio_cover_m2) are available
> (4
> > blocks x 2 points per experimental unit x 2 years). It seems reasonable
> (at
> > least to me) to test these slopes. I usually obtain p. values with
> > monet::test_terms or afex::mixed() but it produces non sensical
> denominator
> > d.f. with this model (first sign of overfitting?). However, mod1 shows a
> 10
> > point AIC drop compared to a model that would not include
> > "dry_bio_cover_m2".
> >
> > To investigate further, I headed toward car::Anova(model, type="III") and
> > obtained the following table:
> >
> > Analysis of Deviance Table (Type III Wald chisquare tests)
> >
> > Response: dry_bio_weeds_m2 + 0.001
> >                                                              Chisq Df
> > Pr(>Chisq)
> > (Intercept)                                   5.3317e+02  1  < 2.2e-16
> ***
> > block                                           8.0032e+00  3  0.0459463
> *
> > year                                             2.7720e-01  1  0.5985096
> >
> > dry_bio_cover_m2                      *7.8745e+04*  1  < 2.2e-16 ***
> > tillage                                          1.3815e+01  1  0.0002017
> > ***
> > N                                                 8.4024e+01  3  <
> 2.2e-16
> > ***
> > CC                                              2.7821e+01  2  9.095e-07
> ***
> > dry_bio_cover_m2:tillage           2.6228e+01  1  3.034e-07 ***
> > dry_bio_cover_m2:N                  *1.3953e+05*  3  < 2.2e-16 ***
> > tillage:N                                      1.3281e+01  3  0.0040657
> **
> > dry_bio_cover_m2:CC               4.2261e+01  2  6.654e-10 ***
> > tillage:CC                                   1.3697e+01  2  0.0010613 **
> > N:CC                                          4.1353e+01  6  2.467e-07
> ***
> > dry_bio_cover_m2:tillage:N       1.7634e+01  3  0.0005234 ***
> > dry_bio_cover_m2:tillage:CC    7.5090e-01  2  0.6869748
> > dry_bio_cover_m2:N:CC           4.7310e+01  6  1.623e-08 ***
> > tillage:N:CC                               1.7857e+01  6  0.0065986 **
> > dry_bio_cover_m2:tillage:N:CC 3.7262e+01  6  1.565e-06 ***
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > I am no statistician but some of the Chi square values seem particularly
> > huge (*7.8745e+04 and **1.3953e+05)*. The output plots however seem to
> back
> > this up....
> >
> > Could anyone give me their feedback?
> >
> > Thank you very much.
> >
> > Guillaume ADEUX
> >
> > PS: don't hesitate to ask for complementary information
> >
> > <
> https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail
> >
> > Virus-free.
> > www.avast.com
> > <
> https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail
> >
> > <#m_8799212940511435848_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virus-free.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

	[[alternative HTML version deleted]]