[R-sig-ME] Fwd: car::Anova type III for glmer: are very high chi square values a sign of overfitting?
Guillaume Adeux
gu|||@ume@|mon@@2 @end|ng |rom gm@||@com
Tue Apr 16 10:03:02 CEST 2019
Hi John,
Thank you very much for your answer. I am very grateful.
I had indeed set sum contrasts before running car::Anova(mod,type="III")
(i.e. *options(contrasts = c("contr.sum", "contr.poly"))* ).
However, I think the high X² values were due to the unscaled variable
(dry_bio_cover_m2)
which has a very important effect. After centering and scaling this
variable, the output seemed much more reasonable:
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: dry_bio_weeds_m2 + 0.001
Chisq Df
Pr(>Chisq)
(Intercept) 426.0559 1 <
2.2e-16 ***
block 13.4253 3
0.0038016 **
year 0.5495 1
0.4585110
scale(dry_bio_cover_m2) * 82.2881 1 < 2.2e-16 ****
tillage 0.0068 1
0.9343231
N 38.9274 3
1.798e-08 ***
CC 0.6841 2
0.7103247
scale(dry_bio_cover_m2):tillage 7.2857 1 0.0069507 **
scale(dry_bio_cover_m2):N * 47.9310 3 2.203e-10 ****
tillage:N 1.5697 3
0.6662878
scale(dry_bio_cover_m2):CC 34.2547 2 3.645e-08 ***
tillage:CC 20.2074 2
4.093e-05 ***
N:CC 17.4129 6
0.0078797 **
scale(dry_bio_cover_m2):tillage:N 5.1705 3 0.1597323
scale(dry_bio_cover_m2):tillage:CC 4.3092 2 0.1159493
scale(dry_bio_cover_m2):N:CC 31.7859 6 1.793e-05 ***
tillage:N:CC 13.3195 6
0.0382343 *
scale(dry_bio_cover_m2):tillage:N:CC 23.3366 6 0.0006912 ***
Thank you again.
Sincerely,
Guillaume ADEUX
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virus-free.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
Le lun. 15 avr. 2019 à 16:14, Fox, John <jfox using mcmaster.ca> a écrit :
> Dear Guillaume,
>
> I don't have anything to say about the magnitude of the Wald statistics,
> which are calculated reasonably straightforwardly from the covariance
> matrix of the fixed effects, but, AFAICS, you're using the default
> contr.treatment to generate contrasts for the factors. With that choice,
> Anova() produces tests for lower-order terms (such as main effects marginal
> to an interaction) that are essentially nonsense.
>
> So, if for some reason, you really want type-III tests, and if you really
> did use the default contrasts, then instead use contrasts that are
> orthogonal in the row-basis of the model matrix, such as contr.sum,
> contr.helmert, or contr.poly. OTOH, why not use type-II tests, which are
> the default in Anova()?
>
> I hope that this helps,
> John
>
> -------------------------------------------------
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> Web: http::/socserv.mcmaster.ca/jfox
>
> > On Apr 15, 2019, at 9:35 AM, Guillaume Adeux <
> guillaumesimon.a2 using gmail.com> wrote:
> >
> > Hello everyone,
> >
> > My experimental lay out is a split split plot experiment
> > (block/tillage/nitrogen/cover crop type) replicated on 4 blocks. 2
> > pseudoreplications were carried out within the experimental units (hence
> > the last level of nesting) each of the two years.
> >
> > I am analyzing the effect of these factors (tillage*nitrogen*cover crop
> > type) on weed biomass.
> >
> > Up until now, the following model was working just fine:
> > mod=glmer(dry_bio_weeds_m2+0.001~*block+year+tillage*nitrogen*cover crop*
> >
> +(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC)
> >
> > However, I also wanted to take into account the variability of cover crop
> > biomass production because I expect that the relationship between weed
> and
> > cover crop biomass is not the same depending on cover crop type (Brassica
> > vs. Legume) :
> >
> mod1=glmer(dry_bio_weeds_m2+0.001~*block+year+dry_bio_cover_m2*tillage*nitrogen*cover
> >
> crop*+(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)),data=biomassCC_wo_C)
> > # the control = baresoil was taken out
> >
> > For each combination of cover crop type, nitrogen level and tillage, 16
> > observations of cover crop biomass (i.e. dry_bio_cover_m2) are available
> (4
> > blocks x 2 points per experimental unit x 2 years). It seems reasonable
> (at
> > least to me) to test these slopes. I usually obtain p. values with
> > monet::test_terms or afex::mixed() but it produces non sensical
> denominator
> > d.f. with this model (first sign of overfitting?). However, mod1 shows a
> 10
> > point AIC drop compared to a model that would not include
> > "dry_bio_cover_m2".
> >
> > To investigate further, I headed toward car::Anova(model, type="III") and
> > obtained the following table:
> >
> > Analysis of Deviance Table (Type III Wald chisquare tests)
> >
> > Response: dry_bio_weeds_m2 + 0.001
> > Chisq Df
> > Pr(>Chisq)
> > (Intercept) 5.3317e+02 1 < 2.2e-16
> ***
> > block 8.0032e+00 3 0.0459463
> *
> > year 2.7720e-01 1 0.5985096
> >
> > dry_bio_cover_m2 *7.8745e+04* 1 < 2.2e-16 ***
> > tillage 1.3815e+01 1 0.0002017
> > ***
> > N 8.4024e+01 3 <
> 2.2e-16
> > ***
> > CC 2.7821e+01 2 9.095e-07
> ***
> > dry_bio_cover_m2:tillage 2.6228e+01 1 3.034e-07 ***
> > dry_bio_cover_m2:N *1.3953e+05* 3 < 2.2e-16 ***
> > tillage:N 1.3281e+01 3 0.0040657
> **
> > dry_bio_cover_m2:CC 4.2261e+01 2 6.654e-10 ***
> > tillage:CC 1.3697e+01 2 0.0010613 **
> > N:CC 4.1353e+01 6 2.467e-07
> ***
> > dry_bio_cover_m2:tillage:N 1.7634e+01 3 0.0005234 ***
> > dry_bio_cover_m2:tillage:CC 7.5090e-01 2 0.6869748
> > dry_bio_cover_m2:N:CC 4.7310e+01 6 1.623e-08 ***
> > tillage:N:CC 1.7857e+01 6 0.0065986 **
> > dry_bio_cover_m2:tillage:N:CC 3.7262e+01 6 1.565e-06 ***
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > I am no statistician but some of the Chi square values seem particularly
> > huge (*7.8745e+04 and **1.3953e+05)*. The output plots however seem to
> back
> > this up....
> >
> > Could anyone give me their feedback?
> >
> > Thank you very much.
> >
> > Guillaume ADEUX
> >
> > PS: don't hesitate to ask for complementary information
> >
> > <
> https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail
> >
> > Virus-free.
> > www.avast.com
> > <
> https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail
> >
> > <#m_8799212940511435848_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virus-free.
www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list