[R-sig-eco] car::Anova type III for glmer: are very high chi square values a sign of overfitting?
Guillaume Adeux
gu|||@ume@|mon@@2 @end|ng |rom gm@||@com
Mon Apr 15 15:34:23 CEST 2019
Hello everyone,
My experimental layout is a split-split-plot design
(block/tillage/nitrogen/cover crop type) replicated on 4 blocks. Two
pseudoreplicates were sampled within each experimental unit (hence
the last level of nesting) in each of the two years.
I am analyzing the effect of these factors (tillage*nitrogen*cover crop
type) on weed biomass.
Up until now, the following model was working just fine:
mod <- glmer(dry_bio_weeds_m2 + 0.001 ~ block + year + tillage*N*CC +
    (1|block:tillage) + (1|block:tillage:N) + (1|block:tillage:N:CC) +
    (1|block:year) + (1|block:year:tillage) + (1|block:year:tillage:N) +
    (1|block:year:tillage:N:CC),
    family = gaussian(link = "sqrt"),
    control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)),
    data = biomassCC)
However, I also wanted to account for the variability in cover crop
biomass production, because I expect the relationship between weed and
cover crop biomass to differ between cover crop types (Brassica
vs. legume):
# the control (bare soil) was taken out of the data set
mod1 <- glmer(dry_bio_weeds_m2 + 0.001 ~ block + year +
    dry_bio_cover_m2*tillage*N*CC +
    (1|block:tillage) + (1|block:tillage:N) + (1|block:tillage:N:CC) +
    (1|block:year) + (1|block:year:tillage) + (1|block:year:tillage:N) +
    (1|block:year:tillage:N:CC),
    family = gaussian(link = "sqrt"),
    control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)),
    data = biomassCC_wo_C)
For each combination of cover crop type, nitrogen level and tillage, 16
observations of cover crop biomass (i.e. dry_bio_cover_m2) are available (4
blocks x 2 points per experimental unit x 2 years). It seems reasonable (at
least to me) to test these slopes. I usually obtain p-values with
monet::test_terms or afex::mixed(), but they produce nonsensical denominator
d.f. with this model (a first sign of overfitting?). However, mod1 shows a
10-point drop in AIC compared to a model that does not include
"dry_bio_cover_m2".
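As a sanity check on that count of 16, here is a minimal sketch in R that rebuilds a balanced design grid and tabulates it. The level labels are hypothetical placeholders, and the numbers of levels (2 tillage, 4 nitrogen, 3 cover crop types once bare soil is removed) are assumptions inferred from the degrees of freedom in the Anova table further down. With the real data one would tabulate biomassCC_wo_C instead.

```r
# Hypothetical level labels; only the number of levels per factor matters here.
design <- expand.grid(
  block   = paste0("B", 1:4),    # 4 blocks
  year    = c("Y1", "Y2"),       # 2 years
  point   = c("P1", "P2"),       # 2 sampling points per experimental unit
  tillage = c("T1", "T2"),       # assumed 2 tillage levels
  N       = paste0("N", 1:4),    # assumed 4 nitrogen levels
  CC      = paste0("CC", 1:3)    # assumed 3 cover crop types (bare soil excluded)
)

# Number of observations per tillage x N x CC combination
obs_per_cell <- with(design, table(tillage, N, CC))

# Every cell should hold 4 blocks x 2 points x 2 years = 16 observations
stopifnot(all(obs_per_cell == 16))
```

If the same table built from the real data shows cells with fewer than 16 observations, the slopes for those combinations rest on less information than the balanced design suggests.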
To investigate further, I headed toward car::Anova(model, type="III") and
obtained the following table:
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: dry_bio_weeds_m2 + 0.001
                                    Chisq Df Pr(>Chisq)
(Intercept)                    5.3317e+02  1  < 2.2e-16 ***
block                          8.0032e+00  3  0.0459463 *
year                           2.7720e-01  1  0.5985096
dry_bio_cover_m2               7.8745e+04  1  < 2.2e-16 ***
tillage                        1.3815e+01  1  0.0002017 ***
N                              8.4024e+01  3  < 2.2e-16 ***
CC                             2.7821e+01  2  9.095e-07 ***
dry_bio_cover_m2:tillage       2.6228e+01  1  3.034e-07 ***
dry_bio_cover_m2:N             1.3953e+05  3  < 2.2e-16 ***
tillage:N                      1.3281e+01  3  0.0040657 **
dry_bio_cover_m2:CC            4.2261e+01  2  6.654e-10 ***
tillage:CC                     1.3697e+01  2  0.0010613 **
N:CC                           4.1353e+01  6  2.467e-07 ***
dry_bio_cover_m2:tillage:N     1.7634e+01  3  0.0005234 ***
dry_bio_cover_m2:tillage:CC    7.5090e-01  2  0.6869748
dry_bio_cover_m2:N:CC          4.7310e+01  6  1.623e-08 ***
tillage:N:CC                   1.7857e+01  6  0.0065986 **
dry_bio_cover_m2:tillage:N:CC  3.7262e+01  6  1.565e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am no statistician, but some of the chi-square values seem particularly
huge (7.8745e+04 for dry_bio_cover_m2 and 1.3953e+05 for
dry_bio_cover_m2:N). The output plots, however, seem to back this up.
Could anyone give me their feedback?
Thank you very much.
Guillaume ADEUX
PS: don't hesitate to ask for additional information