[R-sig-ME] Problem with overfitting
Reinhold Kliegl
reinhold.kliegl at gmail.com
Tue Apr 19 21:12:07 CEST 2011
I took a quick look at your data. There are two problems I see.
First, "year" has only two levels.
> table(d$year)
2009 2010
  50   29
Two levels are too few to estimate a variance, so year cannot be
modeled as a random factor. Moreover, when I included it as a fixed
factor, it turned out to be confounded with a linear combination of
your other predictors. So it is probably best to just leave the
variable out of the model.
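
A confound of this kind can be checked directly on the fixed-effects
design matrix. The sketch below uses a small made-up data frame (the
name "toy" and all of its values are hypothetical, not your data), in
which year is deliberately an exact function of site. The same two
checks should carry over to the real data frame and formula.

```r
# Hypothetical toy data: year is 2010 exactly when site is "C",
# so the year dummy duplicates the siteC dummy.
toy <- data.frame(
  y    = c(0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0),
  site = factor(rep(c("A", "B", "C"), each = 4)),
  x    = c(0.2, 1.1, -0.5, 0.7, 1.3, -1.2, 0.4, 0.9, -0.3, 0.6, -0.8, 1.5)
)
toy$year <- factor(ifelse(toy$site == "C", 2010, 2009))

# Check 1: the design matrix has fewer independent columns than columns.
X <- model.matrix(~ site + x + year, data = toy)
rank_X <- qr(X)$rank
is_deficient <- rank_X < ncol(X)

# Check 2: glm() reports the aliased term with an NA coefficient.
fit <- glm(y ~ site + x + year, family = binomial, data = toy)
aliased <- names(which(is.na(coef(fit))))

print(c(rank = rank_X, columns = ncol(X)))
print(aliased)
```

If the rank is smaller than the number of columns, at least one
fixed-effect term cannot be estimated separately from the others.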
Second, a cross-tabulation of site and pair shows that most cells are
empty, probably too many relative to your total number of
observations.
> table(d$site, d$pair)
      1 2 3 4 5 6 7 8
  1   2 2 2 0 0 0 0 0
  2   0 0 2 4 2 0 1 0
  3   4 2 2 0 0 0 0 0
  4   4 4 0 0 0 0 0 0
  5   2 2 0 0 0 0 0 0
  7   2 3 1 0 0 0 0 0
  9   0 4 0 2 2 2 2 2
  10  2 2 2 0 0 0 0 0
  11  4 4 4 0 0 0 0 0
  12  2 2 2 0 0 0 0 0
So, I also dropped "pair". Then the model converges nicely. No need
for breaking up the data.
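
To put a number on the sparsity, here is a minimal sketch that re-enters
the site-by-pair counts from the table above as a matrix (the object
names are my own) and counts the empty and occupied cells. With the
nested term (1|site/pair), each occupied cell becomes its own
pair-within-site group, so the occupied-cell count is also the number
of levels of that random factor.

```r
# Counts copied from the cross-tabulation of site and pair.
counts <- matrix(
  c(2, 2, 2, 0, 0, 0, 0, 0,
    0, 0, 2, 4, 2, 0, 1, 0,
    4, 2, 2, 0, 0, 0, 0, 0,
    4, 4, 0, 0, 0, 0, 0, 0,
    2, 2, 0, 0, 0, 0, 0, 0,
    2, 3, 1, 0, 0, 0, 0, 0,
    0, 4, 0, 2, 2, 2, 2, 2,
    2, 2, 2, 0, 0, 0, 0, 0,
    4, 4, 4, 0, 0, 0, 0, 0,
    2, 2, 2, 0, 0, 0, 0, 0),
  nrow = 10, byrow = TRUE,
  dimnames = list(site = as.character(c(1, 2, 3, 4, 5, 7, 9, 10, 11, 12)),
                  pair = as.character(1:8))
)

n_obs      <- sum(counts)        # total observations
n_cells    <- length(counts)     # site x pair combinations
n_empty    <- sum(counts == 0)   # combinations never observed
n_occupied <- n_cells - n_empty  # pair-within-site groups
obs_per_group <- n_obs / n_occupied

print(c(obs = n_obs, cells = n_cells, empty = n_empty,
        occupied = n_occupied))
print(round(obs_per_group, 2))
```

On the real data, the same quantities come straight from
table(d$site, d$pair) without retyping anything.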
> print(broodmodel6 <- lmer(brood2 ~ briventral + inslarge + weatherpc1 + sex +
      briventral:sex + briventral:inslarge + briventral:weatherpc1 +
      (1|site), family=binomial, data=d), cor=FALSE)
Generalized linear mixed model fit by the Laplace approximation
Formula: brood2 ~ briventral + inslarge + weatherpc1 + sex +
    briventral:sex + briventral:inslarge + briventral:weatherpc1 +
    (1 | site)
   Data: d
   AIC   BIC logLik deviance
 73.26 94.58 -27.63    55.26
Random effects:
 Groups Name        Variance Std.Dev.
 site   (Intercept) 0        0
Number of obs: 79, groups: site, 10

Fixed effects:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           10.94360    7.53481   1.452   0.1464
briventral            -0.10523    0.08019  -1.312   0.1895
inslarge              -8.69568    5.30184  -1.640   0.1010
weatherpc1             2.26995    1.25105   1.814   0.0696 .
sexM                  -2.16005    3.45650  -0.625   0.5320
briventral:sexM        0.03182    0.04131   0.770   0.4412
briventral:inslarge    0.09740    0.05853   1.664   0.0961 .
briventral:weatherpc1 -0.02349    0.01276  -1.841   0.0656 .
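
For reading the fixed-effects table: the z value is the estimate
divided by its standard error, and Pr(>|z|) is the two-sided normal
tail probability for that z. A quick check on the weatherpc1 row,
using only the printed estimate and standard error (the variable names
are my own):

```r
# Values taken from the weatherpc1 row of the fixed-effects table.
est <- 2.26995
se  <- 1.25105

z <- est / se             # Wald z statistic
p <- 2 * pnorm(-abs(z))   # two-sided p-value from the standard normal

print(round(c(z = z, p = p), 4))
```

Up to rounding, this should agree with the z value and p-value printed
in that row.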
Finally, I thought you might be more interested in "pair" as a random
factor. So I dropped site.
> print(broodmodel6 <- lmer(brood2 ~ briventral + inslarge + weatherpc1 + sex +
      briventral:sex + briventral:inslarge + briventral:weatherpc1 +
      (1|pair), family=binomial, data=d), cor=FALSE)
Generalized linear mixed model fit by the Laplace approximation
Formula: brood2 ~ briventral + inslarge + weatherpc1 + sex +
    briventral:sex + briventral:inslarge + briventral:weatherpc1 +
    (1 | pair)
   Data: d
   AIC   BIC logLik deviance
 50.82 72.14 -16.41    32.82
Random effects:
 Groups Name        Variance Std.Dev.
 pair   (Intercept) 675.84   25.997
Number of obs: 79, groups: pair, 8

Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            64.01415   32.08935   1.995   0.0461 *
briventral             -0.57502    0.32984  -1.743   0.0813 .
inslarge              -50.34697   24.77684  -2.032   0.0422 *
weatherpc1             12.24652    5.98171   2.047   0.0406 *
sexM                    5.39407    9.25929   0.583   0.5602
briventral:sexM        -0.02499    0.09972  -0.251   0.8022
briventral:inslarge     0.60157    0.29148   2.064   0.0390 *
briventral:weatherpc1  -0.11482    0.05807  -1.977   0.0480 *
The bottom line is that I suspect your site-by-pair matrix is too
sparse for a GLMM with more than one random factor.
Reinhold Kliegl
On Tue, Apr 19, 2011 at 3:18 PM, Iker Vaquero Alba <karraspito at yahoo.es> wrote:
>
> Hello all:
>
> I am trying to fit a model with lmer and doing a split-plot simplification. The data are attached. The problem is that when I run some of the anovas to compare different models, I get a p-value of 1. I have been told this may be a problem of overfitting. But I keep fitting simpler and simpler models and I still have the same problem, so I don't know where the problem really is. This is the last one:
>
> broodmodel6<-lmer(brood2~briventral+inslarge+weatherpc1+sex+briventral:sex+briventral:inslarge+briventral:weatherpc1+(1|site/pair)+(1|year),family=binomial)
>
> When simplifying "briventral:sex" and comparing the two models with an anova, I get a p-value of 1.
>
> Any help, suggestions and ideas will be welcome.
> Thank you very much.