[R-sig-ME] Problem with overfitting

Reinhold Kliegl reinhold.kliegl at gmail.com
Tue Apr 19 21:12:07 CEST 2011


I took a quick look at your data. There are two problems I see.
First, "year" has only two levels.

> table(d$year)

2009 2010
  50   29

Two levels are too few to model year as a random factor. Moreover,
when I included it as a fixed factor, it appeared to be confounded
with a linear combination of your other predictors. So it is probably
best to just leave the variable out of the model.
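
One way to check for this kind of confounding is to compare the rank
of the fixed-effects design matrix with its number of columns. The
sketch below uses invented toy data, not your data set (`toy`, `x1`
and the seed are assumptions made up for illustration), in which year
is an exact function of sex:

```r
# Toy data only: these values are invented, not the poster's data.
set.seed(1)
toy <- data.frame(
  x1  = rnorm(20),
  sex = factor(rep(c("F", "M"), each = 10))
)
# Make 'year' fully determined by 'sex' so the two are confounded.
toy$year <- factor(ifelse(toy$sex == "F", 2009, 2010))

# Design matrix of the fixed effects and its numerical rank.
X <- model.matrix(~ x1 + sex + year, data = toy)
rank_X <- qr(X)$rank

# A rank below the column count signals a linear dependency.
c(columns = ncol(X), rank = rank_X)
```

If the rank is smaller than the number of columns, at least one
predictor is a linear combination of the others and cannot be
estimated separately.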

Second, a cross-tabulation of site and pair shows what are probably
too many empty cells relative to your total number of observations.

> table(d$site, d$pair)

     1 2 3 4 5 6 7 8
  1  2 2 2 0 0 0 0 0
  2  0 0 2 4 2 0 1 0
  3  4 2 2 0 0 0 0 0
  4  4 4 0 0 0 0 0 0
  5  2 2 0 0 0 0 0 0
  7  2 3 1 0 0 0 0 0
  9  0 4 0 2 2 2 2 2
  10 2 2 2 0 0 0 0 0
  11 4 4 4 0 0 0 0 0
  12 2 2 2 0 0 0 0 0

So I also dropped "pair". The model then converges nicely, and there
is no need to break up the data.
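
The sparseness can be put into numbers straight from the
cross-tabulation. This is only a rough sketch: it retypes the counts
printed above into a matrix (the object name `counts` is mine) rather
than reading your data file.

```r
# Counts retyped from the table(d$site, d$pair) output; rows are sites.
counts <- matrix(c(
  2, 2, 2, 0, 0, 0, 0, 0,   # site 1
  0, 0, 2, 4, 2, 0, 1, 0,   # site 2
  4, 2, 2, 0, 0, 0, 0, 0,   # site 3
  4, 4, 0, 0, 0, 0, 0, 0,   # site 4
  2, 2, 0, 0, 0, 0, 0, 0,   # site 5
  2, 3, 1, 0, 0, 0, 0, 0,   # site 7
  0, 4, 0, 2, 2, 2, 2, 2,   # site 9
  2, 2, 2, 0, 0, 0, 0, 0,   # site 10
  4, 4, 4, 0, 0, 0, 0, 0,   # site 11
  2, 2, 2, 0, 0, 0, 0, 0    # site 12
), nrow = 10, byrow = TRUE,
dimnames = list(
  site = as.character(c(1, 2, 3, 4, 5, 7, 9, 10, 11, 12)),
  pair = as.character(1:8)
))

n_obs    <- sum(counts)       # total observations
n_empty  <- sum(counts == 0)  # site-by-pair cells with no data
n_filled <- sum(counts > 0)   # cells with at least one observation

c(obs = n_obs, empty = n_empty, filled = n_filled)
# obs 79, empty 48 (of 80 cells), filled 32
```

So 79 observations are spread over 32 occupied site-by-pair cells,
and 48 of the 80 cells are empty.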

> print(broodmodel6<-lmer(brood2~briventral+inslarge+weatherpc1+sex+
      briventral:sex+briventral:inslarge+briventral:weatherpc1 +
      (1|site), family=binomial, data=d), cor=FALSE)
Generalized linear mixed model fit by the Laplace approximation
Formula: brood2 ~ briventral + inslarge + weatherpc1 + sex +
briventral:sex +      briventral:inslarge + briventral:weatherpc1 + (1
| site)
   Data: d
   AIC   BIC logLik deviance
 73.26 94.58 -27.63    55.26
Random effects:
 Groups Name        Variance Std.Dev.
 site   (Intercept)  0        0
Number of obs: 79, groups: site, 10

Fixed effects:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           10.94360    7.53481   1.452   0.1464
briventral            -0.10523    0.08019  -1.312   0.1895
inslarge              -8.69568    5.30184  -1.640   0.1010
weatherpc1             2.26995    1.25105   1.814   0.0696 .
sexM                  -2.16005    3.45650  -0.625   0.5320
briventral:sexM        0.03182    0.04131   0.770   0.4412
briventral:inslarge    0.09740    0.05853   1.664   0.0961 .
briventral:weatherpc1 -0.02349    0.01276  -1.841   0.0656 .
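
For reading the table above: the z value is the estimate divided by
its standard error, and the p-value is the two-sided normal tail
probability. A minimal sketch, with two rows copied by hand from the
output (the vector names `est` and `se` are mine):

```r
# Estimates and standard errors copied from the (1 | site) fit above.
est <- c(weatherpc1 = 2.26995, "briventral:inslarge" = 0.09740)
se  <- c(weatherpc1 = 1.25105, "briventral:inslarge" = 0.05853)

z <- est / se              # Wald z statistic
p <- 2 * pnorm(-abs(z))    # two-sided p-value from the standard normal

round(cbind(z, p), 4)
```

Up to rounding, this should reproduce the "z value" and "Pr(>|z|)"
columns for those two rows.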

Finally, I thought you might be more interested in "pair" as a random
factor, so I dropped "site" instead.

> print(broodmodel6<-lmer(brood2~briventral+inslarge+weatherpc1+sex+
      briventral:sex+briventral:inslarge+briventral:weatherpc1 +
      (1|pair), family=binomial, data=d), cor=FALSE)

Generalized linear mixed model fit by the Laplace approximation
Formula: brood2 ~ briventral + inslarge + weatherpc1 + sex +
briventral:sex +      briventral:inslarge + briventral:weatherpc1 + (1
| pair)
   Data: d
   AIC   BIC logLik deviance
 50.82 72.14 -16.41    32.82
Random effects:
 Groups Name        Variance Std.Dev.
 pair   (Intercept) 675.84   25.997
Number of obs: 79, groups: pair, 8

Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            64.01415   32.08935   1.995   0.0461 *
briventral             -0.57502    0.32984  -1.743   0.0813 .
inslarge              -50.34697   24.77684  -2.032   0.0422 *
weatherpc1             12.24652    5.98171   2.047   0.0406 *
sexM                    5.39407    9.25929   0.583   0.5602
briventral:sexM        -0.02499    0.09972  -0.251   0.8022
briventral:inslarge     0.60157    0.29148   2.064   0.0390 *
briventral:weatherpc1  -0.11482    0.05807  -1.977   0.0480 *
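
The AIC and BIC lines of the two fits appear to follow from the
printed deviances in the usual way. This sketch assumes 9 parameters
per model (8 fixed-effect coefficients plus 1 random-intercept
variance) and uses the rounded deviances from the output, so the
last digit can differ:

```r
n_obs <- 79       # "Number of obs: 79" in both fits
k     <- 8 + 1    # assumed: 8 fixed effects + 1 variance component

# Deviances copied from the (1 | site) and (1 | pair) fits.
dev <- c(site = 55.26, pair = 32.82)

aic <- dev + 2 * k            # AIC = deviance + 2k
bic <- dev + k * log(n_obs)   # BIC = deviance + k * log(n)

round(rbind(AIC = aic, BIC = bic), 2)
```

The results agree with the printed AIC and BIC values to within
rounding of the deviance.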

The bottom line is that I suspect your data matrix is too sparse for
a GLMM with crossed random factors.

Reinhold Kliegl

On Tue, Apr 19, 2011 at 3:18 PM, Iker Vaquero Alba <karraspito at yahoo.es> wrote:
>
>     Hello all:
>
>    I am trying to fit a model with lmer and doing a split-plot simplification. The data are attached. The problem is that when doing some of the anovas to compare different models, I get a p-value of 1. I have been told this may be an overfitting problem. But I keep fitting simpler and simpler models and still have the same problem, so I don't know where the problem really is. This is the latest one:
>
>    broodmodel6<-lmer(brood2~briventral+inslarge+weatherpc1+sex+briventral:sex+briventral:inslarge+briventral:weatherpc1+(1|site/pair)+(1|year),family=binomial)
>
>    When simplifying "briventral:sex" and comparing the two models with an anova, I get a p-value of 1.
>
>    Any help, suggestions and ideas will be welcome.
>    Thank you very much.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>



