[R-sig-ME] GLMM- relationship between AICc weight and random effects?
Teresa Oliveira
mteresaoliveira92 at gmail.com
Mon Jul 11 11:53:38 CEST 2016
Regarding SD: yes, I have several individuals with different numbers of
locations. But, for instance, when I consider study areas, I get a very
low variance (usually between 0 and 0.2, with an SD that is also 0 or,
for instance, 0.3 when the variance is 0.1). Shouldn't I still consider
study area as a random effect? I mean, is the variance value enough to
include or exclude a random effect?
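One way to go beyond eyeballing the variance is a likelihood-ratio test
between fits with and without the study-area term. A minimal sketch (the
object and variable names `dat`, `x1`, `x2`, `Area` are hypothetical,
and it assumes the lme4 package):

```r
library(lme4)

## Hypothetical fits: random intercept for individual only, versus
## individual plus study area
m_id   <- glmer(Used ~ x1 + x2 + (1 | ID), data = dat, family = binomial)
m_both <- glmer(Used ~ x1 + x2 + (1 | ID) + (1 | Area),
                data = dat, family = binomial)

## Likelihood-ratio test; note it is conservative here, because the null
## hypothesis (variance = 0) lies on the boundary of the parameter space
anova(m_id, m_both)
```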
How should I test for goodness of fit?
Regarding interactions: yes, I may construct models including interactions
between ID and the other variables, but I think it is better to first find a
"strong" global model and identify which variables are most important.
Thank you very much for your help!!
Teresa
2016-07-11 2:06 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
>
>
> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
> > Dear list members,
> >
> > I am developing GLMMs in order to assess habitat selection (using GLMM
> > coefficients to construct resource selection functions). I have
> > telemetry data from 5 study areas, and each area has a different
> > number of individuals monitored.
> >
> > To develop the GLMMs, the dependent variable is binary (1 = used
> > locations; 0 = available locations), and I have an initial set of 14
> > continuous variables (8 land cover variables; 2 distance variables, to
> > artificial areas and water sources; 4 topographic variables): a buffer
> > was placed around each location and the area of each land cover within
> > that buffer was accounted for; distances were measured from each point
> > to the nearest feature; and topographic variables were obtained from
> > DEM rasters. I tested for correlation using Spearman's rank, so not
> > all 14 were used in the GLMMs. All variables were z-score standardized.
> >
> > As random effect, I used individual ID. At the beginning I thought of
> > using study area as a random effect, but I only had 5 levels and there
> > was almost no variance when that random effect was used.
> >
> > I constructed a GLMM with 9 (uncorrelated) variables and a random
> > effect, then used the "dredge()" function and "model.avg(dredge)" to
> > sort models by AICc values. This was the result (only models with
> > delta AICc lower than 2 are shown):
> >
> > [1]Call:
> > model.avg(object = dredge.m1.1)
> >
> > Component model call:
> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
> > family =
> > binomial(link = "logit"))
> >
> > Component models:
> > df logLik AICc delta weight
> > 123578 8 -4309.94 8635.89 0.00 0.14
> > 1235789 9 -4309.22 8636.44 0.55 0.10
> > 123789 8 -4310.52 8637.04 1.14 0.08
> > 1235678 9 -4309.75 8637.50 1.61 0.06
> > 12378 7 -4311.78 8637.57 1.67 0.06
> > 1234578 9 -4309.79 8637.58 1.69 0.06
> >
> > Variables 1 and 2 represent the distance variables; 3 to 8 are land
> > cover variables, and 9 is a topographic variable. Weights seem to be
> > very low, even if I average all those models, as seems to be common
> > practice when delta values are low.
>
> Well as far as we can tell from this, variables 4-9 aren't doing much
> (on the other hand, variables 1-3 seem to be in all of the top models
> you've shown us -- although presumably there are a bunch more models
> that are almost like these, and similar in weight, with other
> permutations of [123] + [some combination of 456789] ...)
>
>
> > Even with these weights, I constructed GLMMs for each of the
> > combinations, and the results were similar for all 6 combinations. Here
> > are the results for the first one (GLMM + overdispersion + r-squared):
> >
> > Random effects:
> > Groups Name Variance Std.Dev.
> > ID.CODE_1 (Intercept) 13.02 3.608
> > Number of obs: 32670, groups: ID.CODE_1, 55
> >
> > Fixed effects:
> > Estimate Std. Error z value Pr(>|z|)
> > (Intercept) -0.54891 0.51174 -1.073 0.283433
> > 3 -0.22232 0.04059 -5.478 4.31e-08 ***
> > 5 -0.05433 0.02837 -1.915 0.055460 .
> > 7 -0.13108 0.02825 -4.640 3.49e-06 ***
> > 8 -0.15864 0.08670 -1.830 0.067287 .
> > 1 0.28438 0.02853 9.968 < 2e-16 ***
> > 2 0.11531 0.03021 3.817 0.000135 ***
> > Residual deviance: 0.256
> > r.squaredGLMM():
> > R2m R2c
> > 0.01063077 0.80039950
> > This is what I get from this analysis:
> >
> > 1) Variance and SD of the random effect seem fine (definitely better
> > than the "0" I got when using study areas as the random effect);
>
> yes -- the SD of the random effect is much larger than any of the fixed
> effects, which means that the differences among individuals are large
> (presumably that means you have very different numbers of presences for
> different individuals [all individuals sharing a common pool of
> pseudo-absences???])
> >
> > 2) Estimate values make sense from what I know of the species and the
> > knowledge I have of the study areas;
>
> Good!
> >
> > 3) Overdispersion values seem good, and R-squared values don't seem very
> > good (at least when considering only fixed effects) but, as I read in
> > several places, AIC and r-squared are not always in agreement.
>
> Overdispersion is meaningless for binary data.
> >
> > 4) Weight values seem very low. Does it mean the models are not good?
>
> It means there are many approximately equivalent models. Nothing in
> this output tells you very much about absolute goodness of fit (which is
> tricky for binary data).
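One practical option for a goodness-of-fit check with binary responses
(my suggestion, not something from this thread, and assuming the DHARMa
package is available; `m1` stands for a hypothetical fitted glmer model):

```r
library(DHARMa)  # simulation-based residual diagnostics

## Simulate scaled residuals from the fitted model; these remain
## interpretable even for binary data, unlike raw deviance residuals
sims <- simulateResiduals(fittedModel = m1, n = 1000)

plot(sims)            # QQ plot and residual-vs-predicted checks
testUniformity(sims)  # formal test of residual uniformity
```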
> >
> > Then what I did was construct a GLM ("glm()"), so no random effect was
> > used. I used the same set of variables used in [1], and here are the
> > results (only models of AICc lower than 2 represented):
> >
> > [2] Call:
> > model.avg(object = dredge.glm_m1.1)
> >
> > Component model call:
> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
> > data =
> > All_SA_Used_RP_Area_z)
> >
> > Component models:
> > df logLik AICc delta weight
> > 12345678 9 -9251.85 18521.70 0.00 0.52
> > 123456789 10 -9251.77 18523.54 1.84 0.21
> > 1345678 8 -9253.84 18523.69 1.99 0.19
> >
> > In this case, weight values are higher.
> >
> > Does this mean that it is better not to use a random effect? (I am not
> > sure I can compare GLMM with GLM results; correct me if I am making
> > the wrong assumptions)
>
> No. You could do a likelihood ratio test with anova(), but note that
> the AICc values for the glm() fits are 10,000 (!!) units higher than the
> glmer fits.
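For concreteness, a base-R sketch of that comparison, using round numbers
like the log-likelihoods in the dredge tables above (the numbers are
illustrative, and halving the p-value for a variance tested on the
boundary is a common rule of thumb, not something prescribed in this
thread):

```r
## Illustrative log-likelihoods, similar to the tables above
ll_glmm <- -4309.9  # best glmer() fit (with random effect)
ll_glm  <- -9251.9  # best glm() fit (without random effect)

## LRT statistic for dropping the single random-effect variance
lr_stat <- 2 * (ll_glmm - ll_glm)

## The null value (variance = 0) sits on the boundary of the parameter
## space, so a common correction is to halve the chi-square p-value
p_val <- pchisq(lr_stat, df = 1, lower.tail = FALSE) / 2

cat("LR statistic:", lr_stat, " p-value:", p_val, "\n")
```

With a gap that large, the test overwhelmingly favors keeping the random
effect.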
>
> While it will potentially greatly complicate your life, I think you
> should at least *consider* interactions between your environment
> variables and ID, i.e. allow for the possibility that different
> individuals respond differently to habitat variation.
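A sketch of what such an ID-by-environment interaction could look like as
a random slope (variable names `x1`, `x2` and data frame `dat` are
hypothetical, assuming lme4):

```r
library(lme4)

## Random intercept plus a random slope for x1 across individuals:
## each individual gets its own response to that habitat variable
m_slope <- glmer(Used ~ x1 + x2 + (1 + x1 | ID),
                 data = dat, family = binomial)

## Compare against the random-intercept-only fit
m_int <- glmer(Used ~ x1 + x2 + (1 | ID),
               data = dat, family = binomial)
anova(m_int, m_slope)
```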
>
> Ben Bolker
>