[R-sig-ME] GLMM- relationship between AICc weight and random effects?
Teresa Oliveira
mteresaoliveira92 at gmail.com
Mon Jul 11 12:01:17 CEST 2016
So, you are saying that the estimates I get for the fixed effects, and
their standard errors, do not take the random intercept (in this case, ID)
into account?
What do you mean by "2-stage approaches"?
Regarding random slopes, are they represented by, for instance,
(Variable1 | ID)?
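That is, would the model look something like this? (A sketch, with a
hypothetical predictor V1 and data frame dat:)

## random intercept only
glmer(Used ~ V1 + (1 | ID), data = dat, family = binomial)
## random intercept plus a random slope for V1 -- (V1 | ID) is
## shorthand for (1 + V1 | ID)
glmer(Used ~ V1 + (V1 | ID), data = dat, family = binomial)
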
I have seen a lot of studies where GLMMs are used to construct RSFs, but I
have also read that they are not appropriate for RSFs; no one seems sure
about the right approach...
Thank you very much for your help!!
Teresa
2016-07-11 3:15 GMT+01:00 Craig DeMars <cdemars at ualberta.ca>:
> I would add some caution when interpreting the estimated values when using
> random-intercept only GLMMs for RSFs. The standard errors for the fixed
> effects do not reflect the animal as the sampling unit (see Schielzeth and
> Forstmeier 2009). Thus, if your objective is to make inference to the
> larger population of animals, the standard errors for the fixed effects are
> far too narrow (i.e. overconfident). It is more appropriate to use 2-stage
> approaches or GLMMs that use random slopes for the variables of interest.
>
> Random-intercept-only models are somewhat abused in the RSF literature in
> this regard, in my opinion.
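>
> For example, a minimal 2-stage sketch (with a hypothetical data frame
> "dat", predictors V1 and V2, and animal identifier ID; here the standard
> errors do treat the animal as the sampling unit):
>
> ## stage 1 -- fit a separate RSF to each animal
> fits <- lapply(split(dat, dat$ID), function(d)
>     glm(Used ~ V1 + V2, family = binomial(link = "logit"), data = d))
> coefs <- t(sapply(fits, coef))   # one row of coefficients per animal
>
> ## stage 2 -- population-level means and SEs across animals
> colMeans(coefs)
> apply(coefs, 2, sd) / sqrt(nrow(coefs))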
>
> On Sun, Jul 10, 2016 at 7:06 PM, Ben Bolker <bbolker at gmail.com> wrote:
>
>>
>>
>> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
>> > Dear list members,
>> >
>> > I am developing GLMMs in order to assess habitat selection (using the
>> > GLMMs' coefficients to construct resource selection functions, RSFs). I
>> > have telemetry data from 5 study areas, and each area has a different
>> > number of monitored individuals.
>> >
>> > To develop the GLMMs, the dependent variable is binary (1 = used
>> > locations; 0 = available locations), and I have an initial set of 14
>> > continuous variables (8 land cover variables; 2 distance variables, to
>> > artificial areas and to water sources; and 4 topographic variables): a
>> > buffer was placed around each location and the area of each land cover
>> > type within that buffer was calculated; distances were measured from
>> > each point to the nearest feature; and topographic variables were
>> > obtained from DEM rasters. I tested for correlation using Spearman's
>> > rank, so not all 14 were used in the GLMMs. All variables were
>> > standardized using z-scores.
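>> >
>> > (Roughly, the screening step looked like this -- a sketch with
>> > hypothetical names dat and V1...V14:)
>> >
>> > vars <- paste0("V", 1:14)
>> > dat[vars] <- scale(dat[vars])           # z-score standardization
>> > cor(dat[vars], method = "spearman")     # inspect; drop one variable
>> >                                         # from each highly correlated pair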
>> >
>> > As the random effect, I used individual ID. At the beginning I thought
>> > of using study area as a random effect, but I only had 5 levels and
>> > there was almost no variance when that random effect was used.
>> >
>> > I constructed a GLMM with the 9 (uncorrelated) variables and a random
>> > effect, then used the "dredge()" function and "model.avg(dredge)" to
>> > rank models by AICc (the workflow is sketched after the output). This
>> > was the result (only models with delta AICc lower than 2 shown):
>> >
>> > [1]Call:
>> > model.avg(object = dredge.m1.1)
>> >
>> > Component model call:
>> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
>> > family =
>> > binomial(link = "logit"))
>> >
>> > Component models:
>> > df logLik AICc delta weight
>> > 123578 8 -4309.94 8635.89 0.00 0.14
>> > 1235789 9 -4309.22 8636.44 0.55 0.10
>> > 123789 8 -4310.52 8637.04 1.14 0.08
>> > 1235678 9 -4309.75 8637.50 1.61 0.06
>> > 12378 7 -4311.78 8637.57 1.67 0.06
>> > 1234578 9 -4309.79 8637.58 1.69 0.06
>> >
>> > Variables 1 and 2 represent the distance variables; variables 3 to 8
>> > are land cover variables; and 9 is a topographic variable. The weights
>> > seem very low, even if I average all those models, as seems to be
>> > common practice when delta values are low.
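>> >
>> > (The workflow, roughly -- a sketch assuming predictor names V1...V9:)
>> >
>> > library(lme4)
>> > library(MuMIn)
>> >
>> > ## global model; dredge() requires na.action = na.fail
>> > m1.1 <- glmer(Used ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 +
>> >                   (1 | ID.CODE_1),
>> >               data = All_SA_Used_RP_Area_z,
>> >               family = binomial(link = "logit"), na.action = na.fail)
>> >
>> > dredge.m1.1 <- dredge(m1.1)        # fits all 2^9 = 512 submodels
>> > model.avg(dredge.m1.1)             # AICc-ranked model averaging
>> > subset(dredge.m1.1, delta < 2)     # top models only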
>>
>> Well, as far as we can tell from this, variables 4-9 aren't doing much
>> (on the other hand, variables 1-3 seem to be in all of the top models
>> you've shown us -- although presumably there are a bunch more models
>> that are almost like these, and similar in weight, with other
>> permutations of [123] + [some combination of 456789] ...)
>>
>>
>> > Even with these weights, I constructed GLMMs for each of the
>> > combinations, and the results were similar for all 6 combinations. Here
>> > are the results for the first one (GLMM + overdispersion + r-squared):
>> >
>> > Random effects:
>> >  Groups    Name        Variance Std.Dev.
>> >  ID.CODE_1 (Intercept) 13.02    3.608
>> > Number of obs: 32670, groups: ID.CODE_1, 55
>> >
>> > Fixed effects:
>> >              Estimate Std. Error z value Pr(>|z|)
>> > (Intercept) -0.54891    0.51174  -1.073  0.283433
>> > 3           -0.22232    0.04059  -5.478  4.31e-08 ***
>> > 5           -0.05433    0.02837  -1.915  0.055460 .
>> > 7           -0.13108    0.02825  -4.640  3.49e-06 ***
>> > 8           -0.15864    0.08670  -1.830  0.067287 .
>> > 1            0.28438    0.02853   9.968   < 2e-16 ***
>> > 2            0.11531    0.03021   3.817  0.000135 ***
>> >
>> > Residual deviance: 0.256
>> > r.squaredGLMM():
>> >        R2m        R2c
>> > 0.01063077 0.80039950
>> > This is what I take from this analysis:
>> >
>> > 1) The variance and SD of the random effect seem fine (definitely
>> > better than the "0" I got when using study area as the random effect);
>>
>> yes -- the SD of the random effect is much larger than any of the fixed
>> effects, which means that the differences among individuals are large
>> (presumably that means you have very different numbers of presences for
>> different individuals [all individuals sharing a common pool of
>> pseudo-absences???]).
>> >
>> > 2) The estimates make sense given what I know of the species and of
>> > the study areas;
>>
>> Good!
>> >
>> > 3) The overdispersion values seem good, and the R-squared values don't
>> > seem very good (at least when considering only the fixed effects), but,
>> > as I have read in several places, AIC and R-squared are not always in
>> > agreement.
>>
>> Overdispersion is meaningless for binary data.
>> >
>> > 4) The weight values seem very low. Does this mean the models are not
>> > good?
>>
>> It means there are many approximately equivalent models. Nothing in
>> this output tells you very much about absolute goodness of fit (which is
>> tricky for binary data).
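>>
>> (One rough check for binary data is a binned residual plot, e.g. via the
>> 'arm' package -- a sketch, assuming the fitted glmer model is m1.1:)
>>
>> library(arm)
>> binnedplot(fitted(m1.1), resid(m1.1, type = "response"))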
>> >
>> > Then what I did was to construct a GLM ("glm()"), so no random effect
>> > was used. I used the same set of variables as in [1], and here are the
>> > results (only models with delta AICc lower than 2 shown):
>> >
>> > [2] Call:
>> > model.avg(object = dredge.glm_m1.1)
>> >
>> > Component model call:
>> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
>> > data = All_SA_Used_RP_Area_z)
>> >
>> > Component models:
>> > df logLik AICc delta weight
>> > 12345678 9 -9251.85 18521.70 0.00 0.52
>> > 123456789 10 -9251.77 18523.54 1.84 0.21
>> > 1345678 8 -9253.84 18523.69 1.99 0.19
>> >
>> > In this case, the weight values are higher.
>> >
>> > Does this mean that it is better not to use a random effect? (I am not
>> > sure I can compare GLMM with GLM results; correct me if I am making the
>> > wrong assumptions.)
>>
>> No. You could do a likelihood ratio test with anova(), but note that
>> the AICc values for the glm() fits are 10,000 (!!) units higher than
>> those for the glmer() fits.
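>>
>> (A sketch of that test, assuming m_glmm and m_glm are the fitted glmer()
>> and glm() models with the same fixed effects; the variance component is
>> tested on the boundary of its parameter space, so the naive p-value is
>> conservative:)
>>
>> LR <- 2 * (as.numeric(logLik(m_glmm)) - as.numeric(logLik(m_glm)))
>> pchisq(LR, df = 1, lower.tail = FALSE)   # halve this p-value to roughly
>>                                          # correct for the boundary issue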
>>
>> While it will potentially greatly complicate your life, I think you
>> should at least *consider* interactions between your environment
>> variables and ID, i.e. allow for the possibility that different
>> individuals respond differently to habitat variation.
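>>
>> (A sketch of what that might look like, with hypothetical predictors V1
>> and V2; each added random slope adds variance and covariance parameters,
>> so the fit gets slower and harder:)
>>
>> glmer(Used ~ V1 + V2 + (1 + V1 + V2 | ID.CODE_1),
>>       data = All_SA_Used_RP_Area_z,
>>       family = binomial(link = "logit"))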
>>
>> Ben Bolker
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
>
> --
> Craig DeMars, Ph.D.
> Postdoctoral Fellow
> Department of Biological Sciences
> University of Alberta
> Phone: 780-221-3971
>
>