[R-sig-ME] GLMM- relationship between AICc weight and random effects?

Craig DeMars cdemars at ualberta.ca
Mon Jul 11 18:40:20 CEST 2016


In a telemetry-based RSF using a random-intercept-only GLMM, the standard
errors are calculated as if the individual telemetry location (or
random/available location) were the sampling unit, not the animal. That is
partly why some studies using this approach report incredibly small
p-values for their fixed effects from a sample of, say, fewer than 30
animals.  Such results are highly unlikely given the small sample of
actual animals and the fact that most wildlife populations show some
degree of variability in selection among individuals. The
random-intercept-only approach does not appropriately reflect this
individual-level variation.  A random-slope model estimates slopes at the
individual-animal level, so the standard errors in this approach reflect
individual-level variation. The drawback of these models is that you are
often limited in the number of variables you can specify as random slopes.
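
For concreteness, here is a minimal sketch of the two specifications in
lme4 (variable and data names are placeholders, not from any particular
analysis):

library(lme4)

## Random intercept only: fixed-effect SEs effectively treat each
## location as the sampling unit
m_int <- glmer(used ~ elev + cover + (1 | animal_id),
               data = rsf_dat, family = binomial)

## Random slopes: per-animal slopes for elev and cover, so the
## fixed-effect SEs reflect among-animal variation in selection
m_slope <- glmer(used ~ elev + cover + (1 + elev + cover | animal_id),
                 data = rsf_dat, family = binomial)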

In a 2-stage approach, you fit a GLM for each individual animal.
Population-level inferences are obtained by averaging parameter estimates
across individuals.  You can account for differences in sample size per
individual by weighting the averages by sample size or by the inverse of
the variance (giving more weight to individuals with more precise
estimates). I can guarantee that the resulting standard errors will be
more conservative (and more appropriate) than those calculated from a
random-intercept GLMM.
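
As an illustrative sketch of the 2-stage calculation (object and variable
names are again placeholders):

## Stage 1: fit a GLM to each animal's data
fits <- lapply(split(rsf_dat, rsf_dat$animal_id),
               function(d) glm(used ~ elev + cover,
                               data = d, family = binomial))

## Stage 2: inverse-variance-weighted average for one coefficient
b  <- sapply(fits, function(f) coef(summary(f))["elev", "Estimate"])
se <- sapply(fits, function(f) coef(summary(f))["elev", "Std. Error"])
w  <- 1 / se^2
b_pop  <- sum(w * b) / sum(w)   # population-level slope
se_pop <- sqrt(1 / sum(w))      # its standard error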

On Mon, Jul 11, 2016 at 4:01 AM, Teresa Oliveira <
mteresaoliveira92 at gmail.com> wrote:

> So, you are saying that the estimate values I get for the fixed effects,
> and their SDs, do not take into account the random intercept (in this
> case, ID)?
>
> What do you mean by "2-stage approaches"?
> Regarding random slopes, are they represented by, for instance, (Variable
> 1|ID)?
>
> I have seen many studies that use GLMMs to construct RSFs, but I have
> also read that they are not appropriate for RSFs; no one seems sure
> about the right approach...
>
> Thank you very much for your help!!
> Teresa
>
> 2016-07-11 3:15 GMT+01:00 Craig DeMars <cdemars at ualberta.ca>:
>
>> I would add some caution when interpreting the estimated values from
>> random-intercept-only GLMMs for RSFs. The standard errors for the
>> fixed effects do not reflect the animal as the sampling unit (see
>> Schielzeth and Forstmeier 2009).  Thus, if your objective is to make
>> inference to the larger population of animals, the standard errors for the
>> fixed effects are far too narrow (i.e. overconfident). It is more
>> appropriate to use 2-stage approaches or GLMMs that use random slopes for
>> the variables of interest.
>>
>> Random-intercept-only models are somewhat abused in the RSF literature
>> in this regard, in my opinion...
>>
>> On Sun, Jul 10, 2016 at 7:06 PM, Ben Bolker <bbolker at gmail.com> wrote:
>>
>>>
>>>
>>> On 16-07-09 07:20 PM, Teresa Oliveira wrote:
>>> > Dear list members,
>>> >
>>> > I am developing GLMMs in order to assess habitat selection (using the
>>> > GLMMs' coefficients to construct resource selection functions). I have
>>> > telemetry data from 5 study areas, and each area has a different
>>> > number of individuals monitored.
>>> >
>>> > For the GLMMs, the dependent variable is binary (1 = used locations;
>>> > 0 = available locations), and I have an initial set of 14 continuous
>>> > variables (8 land cover variables; 2 distance variables, to artificial
>>> > areas and water sources; 4 topographic variables): a buffer was placed
>>> > around each location and the area of each land cover type within that
>>> > buffer was calculated; distances were measured from each point to the
>>> > nearest feature; and topographic variables were obtained from DEM
>>> > rasters. I tested for correlation using Spearman's rank, so not all 14
>>> > variables were used in the GLMMs. All variables were transformed to
>>> > z-scores.
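>>> >
>>> > For example (a sketch; 'vars' stands for a data frame holding the
>>> > candidate variables):
>>> >
>>> > round(cor(vars, method = "spearman"), 2)  # screen for high |rho|
>>> > vars_z <- as.data.frame(scale(vars))      # z-score transform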
>>> >
>>> > As random effect, I used individual ID. I initially thought of using
>>> > study area as a random effect, but I only had 5 levels and there was
>>> > almost no variance when that random effect was used.
>>> >
>>> > I constructed a GLMM with 9 (uncorrelated) variables and a random
>>> > effect, then used the "dredge()" function and "model.avg()" to sort
>>> > models by AICc. This was the result (only models with delta AICc lower
>>> > than 2 shown):
>>> >
>>> > [1]Call:
>>> > model.avg(object = dredge.m1.1)
>>> >
>>> > Component model call:
>>> > glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
>>> > family =
>>> >      binomial(link = "logit"))
>>> >
>>> > Component models:
>>> >           df   logLik    AICc  delta weight
>>> > 123578     8 -4309.94 8635.89   0.00   0.14
>>> > 1235789    9 -4309.22 8636.44   0.55   0.10
>>> > 123789     8 -4310.52 8637.04   1.14   0.08
>>> > 1235678    9 -4309.75 8637.50   1.61   0.06
>>> > 12378      7 -4311.78 8637.57   1.67   0.06
>>> > 1234578    9 -4309.79 8637.58   1.69   0.06
>>> >
>>> > Variables 1 and 2 are the distance variables, variables 3 to 8 are
>>> > land cover variables, and variable 9 is a topographic variable.
>>> > Weights seem to be very low, even if I average all those models, as
>>> > seems to be common practice when delta values are low.
>>>
>>> Well, as far as we can tell from this, variables 4-9 aren't doing much
>>> (on the other hand, variables 1-3 seem to be in all of the top models
>>> you've shown us -- although presumably there are a bunch more models
>>> that are almost like these, and similar in weight, with other
>>> permutations of [123] + [some combination of 456789] ...)
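>>>
>>>   For instance (a sketch using the object name from your output), you
>>> can inspect or average everything within a few AICc units rather than
>>> just the printed subset:
>>>
>>> subset(dredge.m1.1, delta < 4)              # all near-ties
>>> model.avg(dredge.m1.1, subset = delta < 4)  # average over that set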
>>>
>>>
>>> > Even with these weights, I constructed GLMMs for each of the
>>> > combinations, and the results were similar for all 6 combinations.
>>> > Here are the results for the first one (GLMM + overdispersion +
>>> > r-squared):
>>> >
>>> > Random effects:
>>> >  Groups    Name        Variance Std.Dev.
>>> >  ID.CODE_1 (Intercept) 13.02    3.608
>>> > Number of obs: 32670, groups:  ID.CODE_1, 55
>>> >
>>> > Fixed effects:
>>> >             Estimate Std. Error z value Pr(>|z|)
>>> > (Intercept) -0.54891    0.51174  -1.073 0.283433
>>> > 3       -0.22232    0.04059  -5.478 4.31e-08 ***
>>> > 5       -0.05433    0.02837  -1.915 0.055460 .
>>> > 7       -0.13108    0.02825  -4.640 3.49e-06 ***
>>> > 8       -0.15864    0.08670  -1.830 0.067287 .
>>> > 1         0.28438    0.02853   9.968  < 2e-16 ***
>>> > 2         0.11531    0.03021   3.817 0.000135 ***
>>> > Residual deviance: 0.256
>>> > r.squaredGLMM():
>>> >        R2m        R2c
>>> > 0.01063077 0.80039950
>>> >
>>> > This is what I get from this analysis:
>>> >
>>> > 1) Variance and SD of the random effect seem fine (definitely better
>>> > than the "0" I got when using study areas as the random effect);
>>>
>>>   Yes -- the SD of the random effect is much larger than any of the
>>> fixed effects, which means that differences among individuals are large
>>> (presumably you have very different numbers of presences for different
>>> individuals [all individuals sharing a common pool of pseudo-absences?]).
>>> >
>>> > 2) Estimate values make sense from what I know of the species and the
>>> > knowledge I have of the study areas;
>>>
>>>   Good!
>>> >
>>> > 3) Overdispersion values seem good, and R-squared values do not seem
>>> > very good (at least when considering only fixed effects) but, as I
>>> > have read in several places, AIC and r-squared are not always in
>>> > agreement.
>>>
>>>   Overdispersion is meaningless for binary data.
>>> >
>>> > 4) Weight values seem very low. Does it mean the models are not good?
>>>
>>>   It means there are many approximately equivalent models.  Nothing in
>>> this output tells you very much about absolute goodness of fit (which is
>>> tricky for binary data).
>>> >
>>> > Then what I did was construct a GLM ("glm()"), so no random effect
>>> > was used. I used the same set of variables as in [1], and here are the
>>> > results (only models with delta AICc lower than 2 shown):
>>> >
>>> > [2] Call:
>>> > model.avg(object = dredge.glm_m1.1)
>>> >
>>> > Component model call:
>>> > glm(formula = Used ~ <512 unique rhs>, family = binomial(link =
>>> "logit"),
>>> > data =
>>> >      All_SA_Used_RP_Area_z)
>>> >
>>> > Component models:
>>> >           df   logLik     AICc   delta weight
>>> > 12345678   9 -9251.85 18521.70    0.00   0.52
>>> > 123456789 10 -9251.77 18523.54    1.84   0.21
>>> > 1345678    8 -9253.84 18523.69    1.99   0.19
>>> >
>>> > In this case, weight values are higher.
>>> >
>>> > Does this mean that it is better not to use a random effect? (I am
>>> > not sure I can compare GLMM with GLM results, correct me if I am
>>> > making wrong assumptions)
>>>
>>>   No.  You could do a likelihood ratio test with anova(), but note that
>>> the AICc values for the glm() fits are nearly 10,000 (!!) units higher
>>> than for the glmer fits.
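>>>
>>>   A sketch, assuming 'm_glmm' is the glmer() fit and 'm_glm' the
>>> matching glm() fit:
>>>
>>> anova(m_glmm, m_glm)  # LRT of the random intercept; the p-value is
>>>                       # conservative (variance tested at its boundary)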
>>>
>>>   While it will potentially greatly complicate your life, I think you
>>> should at least *consider* interactions between your environment
>>> variables and ID, i.e. allow for the possibility that different
>>> individuals respond differently to habitat variation.
>>>
>>>   Ben Bolker
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>


-- 
Craig DeMars, Ph.D.
Postdoctoral Fellow
Department of Biological Sciences
University of Alberta
Phone: 780-221-3971



