[R-sig-ME] seeking input lme4::glmer with a gamma family: link = log or identity?

Fri Jul 27 10:19:14 CEST 2018

Dear Scott,

If you run the model with Laplace approximation instead of Adaptive
Gauss-Hermite Quadrature, then the random effect yields has sensible
variance estimate. Maybe someone else (Ben Bolker?) can chime in and
explain why.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

2018-07-26 22:00 GMT+02:00 Scott LaPoint <sdlapoint using gmail.com>:

> Thank you Thierry,
>
> The model below does converge and does not produce any warning messages,
> but the random effect variance and std dev are both = 0:
>
> mDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year + age*sex +
> (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
> glmerControl(optimizer = "bobyqa"))
>
> As I understand it, and please correct me if I'm wrong, it is possible
> (but perhaps unlikely) to have these values = 0. If so, I believe this
> implies that either the random effect variable is truly not variable or
> that the variance of the random effect is being captured by the other fixed
> effects. In my case, that might imply that any variation between birds is
> captured by year, age, or sex. So, assuming that logic is correct (and it
> may not be), then the following model would most likely show a variance and
> std dev > 0:
>
> mDist <- glmer(distance ~ CSs.lat + (1|id), data = birds,  family =
> Gamma(link = log), nAGQ = 10, control = glmerControl(optimizer = "bobyqa"))
>
> But, it does not, and still shows a variance and std dev of 0. A quick
> boxplot of distance grouped by bird id shows both substantial variation
> across birds and at times within birds.
>
> Perhaps I'm still missing something? Is 137 observations really too few
> for a model with 1 fixed and 1 random effect variable?
>
> Apologies for my ignorance. I do appreciate the guidance while I learn to
> swim in the GLMM sea.
>
> scott
>
> Scott LaPoint
> Postdoctoral Researcher, Lamont-Doherty Earth Observatory, Columbia
> University
> Associate Scientist, Max-Planck Institute for Ornithology
> skype: scott_lapoint
> twitter @sdlapoint
> scottlapoint.weebly.com
>
> On Wed, Jul 25, 2018 at 6:29 PM, Thierry Onkelinx <
> thierry.onkelinx using inbo.be> wrote:
>
>> Dear Scott,
>>
>> Random effects model only information which is not captured by the fixed
>> effects. And the random effects are subject to shrinkage. Combine this with
>> a large number of fixed effect parameters, a small data set and unbalanced
>> repeated measurements. Then zero variance random effects and convergence
>> issues don't come as a surprise.
>>
>> Bottom line: your model is too complex for the data. You'll need to
>> drop variables or make more observations (often not feasible, I know).
>> Using a different transformation/link/distribution won't solve any of
>> these issues.
>>
>> Best regards,
>>
>>
>> ir. Thierry Onkelinx
>> Statisticus / Statistician
>>
>> Vlaamse Overheid / Government of Flanders
>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>> AND FOREST
>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>> thierry.onkelinx using inbo.be
>> Havenlaan 88
>> <https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
>> 1000 Brussel
>> www.inbo.be
>>
>> ////////////////////////////////////////////////////////////
>> ///////////////////////////////
>> To call in the statistician after the experiment is done may be no more
>> than asking him to perform a post-mortem examination: he may be able to say
>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>> ////////////////////////////////////////////////////////////
>> ///////////////////////////////
>>
>> <https://www.inbo.be>
>>
>> 2018-07-25 22:05 GMT+02:00 Scott LaPoint <sdlapoint using gmail.com>:
>>
>>> Thank you Paul, I appreciate your time. And, apologies if my
>>> understanding
>>> is often incomplete.
>>>
>>>
>>> Hi Scott,
>>> >
>>> > An incomplete answer…
>>> >
>>> > > 1. Is a Gamma distribution best for my distance data? If so, which
>>> link
>>> > > function is most appropriate? I explored two link functions:
>>> identity and
>>> > > log. I have concerns and see potential issues with both (see my
>>> > annotations
>>> > > in the reproducible example below.
>>> >
>>> > I don’t know (I haven’t run your code) but I’ve always somehow managed
>>> to
>>> > avoid gamma regression for strictly positive data by logging the
>>> response
>>> > and fitting a model with normal errors.
>>> >
>>>
>>> If possible, I'd rather not transform the raw data to facilitate
>>> interpretation of the coefficient estimates. I'm likely naive or
>>> misunderstanding something though. Log transforming the distance data
>>> does
>>> produce a reasonably normal distribution. The following two models have
>>> very similar AIC, BIC, LogLik, etc. estimates and the p-values of the
>>> fixed
>>> effects produce similar interpretations. However, the fixed effects
>>> estimates are quite different.
>>>
>>> gammaDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year +
>>> age*sex
>>> + (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
>>> glmerControl(optimizer = "bobyqa"))
>>> summary(gammaDist)
>>>
>>> logGausDist <- glmer(log(distance) ~ CSs.lat + CSdirect + CSstart + year
>>> +
>>> age*sex + (1|id), data = birds, family = gaussian(link = log), nAGQ = 10,
>>> control = glmerControl(optimizer = "bobyqa"))
>>> summary(logGausDist)
>>>
>>> The interpretation from these two models are mostly the same: only
>>> starting
>>> latitude is a marginally significant predictor of bird migration
>>> distance.
>>> Correct?
>>>
>>>
>>> > 2.  If the log link is the best or most appropriate to use, then the
>>> > > summary(mDist) produces a sd of the random effect = 0 with the bobyqa
>>> > > optimizer. Switching to Nelder_Mead gives a reasonable sd, but
>>> throws a
>>> > > convergence warning.
>>> >
>>> > (For clarity, I assume that by "sd of the random effect” you mean the
>>> > square root of the variance parameter that gauges residual inter-bird
>>> > variation in mean distance and not the SD of the estimate of that
>>> > parameter, which anyway isn’t output by glmer.)
>>> >
>>> > Why is a random effect variance estimate of zero implausible? I would
>>> > trust a converged estimate over a non-converged estimate, regardless of
>>> > whether the estimate is zero. Also… you could compare the
>>> log-likelihoods
>>> > using logLik() —  you’d expect the converged fit to have a higher LL.
>>> For
>>> > more general troubleshooting of convergence warnings:
>>> > http://rpubs.com/bbolker/lme4trouble1
>>>
>>>
>>> Yes, I believe your assumption is correct. In case I am wrong, I'm
>>> referring to these estimates from the summary(model) output:
>>> Random effects:
>>>  Groups   Name        Variance Std.Dev.
>>>  id       (Intercept) 0.00000  0.0000
>>>  Residual             0.02879  0.1697
>>> Number of obs: 137, groups:  id, 79
>>>
>>> The reason I said that a Std.Dev. = 0 is implausible is because the
>>> ecologist in me says that there is no way that individual birds do not
>>> vary
>>> between each other (or even within for birds with multiple migration
>>> route
>>> data). Am I misunderstanding the meaning of the Std.Dev here?
>>>
>>>
>>> > Another quick check I often do is to fit the non-converged model with
>>> > glmmTMB (which appears to be more robust than lme4), and compare
>>> > likelihoods and estimates with lme4.
>>> >
>>> > A quick and dirty model fit assessment is to simulate from the fitted
>>> > model (which is as easy as simulate(my.fit)), and see if the simulated
>>> > responses look more or less like the real responses.
>>> >
>>> > Good luck,
>>> > Paul
>>> >
>>> >
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>

	[[alternative HTML version deleted]]