[R-sig-ME] seeking input lme4::glmer with a gamma family: link = log or identity?
thierry@onkelinx @ending from inbo@be
Fri Jul 27 10:19:14 CEST 2018
If you run the model with Laplace approximation instead of Adaptive
Gauss-Hermite Quadrature, then the random effect yields has sensible
variance estimate. Maybe someone else (Ben Bolker?) can chime in and
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2018-07-26 22:00 GMT+02:00 Scott LaPoint <sdlapoint using gmail.com>:
> Thank you Thierry,
> The model below does converge and does not produce any warning messages,
> but the random effect variance and std dev are both = 0:
> mDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year + age*sex +
> (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
> glmerControl(optimizer = "bobyqa"))
> As I understand it, and please correct me if I'm wrong, it is possible
> (but perhaps unlikely) to have these values = 0. If so, I believe this
> implies that either the random effect variable is truly not variable or
> that the variance of the random effect is being captured by the other fixed
> effects. In my case, that might imply that any variation between birds is
> captured by year, age, or sex. So, assuming that logic is correct (and it
> may not be), then the following model would most likely show a variance and
> std dev > 0:
> mDist <- glmer(distance ~ CSs.lat + (1|id), data = birds, family =
> Gamma(link = log), nAGQ = 10, control = glmerControl(optimizer = "bobyqa"))
> But, it does not, and still shows a variance and std dev of 0. A quick
> boxplot of distance grouped by bird id shows both substantial variation
> across birds and at times within birds.
> Perhaps I'm still missing something? Is 137 observations really too few
> for a model with 1 fixed and 1 random effect variable?
> Apologies for my ignorance. I do appreciate the guidance while I learn to
> swim in the GLMM sea.
> Scott LaPoint
> Postdoctoral Researcher, Lamont-Doherty Earth Observatory, Columbia
> Associate Scientist, Max-Planck Institute for Ornithology
> skype: scott_lapoint
> twitter @sdlapoint
> On Wed, Jul 25, 2018 at 6:29 PM, Thierry Onkelinx <
> thierry.onkelinx using inbo.be> wrote:
>> Dear Scott,
>> Random effects model only information which is not captured by the fixed
>> effects. And the random effects are subject to shrinkage. Combine this with
>> a large number of fixed effect parameters, a small data set and unbalanced
>> repeated measurements. Then zero variance random effects and convergence
>> issues don't come as a surprise.
>> Bottom line: your model is too complex for the data. You'll need to
>> drop variables or make more observations (often not feasible, I know).
>> Using a different transformation/link/distribution won't solve any of
>> these issues.
>> Best regards,
>> ir. Thierry Onkelinx
>> Statisticus / Statistician
>> Vlaamse Overheid / Government of Flanders
>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>> AND FOREST
>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>> thierry.onkelinx using inbo.be
>> Havenlaan 88
>> <https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
>> 1000 Brussel
>> To call in the statistician after the experiment is done may be no more
>> than asking him to perform a post-mortem examination: he may be able to say
>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer does not
>> ensure that a reasonable answer can be extracted from a given body of data.
>> ~ John Tukey
>> 2018-07-25 22:05 GMT+02:00 Scott LaPoint <sdlapoint using gmail.com>:
>>> Thank you Paul, I appreciate your time. And, apologies if my
>>> is often incomplete.
>>> Hi Scott,
>>> > An incomplete answer…
>>> > > 1. Is a Gamma distribution best for my distance data? If so, which
>>> > > function is most appropriate? I explored two link functions:
>>> identity and
>>> > > log. I have concerns and see potential issues with both (see my
>>> > annotations
>>> > > in the reproducible example below.
>>> > I don’t know (I haven’t run your code) but I’ve always somehow managed
>>> > avoid gamma regression for strictly positive data by logging the
>>> > and fitting a model with normal errors.
>>> If possible, I'd rather not transform the raw data to facilitate
>>> interpretation of the coefficient estimates. I'm likely naive or
>>> misunderstanding something though. Log transforming the distance data
>>> produce a reasonably normal distribution. The following two models have
>>> very similar AIC, BIC, LogLik, etc. estimates and the p-values of the
>>> effects produce similar interpretations. However, the fixed effects
>>> estimates are quite different.
>>> gammaDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year +
>>> + (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
>>> glmerControl(optimizer = "bobyqa"))
>>> logGausDist <- glmer(log(distance) ~ CSs.lat + CSdirect + CSstart + year
>>> age*sex + (1|id), data = birds, family = gaussian(link = log), nAGQ = 10,
>>> control = glmerControl(optimizer = "bobyqa"))
>>> The interpretation from these two models are mostly the same: only
>>> latitude is a marginally significant predictor of bird migration
>>> > 2. If the log link is the best or most appropriate to use, then the
>>> > > summary(mDist) produces a sd of the random effect = 0 with the bobyqa
>>> > > optimizer. Switching to Nelder_Mead gives a reasonable sd, but
>>> throws a
>>> > > convergence warning.
>>> > (For clarity, I assume that by "sd of the random effect” you mean the
>>> > square root of the variance parameter that gauges residual inter-bird
>>> > variation in mean distance and not the SD of the estimate of that
>>> > parameter, which anyway isn’t output by glmer.)
>>> > Why is a random effect variance estimate of zero implausible? I would
>>> > trust a converged estimate over a non-converged estimate, regardless of
>>> > whether the estimate is zero. Also… you could compare the
>>> > using logLik() — you’d expect the converged fit to have a higher LL.
>>> > more general troubleshooting of convergence warnings:
>>> > http://rpubs.com/bbolker/lme4trouble1
>>> Yes, I believe your assumption is correct. In case I am wrong, I'm
>>> referring to these estimates from the summary(model) output:
>>> Random effects:
>>> Groups Name Variance Std.Dev.
>>> id (Intercept) 0.00000 0.0000
>>> Residual 0.02879 0.1697
>>> Number of obs: 137, groups: id, 79
>>> The reason I said that a Std.Dev. = 0 is implausible is because the
>>> ecologist in me says that there is no way that individual birds do not
>>> between each other (or even within for birds with multiple migration
>>> data). Am I misunderstanding the meaning of the Std.Dev here?
>>> > Another quick check I often do is to fit the non-converged model with
>>> > glmmTMB (which appears to be more robust than lme4), and compare
>>> > likelihoods and estimates with lme4.
>>> > A quick and dirty model fit assessment is to simulate from the fitted
>>> > model (which is as easy as simulate(my.fit)), and see if the simulated
>>> > responses look more or less like the real responses.
>>> > Good luck,
>>> > Paul
>>> [[alternative HTML version deleted]]
>>> R-sig-mixed-models using r-project.org mailing list
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models