[R-sig-ME] seeking input lme4::glmer with a gamma family: link = log or identity?

Thu Jul 26 22:00:40 CEST 2018

Thank you Thierry,

The model below does converge and does not produce any warning messages,
but the random effect variance and std dev are both = 0:

mDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year + age*sex +
(1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
glmerControl(optimizer = "bobyqa"))

As I understand it, and please correct me if I'm wrong, it is possible (but
perhaps unlikely) to have these values = 0. If so, I believe this implies
that either the random effect variable is truly not variable or that the
variance of the random effect is being captured by the other fixed effects.
In my case, that might imply that any variation between birds is captured
by year, age, or sex. So, assuming that logic is correct (and it may not
be), then the following model would most likely show a variance and std dev
> 0:

mDist <- glmer(distance ~ CSs.lat + (1|id), data = birds,  family =
Gamma(link = log), nAGQ = 10, control = glmerControl(optimizer = "bobyqa"))

But, it does not, and still shows a variance and std dev of 0. A quick
boxplot of distance grouped by bird id shows both substantial variation
across birds and at times within birds.

Perhaps I'm still missing something? Is 137 observations really too few for
a model with 1 fixed and 1 random effect variable?

Apologies for my ignorance. I do appreciate the guidance while I learn to
swim in the GLMM sea.

scott

Scott LaPoint
Postdoctoral Researcher, Lamont-Doherty Earth Observatory, Columbia
University
Associate Scientist, Max-Planck Institute for Ornithology
skype: scott_lapoint
twitter @sdlapoint
scottlapoint.weebly.com

On Wed, Jul 25, 2018 at 6:29 PM, Thierry Onkelinx <thierry.onkelinx using inbo.be>
wrote:

> Dear Scott,
>
> Random effects model only information which is not captured by the fixed
> effects. And the random effects are subject to shrinkage. Combine this with
> a large number of fixed effect parameters, a small data set and unbalanced
> repeated measurements. Then zero variance random effects and convergence
> issues don't come as a surprise.
>
> Bottom line: your model is too complex for the data. You'll need to drop
> variables or make more observations (often not feasible, I know). Using a
> different transformation/link/distribution won't solve any of these
> issues.
>
> Best regards,
>
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be
> Havenlaan 88
> <https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
> 1000 Brussel
> www.inbo.be
>
> ////////////////////////////////////////////////////////////
> ///////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ////////////////////////////////////////////////////////////
> ///////////////////////////////
>
> <https://www.inbo.be>
>
> 2018-07-25 22:05 GMT+02:00 Scott LaPoint <sdlapoint using gmail.com>:
>
>> Thank you Paul, I appreciate your time. And, apologies if my understanding
>> is often incomplete.
>>
>>
>> Hi Scott,
>> >
>> > An incomplete answer…
>> >
>> > > 1. Is a Gamma distribution best for my distance data? If so, which
>> link
>> > > function is most appropriate? I explored two link functions: identity
>> and
>> > > log. I have concerns and see potential issues with both (see my
>> > annotations
>> > > in the reproducible example below.
>> >
>> > I don’t know (I haven’t run your code) but I’ve always somehow managed
>> to
>> > avoid gamma regression for strictly positive data by logging the
>> response
>> > and fitting a model with normal errors.
>> >
>>
>> If possible, I'd rather not transform the raw data to facilitate
>> interpretation of the coefficient estimates. I'm likely naive or
>> misunderstanding something though. Log transforming the distance data does
>> produce a reasonably normal distribution. The following two models have
>> very similar AIC, BIC, LogLik, etc. estimates and the p-values of the
>> fixed
>> effects produce similar interpretations. However, the fixed effects
>> estimates are quite different.
>>
>> gammaDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year +
>> age*sex
>> + (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
>> glmerControl(optimizer = "bobyqa"))
>> summary(gammaDist)
>>
>> logGausDist <- glmer(log(distance) ~ CSs.lat + CSdirect + CSstart + year +
>> age*sex + (1|id), data = birds, family = gaussian(link = log), nAGQ = 10,
>> control = glmerControl(optimizer = "bobyqa"))
>> summary(logGausDist)
>>
>> The interpretation from these two models are mostly the same: only
>> starting
>> latitude is a marginally significant predictor of bird migration distance.
>> Correct?
>>
>>
>> > 2.  If the log link is the best or most appropriate to use, then the
>> > > summary(mDist) produces a sd of the random effect = 0 with the bobyqa
>> > > optimizer. Switching to Nelder_Mead gives a reasonable sd, but throws
>> a
>> > > convergence warning.
>> >
>> > (For clarity, I assume that by "sd of the random effect” you mean the
>> > square root of the variance parameter that gauges residual inter-bird
>> > variation in mean distance and not the SD of the estimate of that
>> > parameter, which anyway isn’t output by glmer.)
>> >
>> > Why is a random effect variance estimate of zero implausible? I would
>> > trust a converged estimate over a non-converged estimate, regardless of
>> > whether the estimate is zero. Also… you could compare the
>> log-likelihoods
>> > using logLik() —  you’d expect the converged fit to have a higher LL.
>> For
>> > more general troubleshooting of convergence warnings:
>> > http://rpubs.com/bbolker/lme4trouble1
>>
>>
>> Yes, I believe your assumption is correct. In case I am wrong, I'm
>> referring to these estimates from the summary(model) output:
>> Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  id       (Intercept) 0.00000  0.0000
>>  Residual             0.02879  0.1697
>> Number of obs: 137, groups:  id, 79
>>
>> The reason I said that a Std.Dev. = 0 is implausible is because the
>> ecologist in me says that there is no way that individual birds do not
>> vary
>> between each other (or even within for birds with multiple migration route
>> data). Am I misunderstanding the meaning of the Std.Dev here?
>>
>>
>> > Another quick check I often do is to fit the non-converged model with
>> > glmmTMB (which appears to be more robust than lme4), and compare
>> > likelihoods and estimates with lme4.
>> >
>> > A quick and dirty model fit assessment is to simulate from the fitted
>> > model (which is as easy as simulate(my.fit)), and see if the simulated
>> > responses look more or less like the real responses.
>> >
>> > Good luck,
>> > Paul
>> >
>> >
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>

	[[alternative HTML version deleted]]