[R-sig-ME] seeking input lme4::glmer with a gamma family: link = log or identity?

Thierry Onkelinx thierry@onkelinx @ending from inbo@be
Thu Jul 26 00:29:15 CEST 2018


Dear Scott,

Random effects model only information which is not captured by the fixed
effects. And the random effects are subject to shrinkage. Combine this with
a large number of fixed effect parameters, a small data set and unbalanced
repeated measurements. Then zero variance random effects and convergence
issues don't come as a surprise.

​Bottom line: ​your model is too complex for the data. You'll need to drop
variables or make more observations (often not feasible, I know). Using a
different transformation/link/distribution won't solve any of these issues.

Best regards,


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

2018-07-25 22:05 GMT+02:00 Scott LaPoint <sdlapoint using gmail.com>:

> Thank you Paul, I appreciate your time. And, apologies if my understanding
> is often incomplete.
>
>
> Hi Scott,
> >
> > An incomplete answer…
> >
> > > 1. Is a Gamma distribution best for my distance data? If so, which link
> > > function is most appropriate? I explored two link functions: identity
> and
> > > log. I have concerns and see potential issues with both (see my
> > annotations
> > > in the reproducible example below.
> >
> > I don’t know (I haven’t run your code) but I’ve always somehow managed to
> > avoid gamma regression for strictly positive data by logging the response
> > and fitting a model with normal errors.
> >
>
> If possible, I'd rather not transform the raw data to facilitate
> interpretation of the coefficient estimates. I'm likely naive or
> misunderstanding something though. Log transforming the distance data does
> produce a reasonably normal distribution. The following two models have
> very similar AIC, BIC, LogLik, etc. estimates and the p-values of the fixed
> effects produce similar interpretations. However, the fixed effects
> estimates are quite different.
>
> gammaDist <- glmer(distance ~ CSs.lat + CSdirect + CSstart + year + age*sex
> + (1|id), data = birds, family = Gamma(link = log), nAGQ = 10, control =
> glmerControl(optimizer = "bobyqa"))
> summary(gammaDist)
>
> logGausDist <- glmer(log(distance) ~ CSs.lat + CSdirect + CSstart + year +
> age*sex + (1|id), data = birds, family = gaussian(link = log), nAGQ = 10,
> control = glmerControl(optimizer = "bobyqa"))
> summary(logGausDist)
>
> The interpretation from these two models are mostly the same: only starting
> latitude is a marginally significant predictor of bird migration distance.
> Correct?
>
>
> > 2.  If the log link is the best or most appropriate to use, then the
> > > summary(mDist) produces a sd of the random effect = 0 with the bobyqa
> > > optimizer. Switching to Nelder_Mead gives a reasonable sd, but throws a
> > > convergence warning.
> >
> > (For clarity, I assume that by "sd of the random effect” you mean the
> > square root of the variance parameter that gauges residual inter-bird
> > variation in mean distance and not the SD of the estimate of that
> > parameter, which anyway isn’t output by glmer.)
> >
> > Why is a random effect variance estimate of zero implausible? I would
> > trust a converged estimate over a non-converged estimate, regardless of
> > whether the estimate is zero. Also… you could compare the log-likelihoods
> > using logLik() —  you’d expect the converged fit to have a higher LL. For
> > more general troubleshooting of convergence warnings:
> > http://rpubs.com/bbolker/lme4trouble1
>
>
> Yes, I believe your assumption is correct. In case I am wrong, I'm
> referring to these estimates from the summary(model) output:
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  id       (Intercept) 0.00000  0.0000
>  Residual             0.02879  0.1697
> Number of obs: 137, groups:  id, 79
>
> The reason I said that a Std.Dev. = 0 is implausible is because the
> ecologist in me says that there is no way that individual birds do not vary
> between each other (or even within for birds with multiple migration route
> data). Am I misunderstanding the meaning of the Std.Dev here?
>
>
> > Another quick check I often do is to fit the non-converged model with
> > glmmTMB (which appears to be more robust than lme4), and compare
> > likelihoods and estimates with lme4.
> >
> > A quick and dirty model fit assessment is to simulate from the fitted
> > model (which is as easy as simulate(my.fit)), and see if the simulated
> > responses look more or less like the real responses.
> >
> > Good luck,
> > Paul
> >
> >
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]



More information about the R-sig-mixed-models mailing list