[R-sig-ME] query on binomial glm of pollination experiment

Phillip Alday ph||||p@@|d@y @end|ng |rom mp|@n|
Fri Feb 21 13:55:54 CET 2020


And now again without PGP signature ....

Phillip



Hi all,

As the OP noted, part of the problem here is that it's not much data for
the model you want to estimate: 38 observations over 10 groups for one
variance component and five coefficients. For automan, you only have 4
observations and from the boxplot that category seems to behave quite
differently both in terms of mean and internal variability than the
other ones, so the huge standard errors are to be expected.

lme4 and glmmTMB actually deliver essentially the same model:

R> logLik(M3) #glmmTMB
'log Lik.' -117.8211 (df=6)

R> logLik(M2) # lm4 with the nlopt optimizer
'log Lik.' -117.8212 (df=6)

The gradient is part of the problem, but not in the sense of
convergence, but rather convergence tests and, to a lesser extent,
computation of standard errors. lme4 depends on finite-differences
approximations for the gradient and Hessian checks, which are computed
*after* gradient-free optimization, but glmmTMB uses the magic of
autodiff to have a much approximation to the gradient for both
optimization and tests.

(Note the lme4 warning about the finite-differences approximation
failing and falling back to a different estimate.)

If I recall correctly, rstanarm uses lightly regularizing priors (normal
with large sd), so you have something closer to L2-penalized (ridge)
regression. This bit of regularization then gets you better stability
and hence smaller standard errors for coefficients corresponding to few
observations. And of course, the convergence checks for MCMC don't
depend on gradients, even if Stan's HMC uses gradients to compute the
actual chains.

Fitting the same model in MixedModels.jl, we get essentially the same fit:

# sistrep$fruits_prop = with(sistrep, fruits / flowers)

julia> M4 = fit(MixedModel, @formula(fruits_prop ~ 1 + treatment +
(1|plant)), sistrep, Binomial(), LogitLink(),
wts=float.(sistrep.flowers), fast=false);

julia> loglikelihood(M4)

-117.82110003668843

Just for fun, we can crank up the quadrature points in Julia and see if
that changes anything

julia> @time M5 = fit(MixedModel, @formula(fruits_prop ~ 1 + treatment +
(1|plant)), sistrep,
Binomial(), LogitLink(), wts=float.(sistrep.flowers), fast=false,
nAGQ=9);
  0.899799 seconds (1.59 M allocations: 81.112 MiB, 5.96% gc
time)



julia>
loglikelihood(M5)


-117.82121127148437







julia> coef(M4) -
coef(M5)
5-element
Array{Float64,1}:


 6.14676620522836e-5


 0.4511892163672222


 2.2938186144294548e-5


 2.1950652206226273e-5


 9.892031197700213e-5

Only automan really changes and that change is minuscule compared to the
standard error.

My final comment would be: there is nothing wrong with this model per
se, but

1. You're always limited by your data, and with limited data, there's
not much to be done, unfortunately.

2. Regularization is the classic answer to helping out with such
problems. One way to achieve this is of course with priors in the
Bayesian framework. Note however that in your case that the estimates
essentially stay the same and only the error changes.

Best,

Phillip


On 21/02/2020 01:56, David Duffy wrote:
> I don’t have much time right now, but I was able to improve the model
> (including convergence) with a few small changes. I
>> think that having nofruits as a separate object may have been causing
a problem in the inner-workings of the software.
>> So I moved it to a column of sistrep. I also reordered the treatment
levels so that control is the baseline. Then the model
>> appears to run in lme4 with a warning or in glmmTMB without a
warning. It looks like some of the coefficients differ
>> between the two models.
>> sistrep$nofruits <- sistrep$flowers-sistrep$fruits
>> sistrep$treatment=factor(sistrep$treatment, levels=c("control",
"automan", "cruzman", "tul", "voile"))
>> library(glmmTMB)
>> M3 <- glmmTMB(cbind(fruits, nofruits) ~ treatment + (1|plant), data =
>> sistrep, family = "binomial")
>> summary(M3)
>>> M2 <- glmer(cbind(fruits, nofruits) ~ treatment + (1|plant), data =
>>> sistrep, family = "binomial")
> I just ran stan_glmer(), on the off-chance it would be better behaved.
> library(rstanarm)
> M4 <- stan_glmer(cbind(fruits, flowers-fruits) ~ treatment +
(1|plant), data=x, family=binomial())
>
> The treatment estimates (and SEs) were very similar to those from
glmer(), except for "automan", which I presume is what makes glmer unhappy.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models



More information about the R-sig-mixed-models mailing list