[R-sig-ME] Specifying outcome variable in binomial glmm: single responses vs cbind?
bbolker at gmail.com
Mon Jul 4 22:10:11 CEST 2016
Really interesting (and somewhat disconcerting).
Running it with glmmTMB (which uses Laplace!) gives different results
from glmer with nAGQ=1 -- suggesting some issue not just with Laplace,
but with lme4's implementation thereof?? (I don't think the problem is
an optimization failure ...)
It makes *some* sense that Gauss-Hermite quadrature would be useful
in this case (since binary data are far from Normal), but that doesn't
necessarily hold up to scrutiny: what needs to be approximately Normal
is not the likelihood per observation, but the likelihood per
conditional mode [which should be the same, up to an additive constant
on the log scale, for the aggregated and disaggregated data ...]
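To make the Laplace-vs-quadrature question concrete, here is a minimal, self-contained Python sketch (function names and setup are mine, not lme4's) that compares the Laplace approximation of a single cluster's marginal likelihood contribution against brute-force numerical integration over the random effect:

```python
import math

def laplace_vs_exact(k, n, eta, sigma):
    """Marginal likelihood of k successes in n trials for one cluster,
    with a N(0, sigma^2) random intercept u added to the linear
    predictor eta: 'exact' (fine-grid) vs Laplace approximation."""
    def logf(u):
        # log integrand: binomial log-likelihood at logit(eta + u)
        # plus the N(0, sigma^2) log-density of u
        p = 1.0 / (1.0 + math.exp(-(eta + u)))
        return (math.log(math.comb(n, k))
                + k * math.log(p) + (n - k) * math.log(1.0 - p)
                - 0.5 * u * u / sigma**2
                - math.log(sigma * math.sqrt(2.0 * math.pi)))

    # "exact": Riemann sum on a fine, wide grid (the integrand is
    # negligible outside +/- 10 sigma)
    lo, m = -10.0 * sigma, 20000
    h = -2.0 * lo / m
    exact = sum(math.exp(logf(lo + i * h)) for i in range(m + 1)) * h

    # Laplace: Newton iterations for the conditional mode u*, then a
    # Gaussian approximation using the curvature -L''(u*) at the mode
    u = 0.0
    for _ in range(50):
        p = 1.0 / (1.0 + math.exp(-(eta + u)))
        grad = k - n * p - u / sigma**2            # d logf / du
        hess = -n * p * (1.0 - p) - 1.0 / sigma**2  # d^2 logf / du^2
        u -= grad / hess
    laplace = math.exp(logf(u)) * math.sqrt(2.0 * math.pi / -hess)
    return exact, laplace

exact, laplace = laplace_vs_exact(k=3, n=4, eta=0.5, sigma=1.0)
print(exact, laplace)  # close, but Laplace is not exact here
```

For a single cluster with modest n the Laplace approximation is close but not exact; the per-cluster integrand is the same (up to the binomial-coefficient constant) whether the data are stored long or aggregated, which is why the aggregation itself shouldn't change the Laplace fit.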
Doug Bates, if you're reading, would you be willing to try this out
with MixedModels.jl ... ?
On 16-07-04 02:11 PM, Malcolm Fairbrother wrote:
> Hi Ben,
> This thread is relevant in this regard:
> At least on my machine, I found a substantial difference in the
> parameter estimates. The second form seemed more reliable than the
> first, as you'll see from the thread.
> Do you get the same result?
> Best wishes,
> Date: Sat, 2 Jul 2016 13:06:30 -0400
> From: Ben Bolker <bbolker at gmail.com <mailto:bbolker at gmail.com>>
> To: r-sig-mixed-models at r-project.org
> <mailto:r-sig-mixed-models at r-project.org>
> Subject: Re: [R-sig-ME] Specifying outcome variable in binomial glmm:
> single responses vs cbind?
> On 16-07-01 07:37 PM, a y wrote:
> > What is the difference between fitting a binomial glmm (without
> > random item effects) in the following two ways?
> > 1.
> > Data formatted in the following way:
> > (data_long)
> > ID correct condition itemID
> > 1 1 A i1
> > 1 0 A i2
> > 1 1 A i3
> > 1 1 A i4
> > 2 0 B i1
> > 2 1 B i2
> > 2 1 B i3
> > 2 0 B i4
> > Fitting a model without item random effects:
> > glmer(correct ~ condition + (1|ID), family = binomial, data = data_long)
> > 2.
> > Data formatted this way (summing over the correct responses):
> > (data_short)
> > ID sum_correct condition itemID
> > 1 3 A NA
> > 2 2 B NA
> > Fitting the following model, assuming there were only 4 items (I've
> > seen dozens of examples like this):
> > glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID),
> >       family = binomial, data = data_short)
> > ---
> > I figured these models should be identical, but in my experience
> > they are very much not. What am I missing? When is the second (more)
> > Thanks for any help,
> > Andrew
> I believe they should give different likelihoods but identical
> parameter estimates, identical *differences* among likelihoods (i.e.
> among competing models fitted to the same data), etc. That is,
> disaggregating the data adds an extra additive constant to the
> log-likelihood. I would be very interested to see a counter-example to
> that statement! In general, the second form should be quicker to fit,
> give residuals that are easier to interpret, etc.
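The additive-constant claim is easy to check numerically. Here is a small Python sketch (illustrative only; function names are my own) comparing the disaggregated Bernoulli log-likelihood with the aggregated binomial log-likelihood at a fixed success probability:

```python
import math

def bernoulli_loglik(outcomes, p):
    # sum of Bernoulli log-likelihoods for 0/1 outcomes ("long" format)
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y in outcomes)

def binomial_loglik(k, n, p):
    # binomial log-likelihood for k successes in n trials ("short" format)
    return (math.log(math.comb(n, k))
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

outcomes = [1, 0, 1, 1]           # subject 1 in the long format above
k, n = sum(outcomes), len(outcomes)
for p in (0.2, 0.5, 0.73):
    diff = binomial_loglik(k, n, p) - bernoulli_loglik(outcomes, p)
    # the gap is log C(4, 3), whatever the value of p
    assert abs(diff - math.log(math.comb(n, k))) < 1e-12
```

Because the difference is exactly log C(n, k) for every p, it cannot affect maximum-likelihood parameter estimates or likelihood-ratio comparisons between models fitted to the same data.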