[R-sig-ME] Specifying outcome variable in binomial glmm: single responses vs cbind?

Mon Jul 4 22:10:11 CEST 2016

  Really interesting (and somewhat disconcerting).

  Running it with glmmTMB (which also uses the Laplace approximation!)
gives different results from glmer with nAGQ=1 -- suggesting an issue
not just with the Laplace approximation itself, but with lme4's
implementation of it?? (I don't think the problem is an optimization
failure ...)
   It makes *some* sense that Gauss-Hermite quadrature would be useful
for this case (since binary data are far from satisfying a Normality
assumption), but that argument doesn't necessarily hold up to scrutiny:
what needs to be approximately Normal is not the likelihood per data
point, but the likelihood per conditional mode [which should be the
same, up to a constant, for the aggregated and disaggregated data ...]
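
  A rough base-R sketch of that last point, for a single subject: the
integrand over the random intercept b is built once from the four
binary responses and once from the aggregated count, and the two are
compared. The values of beta0 and sigma, and the names f_long and
f_short, are made up for illustration; this is not what lme4 or
glmmTMB do internally. If the argument holds, both ratios should equal
choose(n, k).

```r
# Hypothetical values, chosen only for illustration
y     <- c(1, 0, 1, 1)   # four binary responses of one subject
n     <- length(y)
k     <- sum(y)          # aggregated number of correct responses
beta0 <- 0.5             # assumed fixed-effect linear predictor
sigma <- 1.2             # assumed random-intercept standard deviation

# Integrand for the disaggregated (Bernoulli) data, vectorised over b
f_long <- function(b) {
  lik <- sapply(b, function(bi) prod(dbinom(y, size = 1, prob = plogis(beta0 + bi))))
  lik * dnorm(b, mean = 0, sd = sigma)
}

# Integrand for the aggregated (binomial) data
f_short <- function(b) {
  dbinom(k, size = n, prob = plogis(beta0 + b)) * dnorm(b, mean = 0, sd = sigma)
}

# Pointwise ratio of the two integrands on a grid of b values
b_grid <- seq(-4, 4, by = 0.5)
ratio_pointwise <- f_short(b_grid) / f_long(b_grid)

# Ratio of the two marginal likelihoods (numerical integration over b)
m_long  <- integrate(f_long,  -Inf, Inf)$value
m_short <- integrate(f_short, -Inf, Inf)$value
ratio_marginal <- m_short / m_long

print(range(ratio_pointwise))
print(ratio_marginal)
print(choose(n, k))
```

  Since the two integrands differ only by a factor that does not
depend on b, they have the same shape around the conditional mode, so
an approximation based on that shape should in principle treat them
alike.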

  Doug Bates, if you're reading, would you be willing to try this out
with MixedModels.jl ... ?

  Ben Bolker

On 16-07-04 02:11 PM, Malcolm Fairbrother wrote:
> Hi Ben,
> This thread is relevant in this regard:
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q4/024241.html
> At least on my machine, I found a substantial difference in the
> parameter estimates. The second form seemed more reliable than the
> first, as you'll see from the thread.
> Do you get the same result?
> Best wishes,
> Malcolm
> 
> 
> 
>     Date: Sat, 2 Jul 2016 13:06:30 -0400
>     From: Ben Bolker <bbolker at gmail.com>
>     To: r-sig-mixed-models at r-project.org
>     Subject: Re: [R-sig-ME] Specifying outcome variable in binomial glmm:
>             single responses vs cbind?
> 
> 
> 
>     On 16-07-01 07:37 PM, a y wrote:
>     > What is the difference between fitting a binomial glmm (without
>     > random item effects) in the following two ways?
>     >
>     > 1.
>     > Data formatted in the following way:
>     >
>     > (data_long)
>     > ID    correct    condition    itemID
>     > 1     1          A            i1
>     > 1     0          A            i2
>     > 1     1          A            i3
>     > 1     1          A            i4
>     > 2     0          B            i1
>     > 2     1          B            i2
>     > 2     1          B            i3
>     > 2     0          B            i4
>     >
>     > Fitting a model without item random effects:
>     >
>     > glmer(correct ~ condition + (1|ID), family = binomial,
>     >       data = data_long)
>     >
>     >
>     > 2.
>     > Data formatted this way (summing over the correct responses):
>     >
>     > (data_short)
>     > ID    sum_correct    condition    itemID
>     > 1     3              A            NA
>     > 2     2              B            NA
>     >
>     > Fitting the following model, assuming there were only 4 items
>     > (I've seen dozens of examples like this):
>     > glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID),
>     >       family = binomial, data = data_short)
>     >
>     > ---
>     > I figured these models should be identical, but in my experience
>     > they are very much not. What am I missing? When is the second
>     > (more) appropriate?
>     >
>     > Thanks for any help,
>     > Andrew
>     >
> 
>       I believe they should give different likelihoods but identical
>     parameter estimates, identical *differences* among likelihoods
>     (i.e. among competing models fitted to the same data), etc.  That
>     is, disaggregating the data leads to an extra additive constant in
>     the log-likelihood. I would be very interested to see a
>     counter-example to that statement!  In general, the second form
>     should be quicker to fit, provide residuals that are easier to
>     interpret, etc.
>
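
The additive constant mentioned above can be written out explicitly. This is a sketch in notation that is not from the thread: subject j answers n items, y_{ij} is the binary response to item i, k_j = \sum_i y_{ij} is the aggregated count, and p_j is the conditional probability of a correct response given the random intercept b_j (the same for every item, since the model has no item effects).

```latex
\begin{align*}
% Disaggregated (Bernoulli) contribution of subject j
L_j^{\text{long}}(p_j)
  &= \prod_{i=1}^{n} p_j^{\,y_{ij}} (1 - p_j)^{1 - y_{ij}}
   = p_j^{\,k_j} (1 - p_j)^{\,n - k_j}, \\
% Aggregated (binomial) contribution of subject j
L_j^{\text{short}}(p_j)
  &= \binom{n}{k_j}\, p_j^{\,k_j} (1 - p_j)^{\,n - k_j}
   = \binom{n}{k_j}\, L_j^{\text{long}}(p_j), \\
% The factor does not depend on b_j, so it survives integration over b_j
\ell^{\text{short}}(\theta)
  &= \ell^{\text{long}}(\theta) + \sum_{j} \log \binom{n}{k_j}.
\end{align*}
```

For the two subjects in the toy data above (n = 4, k_1 = 3, k_2 = 2), the constant is log 4 + log 6 = log 24, about 3.18. It does not involve the parameters, so for the exact marginal likelihood the two data layouts would have the same maximizer; whether a particular approximation preserves that is the open question in this thread.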