[R-sig-ME] Specifying outcome variable in binomial glmm: single responses vs cbind?
Ben Bolker
bbolker at gmail.com
Sat Jul 2 19:06:30 CEST 2016
On 16-07-01 07:37 PM, a y wrote:
> What is the difference between fitting a binomial glmm (without random item
> effects) in the following two ways?
>
> 1.
> Data formatted in the following way:
>
> (data_long)
> ID correct condition itemID
> 1 1 A i1
> 1 0 A i2
> 1 1 A i3
> 1 1 A i4
> 2 0 B i1
> 2 1 B i2
> 2 1 B i3
> 2 0 B i4
>
> Fitting a model without item random effects:
>
> glmer(correct ~ condition + (1|ID), family = binomial, data = data_long)
>
>
> 2.
> Data formatted this way (summing over the correct responses):
>
> (data_short)
> ID sum_correct condition itemID
> 1 3 A NA
> 2 2 B NA
>
> Fitting the following model, assuming there were only 4 items (I've seen
> dozens of examples like this):
> glmer(cbind(sum_correct, 4 - sum_correct) ~ condition + (1|ID),
>       family = binomial, data = data_short)
>
> ---
> I figured these models should be identical, but in my experience they are
> very much not. What am I missing? When is the second (more) appropriate?
>
> Thanks for any help,
> Andrew
>
I believe they should give different likelihoods but identical
parameter estimates, identical *differences* among likelihoods (i.e. among
competing models fitted to the same data), etc. That is, the
log-likelihoods of the aggregated and disaggregated fits differ only by an
additive constant (the sum of the log binomial coefficients), which does
not depend on the parameters. I would be very interested to see a
counter-example to that statement! In general, the second form should be
quicker to fit, provide residuals that are easier to interpret, etc.
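
A minimal sketch of where the additive constant comes from, using base R
only. It evaluates the likelihood of subject 1's data from the question at a
fixed success probability instead of fitting a GLMM (an assumption made to
keep the example self-contained); the names p_grid, ll_long and ll_short are
illustrative, not part of lme4.

```r
# Item-level responses for subject 1 in the question (3 correct out of 4).
y <- c(1, 0, 1, 1)
k <- sum(y)      # number of successes
n <- length(y)   # number of trials

# Log-likelihood of the same data in each format at success probability p.
ll_long  <- function(p) sum(dbinom(y, size = 1, prob = p, log = TRUE))
ll_short <- function(p) dbinom(k, size = n, prob = p, log = TRUE)

# Arbitrary illustrative probabilities at which to compare the two forms.
p_grid  <- c(0.2, 0.5, 0.75, 0.9)
offsets <- sapply(p_grid, ll_short) - sapply(p_grid, ll_long)

# The offset equals log(choose(n, k)) at every p, so it shifts the
# log-likelihood without moving its maximum.
all.equal(offsets, rep(lchoose(n, k), length(p_grid)))
```

Because the offset is the same for every value of p, it drops out of the
maximization and out of likelihood differences between models fitted to the
same data format. The same term-by-term argument applies conditional on the
random effect for each ID.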