[R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
bates at stat.wisc.edu
Wed May 20 19:16:03 CEST 2009
On Wed, May 20, 2009 at 12:08 PM, Robert ESPESSER
<Robert.Espesser at lpl-aix.fr> wrote:
> Dear all,
> I ran the model on the data "data" expanded into binary format, and I get the same results as
> in tabular format (with the correct number of observations).
> Does it mean that the sentence "the code is wrong" applies only to the calculation of the number of observations?
Yes, that is what I meant. Sorry for alarming you.
(Well, actually there is another sense in which the calculations on
the collapsed data give a different result. The calculated values of
the deviance, AIC and BIC are different for the extended and collapsed
data set and they should be the same.)
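(The same offset can be seen with plain glm(), without any random effects: the aggregated two-column fit and the expanded 0/1 fit give identical estimates, but their log-likelihoods differ by the binomial normalizing constant sum(log choose(n_i, s_i)), which the Bernoulli form drops. A base-R sketch, using the free-throw data from later in this thread:

```r
# Free-throw data (Agresti, Table 10.2)
n       <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
p       <- c(.769,.9,.667,.643,.667,.9,.6,1,.545,.9,.5,.889,.778,.625,.167)
success <- round(n * p)
fail    <- n - success

# Aggregated (two-column response) versus expanded (one 0/1 row per throw);
# for an intercept-only model the row order of the expanded form is irrelevant
fit_agg <- glm(cbind(success, fail) ~ 1, family = binomial)
fit_exp <- glm(c(rep(1, sum(success)), rep(0, sum(fail))) ~ 1,
               family = binomial)

coef(fit_agg)                 # same intercept as fit_exp
as.numeric(logLik(fit_agg)) - as.numeric(logLik(fit_exp))
sum(lchoose(n, success))      # the log-likelihoods differ by exactly this constant
```

The constant does not affect the parameter estimates, only the reported deviance, AIC and BIC.)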
> Thank you for reassuring me!
> # dbin is the dataframe for the data in binary format
>       id         rep
>  3  :15     fail: 42
>  4  :14     suc :101
>  1  :13
>  9  :11
>  10 :10
>  2  :10
>> glmer(rep ~ 1 + (1 | id), family="binomial",data=dbin) -> m1.dbin
> Generalized linear mixed model fit by the Laplace approximation
> Formula: rep ~ 1 + (1 | id)
> Data: dbin
> AIC BIC logLik deviance
> 176.8 182.7 -86.38 172.8
> Random effects:
> Groups Name Variance Std.Dev.
> id (Intercept) 0.16121 0.40151
> Number of obs: 143, groups: id, 15
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 0.9058 0.2139 4.233 2.3e-05 ***
>> From: Douglas Bates <bates at stat.wisc.edu>
>> Sent: Wed May 20 14:24:16 CEST 2009
>> To: Dale Steele <dale.w.steele at gmail.com>
>> Subject: Re: [R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
>> On Tue, May 19, 2009 at 6:10 PM, Dale Steele <dale.w.steele at gmail.com> wrote:
>> > A recent thread
>> > <https://stat.ethz.ch/pipermail/r-help/2009-April/194750.html>
>> > suggested that glmer can handle data in tabular form. I'm working
>> > through (not homework) the simple random effects example from Agresti,
>> > An Introduction to Categorical Data Analysis, 2nd Ed. (pg 303 - 304)
>> > and am having problems...
>> > # Data from Table 10.2 n_i: number of free throws, p_i: observed
>> > proportion of successes
>> > id <- seq(1:15)
>> > n <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
>> > p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889,
>> > .778, .625, .167)
>> > (success <- round(n * p))
>> > fail <- n - success
>> > data <- cbind(success, fail)
>> > # Model to fit: logit(pi_i) = mu_i + alpha
>> > library(lme4)
>> > # Model
>> > m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
>> > summary(m1)
>> nAGQ = 50 is overkill in this case but the model is sufficiently
>> simple that it doesn't do much harm. Gauss-Hermite quadrature without
>> centering and scaling (which is the "adaptive" part) could require a
>> huge number of quadrature points because most of the evaluations of
>> the unscaled density are wasted. In adaptive Gauss-Hermite
>> quadrature, however, you get very close approximations for a small
>> number of evaluations. I would rarely use more than nAGQ = 9. (Also,
>> the number of quadrature points is always chosen to be odd so your 50
>> will be increased to 51. With an odd number of points you get one
>> evaluation free - the one at a displacement of zero.)
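[Editor's note: the zero node for an odd number of points can be checked without lme4. Classical Gauss-Hermite nodes are the eigenvalues of a symmetric tridiagonal Jacobi matrix (the Golub-Welsch construction), which base R can compute; this is a sketch of the quadrature rule itself, not of lme4's implementation.]

```r
# Gauss-Hermite nodes via Golub-Welsch (base R only): the nodes are the
# eigenvalues of a symmetric tridiagonal matrix with zero diagonal and
# off-diagonal entries sqrt(k/2), k = 1, ..., n-1.
gh_nodes <- function(n) {
  J <- matrix(0, n, n)
  k <- seq_len(n - 1)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  sort(eigen(J, symmetric = TRUE, only.values = TRUE)$values)
}

gh_nodes(9)   # odd n: one node sits exactly at zero -- the "free" evaluation
gh_nodes(8)   # even n: no node at zero
```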
>> > The code runs, and produces results similar, but not exactly what
>> > Agresti gets (alpha_hat=0.908 and sigma_hat=0.422). Shouldn't the
>> > 'number of obs' be 143 rather than 15? Am I doing something wrong?
>> No, it's the code that is wrong. At last summer's UseR Conference
>> Andrew Gelman said that he was looking at the code for the glm
>> function and he was frustrated that the code was so long and involved
>> to be able to handle all the special cases. This is a prime example
>> of why the code gets complicated.
>> I know that it is convenient and natural to want to list the number of
>> successes and failures instead of one row for each observation but
>> doing that requires a massive amount of really ugly code to get it
>> right. (Look at the definitions of the various glm families some day.
>> There is this very peculiar "n" object squirreled away in a common
>> environment of the family functions specifically for this one case -
>> it is never used otherwise and, in fact, wasn't even defined in other
>> cases until recently.)
>> All the structures in lmer/glmer/nlmer are based on the not
>> unreasonable assumption that the model frame is a model frame, meaning
>> that there is one observation per row. Except that isn't the case
>> here. Getting the count right would mean writing a large amount of
>> special case code to recognize this case and regenerate the original
>> number of observations. It can be done but it is messy and low on the
>> priority list right now.
>> >> m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
>> >> summary(m1)
>> > Generalized linear mixed model fit by the adaptive Gaussian Hermite
>> > approximation
>> > Formula: data ~ 1 + (1 | id)
>> > AIC BIC logLik deviance
>> > 32.8 34.21 -14.4 28.8
>> > Random effects:
>> > Groups Name Variance Std.Dev.
>> > id (Intercept) 0.16258 0.40321
>> > Number of obs: 15, groups: id, 15
>> > Fixed effects:
>> > Estimate Std. Error z value Pr(>|z|)
>> > (Intercept) 0.9059 0.2142 4.23 2.34e-05 ***
>> > I tried to run the same model by expanding the tabular data. Easy
>> > enough to expand the id variable.
>> > #
>> > (subj <- rep(id,n))
>> > However, I'm stuck on how to expand outcome variable for each player
>> > and concatenate as a single vector...
>> > # first player
>> > oc1 <- rep(c(1,0), sf[1,])
>> > Appreciate any insight anyone may provide on how to use tabular data
>> > directly and how to expand tabular data. Thanks! --Dale
>> I enclose a modified version of your script to do that. The general
>> idea is to produce all the 1 responses, then produce all the zero
>> responses then rbind them.
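[Editor's note: the attached script does not survive in the archive. Following the description above, a minimal base-R sketch of that expansion (the names `dbin`, `ones` and `zeros` are mine) might be:]

```r
id <- 1:15
n  <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
p  <- c(.769,.9,.667,.643,.667,.9,.6,1,.545,.9,.5,.889,.778,.625,.167)
success <- round(n * p)
fail    <- n - success

# all the 1 responses, then all the 0 responses, then stack them
ones  <- data.frame(id = rep(id, success), rep = 1)
zeros <- data.frame(id = rep(id, fail),    rep = 0)
dbin  <- rbind(ones, zeros)

nrow(dbin)      # 143 rows, one per free throw
sum(dbin$rep)   # 101 successes (so 42 failures), matching the summary above
```

Fitting `glmer(rep ~ 1 + (1 | id), family = "binomial", data = dbin)` on this expanded frame then reports the expected 143 observations.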