[R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?

Wed May 20 19:08:50 CEST 2009

Dear all,
I ran the model on the data "data" expanded in binary format, and I get the same results tahn
in tabular format( with the correct   number of observations).
Does it mean  that the sentence "the code is wrong" is relevant only for the calculs of number of observations ?
Thank you for reassuring me!

# dbin is the dataframe for the data in binary format

> summary(dbin)
       id       rep     
 3      :15   fail: 42  
 4      :14   suc :101  
 1      :13             
 9      :11             
 10     :10             
 2      :10             
 (Other):70  

> glmer(rep ~ 1 + (1 | id), family="binomial",data=dbin) -> m1.dbin
> summary(m1.dbin)
Generalized linear mixed model fit by the Laplace approximation 
Formula: rep ~ 1 + (1 | id) 
   Data: dbin 
   AIC   BIC logLik deviance
 176.8 182.7 -86.38    172.8
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.16121  0.40151 
Number of obs: 143, groups: id, 15

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.9058     0.2139   4.233  2.3e-05 ***

> ----------------------------------------
> From: Douglas Bates <bates at stat.wisc.edu>
> Sent: Wed May 20 14:24:16 CEST 2009
> To: Dale Steele <dale.w.steele at gmail.com>
> Subject: Re: [R-sig-ME] How to handle tabular form data in lmer/glmer without (or with) expanding the data into binary outcome form?
> 
> 
> On Tue, May 19, 2009 at 6:10 PM, Dale Steele <dale.w.steele at gmail.com> wrote:
> > A recent thread
> > <https://stat.ethz.ch/pipermail/r-help/2009-April/194750.html>
> > suggested the glmer can handle data in tabular form.  I'm working
> > through (not homework) the simple random effects example from Agresti,
> > An Introduction to Categorical Data Analysis, 2nd Ed. (pg 303 - 304)
> > and am having problems...
> 
> > # Data from Table 10.2 n_i: number of free throws, p_i: observed
> > proportion of successes
> 
> > id <- seq(1:15)
> > n <- c(13,10,15,14,6,10,10,4,11,10,8,9,9,8,6)
> > p <- c(.769, .9, .667, .643, .667, .9, .6, 1, .545, .9, .5, .889,
> > .778, .625, .167)
> > (success <- round(n * p))
> > fail <- n - success
> > data <- cbind(success, fail)
> 
> > # Model to fit: logit(pi_i) = mu_i + alpha
> 
> > library(lme4)
> > #  Model
> > m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
> > summary(m1)
> 
> nAGQ = 50 is overkill in this case but the model is sufficiently
> simple that it doesn't do much harm.  Gauss-Hermite quadrature without
> centering and scaling (which is the "adaptive" part) could require a
> huge number of quadrature points because most of the evaluations of
> the unscaled density are wasted.  In adaptive Gauss-Hermite
> quadrature, however, you get very close approximations for a small
> number of evaluations.  I would rarely use more than nAGQ = 9.  (Also,
> the number of quadrature points is always chosen to be odd so your 50
> will be increased to 51.  With an odd number of points you get one
> evaluation free - the one at a displacement of zero.)
> 
> > The code runs, and produces results similar, but not exactly what
> > Agresti gets (alpha_hat=0.908 and sigma_hat=0.422). Shouldn't the
> > 'number of obs' be 143 rather than 15?  Am I doing something wrong?
> 
> No, it's the code that is wrong.  At last summer's UseR Conference
> Andrew Gelman said that he was looking at the code for the glm
> function and he was frustrated that the code was so long and involved
> to be able to handle all the special cases.  This is a prime example
> of why the code gets complicated.
> 
> I know that it is convenient and natural to want to list the number of
> successes and failures instead of one row for each observation but
> doing that requires a massive amount of really ugly code to get it
> right.  (Look at the definitions of the various glm families some day.
>  There is this very peculiar "n" object squirreled away in a common
> environment of the family functions specifically for this one case -
> it is never used otherwise and, in fact, wasn't even defined in other
> cases until recently.)
> 
> All the structures in lmer/glmer/nlmer are based on the not
> unreasonable assumption that the model frame is a model frame, meaning
> that there is one observation per row.  Except that isn't the case
> here.  Getting the count right would mean writing a large amount of
> special case code to recognize this case and regenerate the original
> number of observations.  It can be done but it is messy and low on the
> priority list right now.
> 
> >  > summary(m1)
> >> m1 <- glmer(data ~ 1 + (1 | id), family="binomial", nAGQ=50)
> >> summary(m1)
> >
> > Generalized linear mixed model fit by the adaptive Gaussian Hermite
> > approximation
> > Formula: data ~ 1 + (1 | id)
> >  AIC   BIC logLik deviance
> >  32.8 34.21  -14.4     28.8
> > Random effects:
> >  Groups Name        Variance Std.Dev.
> >  id     (Intercept) 0.16258  0.40321
> > Number of obs: 15, groups: id, 15
> >
> > Fixed effects:
> >            Estimate Std. Error z value Pr(>|z|)
> > (Intercept)   0.9059     0.2142    4.23 2.34e-05 ***
> >
> > I tried to run the same model by expanding the tabular data. Easy
> > enough to expand the id variable.
> > #
> > (subj <- rep(id,n))
> >
> > However, I'm stuck on how to expand outcome variable for each player
> > and concatenate as a single vector...
> > # first player
> > oc1 <- rep(c(1,0), sf[1,])
> >
> > Appreciate any insight anyone may provide on how to use tabular data
> > directly and how to expand tubular data.  Thanks! --Dale
> 
> I enclose a modified version of your script to do that.  The general
> idea is to produce all the 1 responses, then produce all the zero
> responses then rbind them.