[R-sig-ME] different aic and LL in glmer(lme4) and glimmix(SAS)?

Douglas Bates bates at stat.wisc.edu
Thu Jul 1 20:16:13 CEST 2010


On Thu, Jul 1, 2010 at 11:24 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Thu, Jul 1, 2010 at 11:03 AM, Jeffrey Evans
> <Jeffrey.Evans at dartmouth.edu> wrote:
>> Hello All,
>
>> I have read several posts related to this previously, but haven't found any
>> resolution yet. When running the same GLMM in glmer and in SAS PROC GLIMMIX,
>> both programs return comparable parameter estimates, but wildly different
>> likelihoods and AIC values.
>
>> In SAS I specify use of the Laplace approximation. In R, I believe this is
>> the default (no?).
>
>> What's the difference, and [how] can I reproduce the SAS -2ll in glmer?
>
> The difference is probably due to the way that the deviance is defined
> for the binomial family in R.  A glm family object is a list of
> functions and expressions.  One of the functions, called "dev.resids"
> has arguments y, mu and weights.  You can specify the response for a
> binomial family as the 0/1 responses or as a matrix with two columns
> as you did here.  When you use the two column specification the
> response y is transformed to the fraction of successes and the number
> of cases is incorporated in the weights.  It turns out that this is
> all the information necessary for obtaining the mle's of the
> parameters but it does not give the same deviance as you would get by
> listing the 0/1 responses.
>
> I'll write an example using the cbpp data from the lme4 package.

I enclose the example I promised.
>> Thanks,
>> Jeff
>>
>> \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
>>> R_GLMM = glmer(cbind(SdlFinal, SdlMax-SdlFinal) ~ lnsdlmaxd*lnadultssdld +
>>  (1|ID),data=sdlPCAdat,family="binomial")
>>> R_GLMM
>> Generalized linear mixed model fit by the Laplace approximation
>> Formula: cbind(SdlFinal, SdlMax - SdlFinal) ~ lnsdlmaxd * lnadultssdld +
>> (1 | ID)
>>   Data: sdlPCAdat
>>  AIC  BIC logLik deviance
>>  1150 1165   -570     1140        <------------------ this line!!
>> Random effects:
>>  Groups Name        Variance Std.Dev.
>>  ID     (Intercept) 1.2491   1.1176
>> Number of obs: 144, groups: ID, 48
>>
>> Fixed effects:
>>                       Estimate Std. Error z value Pr(>|z|)
>> (Intercept)             4.56964    0.43148  10.591  < 2e-16 ***
>> lnsdlmaxd              -0.65936    0.05686 -11.595  < 2e-16 ***
>> lnadultssdld           -0.64534    0.15861  -4.069 4.73e-05 ***
>> lnsdlmaxd:lnadultssdld  0.07393    0.02166   3.414  0.00064 ***
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Correlation of Fixed Effects:
>>            (Intr) lnsdlm lndlts
>> lnsdlmaxd   -0.923
>> lnadltssdld -0.461  0.479
>> lnsdlmxd:ln  0.482 -0.508 -0.994
>>
>>
>>  \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
>> title 'SAS GLMM';
>> proc glimmix data=sdlPCAdat ic=pq noitprint method=laplace;
>> class site id;
>> model sdlfinal/sdlmax = lnsdlmaxd|lnadultssdld/ solution dist=binomial;
>> random ID /;
>> covtest glm/wald;
>> run;
>>
>> //////////////////////////////////////////////////////////////////////
>>
>>                      SAS GLMM      19:36 Wednesday, June 30, 2010 88
>>
>>                   The GLIMMIX Procedure
>>
>>            Data Set           WORK.SDLPCADAT
>>            Response Variable (Events)  SdlFinal
>>            Response Variable (Trials)  SdlMax
>>            Response Distribution     Binomial
>>            Link Function         Logit
>>            Variance Function       Default
>>            Variance Matrix        Not blocked
>>            Estimation Technique     Maximum Likelihood
>>            Likelihood Approximation   Laplace
>>            Degrees of Freedom Method   Containment
>>
>>
>>
>>                  Optimization Information
>>
>>             Optimization Technique    Dual Quasi-Newton
>>             Parameters in Optimization  5
>>             Lower Boundaries       1
>>             Upper Boundaries       0
>>             Fixed Effects         Not Profiled
>>             Starting From         GLM estimates
>>
>>             Convergence criterion (GCONV=1E-8) satisfied.
>>
>>                     Fit Statistics
>>
>>               -2 Log Likelihood        1653.90  <------------------ this
>> line!!
>>               AIC (smaller is better)  1663.90
>>               AICC (smaller is better) 1664.33
>>               BIC (smaller is better)  1673.25
>>               CAIC (smaller is better) 1678.25
>>               HQIC (smaller is better) 1667.43
>>
>>
>>              Fit Statistics for Conditional Distribution
>>
>>              -2 log L(SdlFinal | r. effects)   1436.44
>>              Pearson Chi-Square          908.07
>>              Pearson Chi-Square / DF        6.31
>>
>>
>>                 Covariance Parameter Estimates
>>
>>            Cov         Standard     Z
>>            Parm  Estimate    Error   Value   Pr > Z
>>
>>            ID    1.2491   0.2746   4.55   <.0001
>>
>>
>>                 Solutions for Fixed Effects
>>
>>                              Standard
>>     Effect         Estimate    Error    DF  t Value  Pr > |t|
>>
>>     Intercept           4.5696   0.4333    47    10.55  <.0001
>>     lnsdlmaxd          -0.6594   0.05717   93   -11.53  <.0001
>>     lnadultssdld       -0.6453   0.1593    93   -4.05   0.0001
>>     lnsdlmaxd*lnadultssd0.07394  0.02174   93    3.40   0.0010
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
-------------- next part --------------

R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(lme4a)
Loading required package: Matrix
Loading required package: lattice

Attaching package: 'Matrix'

The following object(s) are masked from 'package:base':

    det

Loading required package: minqa
Loading required package: Rcpp

Attaching package: 'lme4a'

The following object(s) are masked from 'package:stats':

    AIC

> 
> ##' <description>
> ##' Expand the data frame for the matrix form of the binomial response
> ##' to the vector form of the response. 
> ##' <details>
> ##' In the formula for the glm and glmer functions with family =
> ##' binomial the response can be specified as a matrix with two
> ##' columns, corresponding to numbers of successes and failures.  The
> ##' parameter estimates for the model will be the same as those from
> ##' the vector response but the deviance itself is a different value.
> ##'
> ##' This function expands the model frame for the matrix response to
> ##' the equivalent frame for the vector response, replacing the
> ##' response matrix with a vector called y. of 1's and 0's
> ##' @title Expand binomial matrix response frame
> ##' @param fr model frame from a glm or glmer fitted model with
> ##' family=binomial and the matrix response specification.
> ##' @return expanded model frame with the response matrix replaced by
> ##' a vector of 1's and 0's.
> ##' @author Douglas Bates
> expandsu <- function(fr) {
+     stopifnot(is(fr, "data.frame"),
+               is(mm <- fr[[1]], "matrix"),
+               ncol(mm) == 2,
+               is.numeric(mm),
+               all(mm >= 0))
+     nr <- nrow(mm)
+     within(fr[, -1][rep.int(seq_len(nr), rowSums(mm)), ],
+            y. <- rep.int(rep.int(c(1,0), nr), as.vector(t(mm))))
+ }
> 
> str(cbpp)
'data.frame':	56 obs. of  4 variables:
 $ herd     : Factor w/ 15 levels "1","2","3","4",..: 1 1 1 1 2 2 2 3 3 3 ...
 $ incidence: num  2 3 4 0 3 1 1 8 2 0 ...
 $ size     : num  14 12 9 5 22 18 21 22 16 16 ...
 $ period   : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 1 2 3 ...
> (fm1 <- glmer(cbind(incidence, size - incidence) ~ 0 + period +
+               (1|herd), cbpp, binomial, verbose=1L))
npt = 3 , n =  1 
rhobeg =  0.2 , rhoend =  2e-07 
   0.020:   4:      100.209;0.600000 
  0.0020:   7:      100.154;0.649390 
 0.00020:  10:      100.152;0.641932 
 2.0e-05:  12:      100.152;0.641823 
 2.0e-06:  13:      100.152;0.641823 
 2.0e-07:  15:      100.152;0.641815 
At return
 18:     100.15189: 0.641815
npt = 11 , n =  5 
rhobeg =  0.5840305 , rhoend =  5.840305e-07 
   0.058:  11:      100.152;0.641815 -1.36047 -2.33665 -2.47155 -2.92015 
  0.0058:  18:      100.108;0.653041 -1.38121 -2.38874 -2.52675 -2.97967 
 0.00058:  31:      100.095;0.642982 -1.39633 -2.38831 -2.52555 -2.97656 
 5.8e-05:  48:      100.095;0.642327 -1.39893 -2.39134 -2.52748 -2.97945 
 5.8e-06:  57:      100.095;0.642392 -1.39894 -2.39126 -2.52758 -2.97924 
 5.8e-07:  66:      100.095;0.642393 -1.39894 -2.39125 -2.52758 -2.97924 
At return
 78:     100.09497: 0.642392 -1.39894 -2.39125 -2.52758 -2.97924
Generalized linear mixed model fit by maximum likelihood ['merMod']
 Family: binomial 
Formula: cbind(incidence, size - incidence) ~ 0 + period + (1 | herd) 
   Data: cbpp 
     AIC      BIC   logLik deviance 
110.0950 120.2217 -50.0475 100.0950 

Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.4127   0.6424  
Number of obs: 56, groups: herd, 15

Fixed effects:
        Estimate Std. Error z value
period1  -1.3989     0.2279  -6.138
period2  -2.3912     0.3103  -7.705
period3  -2.5276     0.3308  -7.641
period4  -2.9792     0.4327  -6.885

Correlation of Fixed Effects:
        perid1 perid2 perid3
period2 0.389               
period3 0.365  0.289        
period4 0.280  0.220  0.205 
> str(cbpp1 <- expandsu(model.frame(fm1)))
'data.frame':	842 obs. of  3 variables:
 $ period: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ herd  : Factor w/ 15 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ y.    : num  1 1 0 0 0 0 0 0 0 0 ...
> (fm1a <- glmer(y. ~ 0 + period + (1|herd), cbpp1, binomial,
+                verbose=1L))
npt = 3 , n =  1 
rhobeg =  0.2 , rhoend =  2e-07 
   0.020:   4:      555.118;0.600000 
  0.0020:   7:      555.062;0.649390 
 0.00020:  10:      555.060;0.641932 
 2.0e-05:  12:      555.060;0.641823 
 2.0e-06:  13:      555.060;0.641823 
 2.0e-07:  15:      555.060;0.641815 
At return
 17:     555.05991: 0.641815
npt = 11 , n =  5 
rhobeg =  0.5840305 , rhoend =  5.840305e-07 
   0.058:  11:      555.060;0.641815 -1.36047 -2.33665 -2.47155 -2.92015 
  0.0058:  18:      555.016;0.653041 -1.38121 -2.38874 -2.52675 -2.97967 
 0.00058:  31:      555.003;0.642982 -1.39633 -2.38831 -2.52555 -2.97656 
 5.8e-05:  48:      555.003;0.642327 -1.39893 -2.39134 -2.52748 -2.97945 
 5.8e-06:  57:      555.003;0.642392 -1.39894 -2.39126 -2.52758 -2.97924 
 5.8e-07:  66:      555.003;0.642393 -1.39894 -2.39125 -2.52758 -2.97924 
At return
 76:     555.00300: 0.642392 -1.39894 -2.39125 -2.52758 -2.97924
Generalized linear mixed model fit by maximum likelihood ['merMod']
 Family: binomial 
Formula: y. ~ 0 + period + (1 | herd) 
   Data: cbpp1 
      AIC       BIC    logLik  deviance 
 565.0030  588.6819 -277.5015  555.0030 

Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.4127   0.6424  
Number of obs: 842, groups: herd, 15

Fixed effects:
        Estimate Std. Error z value
period1  -1.3989     0.2279  -6.138
period2  -2.3912     0.3103  -7.705
period3  -2.5276     0.3308  -7.641
period4  -2.9792     0.4327  -6.885

Correlation of Fixed Effects:
        perid1 perid2 perid3
period2 0.389               
period3 0.365  0.289        
period4 0.280  0.220  0.205 
> 
> 
> proc.time()
   user  system elapsed 
  6.140   0.130   6.405 


More information about the R-sig-mixed-models mailing list