[R-sig-ME] Poor convergence when using binomial response

Wed May 23 18:49:05 CEST 2007

Thanks for sending the example, Sundar, and also for the idea of the
expand.bin function.

There is a similar example in the lme4 package using the cbpp data
set.  I modified your expand.bin function to apply to that example
with the enclosed results.  As you can see the parameter estimates in
this example are not exactly the same for the "count out of" form
versus the "expanded binary responses" form but they are reasonably
close.  In general I would recommend using cbind(successes, failures)
as the response rather than messing around with the weights.  I always
manage to confuse myself about what the weights should be and need to
go back to the cbind(successes, failures) form to check.

As always, it is a good idea to set control = list(msVerbose = 1) to
see exactly what is going on with the parameter estimates during the
iterations.

I hope this helps.  If necessary I will look in more detail at your
example but I hope this is enough to get you started.

On 5/22/07, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
> Hi,
>
> I'm seeing poor convergence of a model using the binomial family. In one
> case, I have a binomial response for which I use the 'weights' argument.
> In a second case, I expand the binomial response to Bernoulli trials and
> refit. Theoretically, they should be the same model. However, that does
> not appear to be the case because the residual scale for the binomial
> case is unbelievably large. Could someone (Prof. Bates) please help
> diagnose this problem?
>
> And thank you so much for such wonderful software. I truly appreciate
> the effort.
>
> Thanks,
>
> --sundar
>
> tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
> tmp[2:4] <- lapply(tmp[2:4], factor)
> library(lme4)
> fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
>              binomial("cloglog"), weights = as.numeric(w))
> ## see below for expand.bin
> tmp2 <- expand.bin(tmp, "x", "w")
> fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
>
> fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
> fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
>
> cbind(sapply(list(fit, fit2), fixef),
>        sapply(list(fit3, fit4), coef))
> #                  [,1]       [,2]       [,3]       [,4]
> #(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
> #z2          -0.2680992 -0.2694113 -0.2680992 -0.2680992
> #z3           0.3109447  0.3233432  0.3109447  0.3109447
>
>
> expand.bin <- function (data, x, n) {
>    char.x <- x
>    char.n <- n
>    x <- data[[x]]
>    n <- data[[n]]
>    i <- rep(seq(nrow(data)), n)
>    data <- data[i, , drop = FALSE]
>    expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
>    x <- apply(cbind(x, n), 1, expand)
>    data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
>    row.names(data) <- seq(nrow(data))
>    data
> }
>
>  > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
>
> other attached packages:
>         lme4      Matrix     lattice
> "0.99875-0" "0.99875-1"    "0.15-6"
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
-------------- next part --------------
expand.bin <- function(data, x, n)
{
    stopifnot(is.character(x), is.character(n),
              length(x) == 1, length(n) == 1,
              inherits(data, "data.frame"),
              all(c(x, n) %in% names(data)))
    nn <- as.integer(data[[n]])
    ans <- data[rep.int(seq_along(nn), nn), ]
    ans[[n]] <- NULL
    xx <- as.integer(data[[x]])
    ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
                        as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
    row.names(ans) <- seq(nrow(ans))
    ans
}
options(show.signif.stars = FALSE)
library(lme4)
example(cbpp)
m3 <- lmer(incidence ~ period + (1|herd),
           expand.bin(cbpp, "incidence", "size"), binomial)
m3
-------------- next part --------------

R version 2.5.0 Patched (2007-05-23 r41687)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> expand.bin <- function(data, x, n)
+ {
+     stopifnot(is.character(x), is.character(n),
+               length(x) == 1, length(n) == 1,
+               inherits(data, "data.frame"),
+               all(c(x, n) %in% names(data)))
+     nn <- as.integer(data[[n]])
+     ans <- data[rep.int(seq_along(nn), nn), ]
+     ans[[n]] <- NULL
+     xx <- as.integer(data[[x]])
+     ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
+                         as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
+     row.names(ans) <- seq(nrow(ans))
+     ans
+ }
> options(show.signif.stars = FALSE)
> library(lme4)
Loading required package: Matrix
Loading required package: lattice
> example(cbpp)

cbpp> ## response as a matrix
cbpp> (m1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
cbpp+             family = binomial, data = cbpp))
Generalized linear mixed model fit using Laplace 
Formula: cbind(incidence, size - incidence) ~ period + (1 | herd) 
   Data: cbpp 
 Family: binomial(logit link)
   AIC   BIC logLik deviance
 110.1 120.2 -50.05    100.1
Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.41804  0.64656 
number of obs: 56, groups: herd, 15

Estimated scale (compare to  1 )  1.138075 

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.3981     0.2287  -6.113 9.76e-10
period2      -0.9959     0.3056  -3.259 0.001119
period3      -1.1350     0.3266  -3.475 0.000510
period4      -1.5798     0.4286  -3.686 0.000228

Correlation of Fixed Effects:
        (Intr) perid2 perid3
period2 -0.350              
period3 -0.327  0.267       
period4 -0.248  0.202  0.186

cbpp> ## response as a vector of probabilities and usage of argument "weights"
cbpp> m2 <- lmer(incidence / size ~ period + (1 | herd), weights = size,
cbpp+            family = binomial, data = cbpp)

cbpp> ## Confirm that these are equivalent:
cbpp> stopifnot(all.equal(coef(m1), coef(m2)),
cbpp+           all.equal(ranef(m1), ranef(m2)))
> m3 <- lmer(incidence ~ period + (1|herd),
+            expand.bin(cbpp, "incidence", "size"), binomial)
> m3
Generalized linear mixed model fit using Laplace 
Formula: incidence ~ period + (1 | herd) 
   Data: expand.bin(cbpp, "incidence", "size") 
 Family: binomial(logit link)
 AIC   BIC logLik deviance
 565 588.7 -277.5      555
Random effects:
 Groups Name        Variance Std.Dev.
 herd   (Intercept) 0.41448  0.6438  
number of obs: 842, groups: herd, 15

Estimated scale (compare to  1 )  0.9832878 

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.3984     0.2282  -6.129 8.86e-10
period2      -0.9934     0.3054  -3.253 0.001144
period3      -1.1332     0.3264  -3.471 0.000518
period4      -1.5805     0.4287  -3.686 0.000227

Correlation of Fixed Effects:
        (Intr) perid2 perid3
period2 -0.351              
period3 -0.328  0.267       
period4 -0.248  0.202  0.186
> 
> proc.time()
   user  system elapsed 
  7.208   0.120   7.322