[R-sig-ME] Poor convergence when using binomial response

Wed May 23 21:17:15 CEST 2007

Thanks for the quick reply. Here's an interesting twist.

 > # use cbind(failures, successes) which is appropriate
 > # with the 'cloglog' link but does not converge
 > fit <- lmer(cbind(x, w - x) ~ z + (1 | A) + (1 | A:B), tmp,
+             binomial("cloglog"), control = list(msVerbose = 1))
relative tolerance set to 5.56268464626788e-311
 > # now use cbind(sucesses, failures) which is
 > # inappropriate with the 'cloglog' link but does converge
 > fit <- lmer(cbind(w - x, x) ~ z + (1 | A) + (1 | A:B), tmp,
+             binomial("cloglog"), control = list(msVerbose = 1))
relative tolerance set to 1.23153502616496e-05
   0:     811.99477:  1.04814 0.0876112 -0.111220 0.888889 0.110250
...
  36:     702.77184:  1.14326 0.0930921 -0.150606 0.0958980 0.0605783
 >

So it seems the tolerance level is too small and no steps are taken in 
nlminb. Using debug the problem seems to be in devLaplace.

Browse[1]> PQLpars
(Intercept)          z2          z3
  -2.8227711  -0.2680992   0.3109447   0.8888889   0.1102498
Browse[1]> devLaplace(PQLpars)
[1] 1.797693e+308
Browse[1]> abs(0.01/devLaplace(PQLpars))
[1] 5.562685e-311

Also, this isn't an urgent issue for me. I'm satisfied using the 
expand.bin (thanks also for the modification) to do the analysis.

Thanks again,

--sundar

Douglas Bates said the following on 5/23/2007 9:49 AM:
> Thanks for sending the example, Sundar, and also for the idea of the
> expand.bin function.
> 
> There is a similar example in the lme4 package using the cbpp data
> set.  I modified your expand.bin function to apply to that example
> with the enclosed results.  As you can see the parameter estimates in
> this example are not exactly the same for the "count out of" form
> versus the "expanded binary responses" form but they are reasonably
> close.  In general I would recommend using cbind(successes, failures)
> as the response rather than messing around with the weights.  I always
> manage to confuse myself about what the weights should be and need to
> go back to the cbind(successes, failures) form to check.
> 
> As always, it is a good idea to set control = list(msVerbose = 1) to
> see exactly what is going on with the parameter estimates during the
> iterations.
> 
> I hope this helps.  If necessary I will look in more detail at your
> example but I hope this is enough to get you started.
> 
> 
> On 5/22/07, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
>> Hi,
>>
>> I'm seeing poor convergence of a model using the binomial family. In one
>> case, I have a binomial response for which I use the 'weights' argument.
>> In a second case, I expand the binomial response to Bernoulli trials and
>> refit. Theoretically, they should be the same model. However, that does
>> not appear to be the case because the residual scale for the binomial
>> case is unbelievably large. Could someone (Prof. Bates) please help
>> diagnose this problem?
>>
>> And thank you so much for such wonderful software. I truly appreciate
>> the effort.
>>
>> Thanks,
>>
>> --sundar
>>
>> tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
>> tmp[2:4] <- lapply(tmp[2:4], factor)
>> library(lme4)
>> fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
>>              binomial("cloglog"), weights = as.numeric(w))
>> ## see below for expand.bin
>> tmp2 <- expand.bin(tmp, "x", "w")
>> fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
>>
>> fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
>> fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
>>
>> cbind(sapply(list(fit, fit2), fixef),
>>        sapply(list(fit3, fit4), coef))
>> #                  [,1]       [,2]       [,3]       [,4]
>> #(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
>> #z2          -0.2680992 -0.2694113 -0.2680992 -0.2680992
>> #z3           0.3109447  0.3233432  0.3109447  0.3109447
>>
>>
>> expand.bin <- function (data, x, n) {
>>    char.x <- x
>>    char.n <- n
>>    x <- data[[x]]
>>    n <- data[[n]]
>>    i <- rep(seq(nrow(data)), n)
>>    data <- data[i, , drop = FALSE]
>>    expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
>>    x <- apply(cbind(x, n), 1, expand)
>>    data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
>>    row.names(data) <- seq(nrow(data))
>>    data
>> }
>>
>>  > sessionInfo()
>> R version 2.5.0 (2007-04-23)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
>> [7] "base"
>>
>> other attached packages:
>>         lme4      Matrix     lattice
>> "0.99875-0" "0.99875-1"    "0.15-6"
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
> 
> ------------------------------------------------------------------------
> 
> expand.bin <- function(data, x, n)
> {
>     stopifnot(is.character(x), is.character(n),
>               length(x) == 1, length(n) == 1,
>               inherits(data, "data.frame"),
>               all(c(x, n) %in% names(data)))
>     nn <- as.integer(data[[n]])
>     ans <- data[rep.int(seq_along(nn), nn), ]
>     ans[[n]] <- NULL
>     xx <- as.integer(data[[x]])
>     ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
>                         as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
>     row.names(ans) <- seq(nrow(ans))
>     ans
> }
> options(show.signif.stars = FALSE)
> library(lme4)
> example(cbpp)
> m3 <- lmer(incidence ~ period + (1|herd),
>            expand.bin(cbpp, "incidence", "size"), binomial)
> m3
> 
> 
> ------------------------------------------------------------------------
> 
> 
> R version 2.5.0 Patched (2007-05-23 r41687)
> Copyright (C) 2007 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> 
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
> 
>   Natural language support but running in an English locale
> 
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> 
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> 
> [Previously saved workspace restored]
> 
>> expand.bin <- function(data, x, n)
> + {
> +     stopifnot(is.character(x), is.character(n),
> +               length(x) == 1, length(n) == 1,
> +               inherits(data, "data.frame"),
> +               all(c(x, n) %in% names(data)))
> +     nn <- as.integer(data[[n]])
> +     ans <- data[rep.int(seq_along(nn), nn), ]
> +     ans[[n]] <- NULL
> +     xx <- as.integer(data[[x]])
> +     ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
> +                         as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
> +     row.names(ans) <- seq(nrow(ans))
> +     ans
> + }
>> options(show.signif.stars = FALSE)
>> library(lme4)
> Loading required package: Matrix
> Loading required package: lattice
>> example(cbpp)
> 
> cbpp> ## response as a matrix
> cbpp> (m1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
> cbpp+             family = binomial, data = cbpp))
> Generalized linear mixed model fit using Laplace 
> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd) 
>    Data: cbpp 
>  Family: binomial(logit link)
>    AIC   BIC logLik deviance
>  110.1 120.2 -50.05    100.1
> Random effects:
>  Groups Name        Variance Std.Dev.
>  herd   (Intercept) 0.41804  0.64656 
> number of obs: 56, groups: herd, 15
> 
> Estimated scale (compare to  1 )  1.138075 
> 
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -1.3981     0.2287  -6.113 9.76e-10
> period2      -0.9959     0.3056  -3.259 0.001119
> period3      -1.1350     0.3266  -3.475 0.000510
> period4      -1.5798     0.4286  -3.686 0.000228
> 
> Correlation of Fixed Effects:
>         (Intr) perid2 perid3
> period2 -0.350              
> period3 -0.327  0.267       
> period4 -0.248  0.202  0.186
> 
> cbpp> ## response as a vector of probabilities and usage of argument "weights"
> cbpp> m2 <- lmer(incidence / size ~ period + (1 | herd), weights = size,
> cbpp+            family = binomial, data = cbpp)
> 
> cbpp> ## Confirm that these are equivalent:
> cbpp> stopifnot(all.equal(coef(m1), coef(m2)),
> cbpp+           all.equal(ranef(m1), ranef(m2)))
>> m3 <- lmer(incidence ~ period + (1|herd),
> +            expand.bin(cbpp, "incidence", "size"), binomial)
>> m3
> Generalized linear mixed model fit using Laplace 
> Formula: incidence ~ period + (1 | herd) 
>    Data: expand.bin(cbpp, "incidence", "size") 
>  Family: binomial(logit link)
>  AIC   BIC logLik deviance
>  565 588.7 -277.5      555
> Random effects:
>  Groups Name        Variance Std.Dev.
>  herd   (Intercept) 0.41448  0.6438  
> number of obs: 842, groups: herd, 15
> 
> Estimated scale (compare to  1 )  0.9832878 
> 
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -1.3984     0.2282  -6.129 8.86e-10
> period2      -0.9934     0.3054  -3.253 0.001144
> period3      -1.1332     0.3264  -3.471 0.000518
> period4      -1.5805     0.4287  -3.686 0.000227
> 
> Correlation of Fixed Effects:
>         (Intr) perid2 perid3
> period2 -0.351              
> period3 -0.328  0.267       
> period4 -0.248  0.202  0.186
>> proc.time()
>    user  system elapsed 
>   7.208   0.120   7.322