[R-sig-ME] Poor convergence when using binomial response
Sundar Dorai-Raj
sundar.dorai-raj at pdf.com
Wed May 23 21:17:15 CEST 2007
Thanks for the quick reply. Here's an interesting twist.
> # use cbind(failures, successes) which is appropriate
> # with the 'cloglog' link but does not converge
> fit <- lmer(cbind(x, w - x) ~ z + (1 | A) + (1 | A:B), tmp,
+ binomial("cloglog"), control = list(msVerbose = 1))
relative tolerance set to 5.56268464626788e-311
> # now use cbind(sucesses, failures) which is
> # inappropriate with the 'cloglog' link but does converge
> fit <- lmer(cbind(w - x, x) ~ z + (1 | A) + (1 | A:B), tmp,
+ binomial("cloglog"), control = list(msVerbose = 1))
relative tolerance set to 1.23153502616496e-05
0: 811.99477: 1.04814 0.0876112 -0.111220 0.888889 0.110250
...
36: 702.77184: 1.14326 0.0930921 -0.150606 0.0958980 0.0605783
>
So it seems the tolerance level is too small and no steps are taken in
nlminb. Using debug the problem seems to be in devLaplace.
Browse[1]> PQLpars
(Intercept) z2 z3
-2.8227711 -0.2680992 0.3109447 0.8888889 0.1102498
Browse[1]> devLaplace(PQLpars)
[1] 1.797693e+308
Browse[1]> abs(0.01/devLaplace(PQLpars))
[1] 5.562685e-311
Also, this isn't an urgent issue for me. I'm satisfied using the
expand.bin (thanks also for the modification) to do the analysis.
Thanks again,
--sundar
Douglas Bates said the following on 5/23/2007 9:49 AM:
> Thanks for sending the example, Sundar, and also for the idea of the
> expand.bin function.
>
> There is a similar example in the lme4 package using the cbpp data
> set. I modified your expand.bin function to apply to that example
> with the enclosed results. As you can see the parameter estimates in
> this example are not exactly the same for the "count out of" form
> versus the "expanded binary responses" form but they are reasonably
> close. In general I would recommend using cbind(successes, failures)
> as the response rather than messing around with the weights. I always
> manage to confuse myself about what the weights should be and need to
> go back to the cbind(successes, failures) form to check.
>
> As always, it is a good idea to set control = list(msVerbose = 1) to
> see exactly what is going on with the parameter estimates during the
> iterations.
>
> I hope this helps. If necessary I will look in more detail at your
> example but I hope this is enough to get you started.
>
>
> On 5/22/07, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
>> Hi,
>>
>> I'm seeing poor convergence of a model using the binomial family. In one
>> case, I have a binomial response for which I use the 'weights' argument.
>> In a second case, I expand the binomial response to Bernoulli trials and
>> refit. Theoretically, they should be the same model. However, that does
>> not appear to be the case because the residual scale for the binomial
>> case is unbelievably large. Could someone (Prof. Bates) please help
>> diagnose this problem?
>>
>> And thank you so much for such wonderful software. I truly appreciate
>> the effort.
>>
>> Thanks,
>>
>> --sundar
>>
>> tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
>> tmp[2:4] <- lapply(tmp[2:4], factor)
>> library(lme4)
>> fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
>> binomial("cloglog"), weights = as.numeric(w))
>> ## see below for expand.bin
>> tmp2 <- expand.bin(tmp, "x", "w")
>> fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
>>
>> fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
>> fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
>>
>> cbind(sapply(list(fit, fit2), fixef),
>> sapply(list(fit3, fit4), coef))
>> # [,1] [,2] [,3] [,4]
>> #(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
>> #z2 -0.2680992 -0.2694113 -0.2680992 -0.2680992
>> #z3 0.3109447 0.3233432 0.3109447 0.3109447
>>
>>
>> expand.bin <- function (data, x, n) {
>> char.x <- x
>> char.n <- n
>> x <- data[[x]]
>> n <- data[[n]]
>> i <- rep(seq(nrow(data)), n)
>> data <- data[i, , drop = FALSE]
>> expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
>> x <- apply(cbind(x, n), 1, expand)
>> data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
>> row.names(data) <- seq(nrow(data))
>> data
>> }
>>
>> > sessionInfo()
>> R version 2.5.0 (2007-04-23)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
>> [7] "base"
>>
>> other attached packages:
>> lme4 Matrix lattice
>> "0.99875-0" "0.99875-1" "0.15-6"
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
> ------------------------------------------------------------------------
>
> expand.bin <- function(data, x, n)
> {
> stopifnot(is.character(x), is.character(n),
> length(x) == 1, length(n) == 1,
> inherits(data, "data.frame"),
> all(c(x, n) %in% names(data)))
> nn <- as.integer(data[[n]])
> ans <- data[rep.int(seq_along(nn), nn), ]
> ans[[n]] <- NULL
> xx <- as.integer(data[[x]])
> ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
> as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
> row.names(ans) <- seq(nrow(ans))
> ans
> }
> options(show.signif.stars = FALSE)
> library(lme4)
> example(cbpp)
> m3 <- lmer(incidence ~ period + (1|herd),
> expand.bin(cbpp, "incidence", "size"), binomial)
> m3
>
>
> ------------------------------------------------------------------------
>
>
> R version 2.5.0 Patched (2007-05-23 r41687)
> Copyright (C) 2007 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> [Previously saved workspace restored]
>
>> expand.bin <- function(data, x, n)
> + {
> + stopifnot(is.character(x), is.character(n),
> + length(x) == 1, length(n) == 1,
> + inherits(data, "data.frame"),
> + all(c(x, n) %in% names(data)))
> + nn <- as.integer(data[[n]])
> + ans <- data[rep.int(seq_along(nn), nn), ]
> + ans[[n]] <- NULL
> + xx <- as.integer(data[[x]])
> + ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
> + as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
> + row.names(ans) <- seq(nrow(ans))
> + ans
> + }
>> options(show.signif.stars = FALSE)
>> library(lme4)
> Loading required package: Matrix
> Loading required package: lattice
>> example(cbpp)
>
> cbpp> ## response as a matrix
> cbpp> (m1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
> cbpp+ family = binomial, data = cbpp))
> Generalized linear mixed model fit using Laplace
> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
> Data: cbpp
> Family: binomial(logit link)
> AIC BIC logLik deviance
> 110.1 120.2 -50.05 100.1
> Random effects:
> Groups Name Variance Std.Dev.
> herd (Intercept) 0.41804 0.64656
> number of obs: 56, groups: herd, 15
>
> Estimated scale (compare to 1 ) 1.138075
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.3981 0.2287 -6.113 9.76e-10
> period2 -0.9959 0.3056 -3.259 0.001119
> period3 -1.1350 0.3266 -3.475 0.000510
> period4 -1.5798 0.4286 -3.686 0.000228
>
> Correlation of Fixed Effects:
> (Intr) perid2 perid3
> period2 -0.350
> period3 -0.327 0.267
> period4 -0.248 0.202 0.186
>
> cbpp> ## response as a vector of probabilities and usage of argument "weights"
> cbpp> m2 <- lmer(incidence / size ~ period + (1 | herd), weights = size,
> cbpp+ family = binomial, data = cbpp)
>
> cbpp> ## Confirm that these are equivalent:
> cbpp> stopifnot(all.equal(coef(m1), coef(m2)),
> cbpp+ all.equal(ranef(m1), ranef(m2)))
>> m3 <- lmer(incidence ~ period + (1|herd),
> + expand.bin(cbpp, "incidence", "size"), binomial)
>> m3
> Generalized linear mixed model fit using Laplace
> Formula: incidence ~ period + (1 | herd)
> Data: expand.bin(cbpp, "incidence", "size")
> Family: binomial(logit link)
> AIC BIC logLik deviance
> 565 588.7 -277.5 555
> Random effects:
> Groups Name Variance Std.Dev.
> herd (Intercept) 0.41448 0.6438
> number of obs: 842, groups: herd, 15
>
> Estimated scale (compare to 1 ) 0.9832878
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.3984 0.2282 -6.129 8.86e-10
> period2 -0.9934 0.3054 -3.253 0.001144
> period3 -1.1332 0.3264 -3.471 0.000518
> period4 -1.5805 0.4287 -3.686 0.000227
>
> Correlation of Fixed Effects:
> (Intr) perid2 perid3
> period2 -0.351
> period3 -0.328 0.267
> period4 -0.248 0.202 0.186
>> proc.time()
> user system elapsed
> 7.208 0.120 7.322
More information about the R-sig-mixed-models
mailing list