[R-sig-ME] Poor convergence when using binomial response
Douglas Bates
bates at stat.wisc.edu
Wed May 23 18:49:05 CEST 2007
Thanks for sending the example, Sundar, and also for the idea of the
expand.bin function.
There is a similar example in the lme4 package using the cbpp data
set. I modified your expand.bin function to apply to that example
with the enclosed results. As you can see the parameter estimates in
this example are not exactly the same for the "count out of" form
versus the "expanded binary responses" form but they are reasonably
close. In general I would recommend using cbind(successes, failures)
as the response rather than messing around with the weights. I always
manage to confuse myself about what the weights should be and need to
go back to the cbind(successes, failures) form to check.
As always, it is a good idea to set control = list(msVerbose = 1) to
see exactly what is going on with the parameter estimates during the
iterations.
I hope this helps. If necessary I will look in more detail at your
example but I hope this is enough to get you started.
On 5/22/07, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
> Hi,
>
> I'm seeing poor convergence of a model using the binomial family. In one
> case, I have a binomial response for which I use the 'weights' argument.
> In a second case, I expand the binomial response to Bernoulli trials and
> refit. Theoretically, they should be the same model. However, that does
> not appear to be the case because the residual scale for the binomial
> case is unbelievably large. Could someone (Prof. Bates) please help
> diagnose this problem?
>
> And thank you so much for such wonderful software. I truly appreciate
> the effort.
>
> Thanks,
>
> --sundar
>
> tmp <- read.csv(url("http://sdorairaj.googlepages.com/tmp.csv"))
> tmp[2:4] <- lapply(tmp[2:4], factor)
> library(lme4)
> fit <- lmer(1 - y ~ z + (1 | A) + (1 | A:B), tmp,
> binomial("cloglog"), weights = as.numeric(w))
> ## see below for expand.bin
> tmp2 <- expand.bin(tmp, "x", "w")
> fit2 <- lmer(x ~ z + (1 | A) + (1 | A:B), tmp2, binomial("cloglog"))
>
> fit3 <- glm(1 - y ~ z, binomial("cloglog"), tmp, w)
> fit4 <- glm(x ~ z, binomial("cloglog"), tmp2)
>
> cbind(sapply(list(fit, fit2), fixef),
> sapply(list(fit3, fit4), coef))
> # [,1] [,2] [,3] [,4]
> #(Intercept) -2.8227711 -3.1247457 -2.8227711 -2.8227711
> #z2 -0.2680992 -0.2694113 -0.2680992 -0.2680992
> #z3 0.3109447 0.3233432 0.3109447 0.3109447
>
>
> expand.bin <- function (data, x, n) {
> char.x <- x
> char.n <- n
> x <- data[[x]]
> n <- data[[n]]
> i <- rep(seq(nrow(data)), n)
> data <- data[i, , drop = FALSE]
> expand <- function(z) c(rep(0, diff(z)), rep(1, z[1]))
> x <- apply(cbind(x, n), 1, expand)
> data[[char.x]] <- if(is.matrix(x)) c(x) else unlist(x)
> row.names(data) <- seq(nrow(data))
> data
> }
>
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> [7] "base"
>
> other attached packages:
> lme4 Matrix lattice
> "0.99875-0" "0.99875-1" "0.15-6"
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
-------------- next part --------------
expand.bin <- function(data, x, n)
{
stopifnot(is.character(x), is.character(n),
length(x) == 1, length(n) == 1,
inherits(data, "data.frame"),
all(c(x, n) %in% names(data)))
nn <- as.integer(data[[n]])
ans <- data[rep.int(seq_along(nn), nn), ]
ans[[n]] <- NULL
xx <- as.integer(data[[x]])
ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
row.names(ans) <- seq(nrow(ans))
ans
}
options(show.signif.stars = FALSE)
library(lme4)
example(cbpp)
m3 <- lmer(incidence ~ period + (1|herd),
expand.bin(cbpp, "incidence", "size"), binomial)
m3
-------------- next part --------------
R version 2.5.0 Patched (2007-05-23 r41687)
Copyright (C) 2007 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> expand.bin <- function(data, x, n)
+ {
+ stopifnot(is.character(x), is.character(n),
+ length(x) == 1, length(n) == 1,
+ inherits(data, "data.frame"),
+ all(c(x, n) %in% names(data)))
+ nn <- as.integer(data[[n]])
+ ans <- data[rep.int(seq_along(nn), nn), ]
+ ans[[n]] <- NULL
+ xx <- as.integer(data[[x]])
+ ans[[x]] <- rep.int(rep.int(c(0,1), length(xx)),
+ as.vector(matrix(c(nn - xx, xx), nrow = 2, byrow = TRUE)))
+ row.names(ans) <- seq(nrow(ans))
+ ans
+ }
> options(show.signif.stars = FALSE)
> library(lme4)
Loading required package: Matrix
Loading required package: lattice
> example(cbpp)
cbpp> ## response as a matrix
cbpp> (m1 <- lmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
cbpp+ family = binomial, data = cbpp))
Generalized linear mixed model fit using Laplace
Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
Data: cbpp
Family: binomial(logit link)
AIC BIC logLik deviance
110.1 120.2 -50.05 100.1
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.41804 0.64656
number of obs: 56, groups: herd, 15
Estimated scale (compare to 1 ) 1.138075
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3981 0.2287 -6.113 9.76e-10
period2 -0.9959 0.3056 -3.259 0.001119
period3 -1.1350 0.3266 -3.475 0.000510
period4 -1.5798 0.4286 -3.686 0.000228
Correlation of Fixed Effects:
(Intr) perid2 perid3
period2 -0.350
period3 -0.327 0.267
period4 -0.248 0.202 0.186
cbpp> ## response as a vector of probabilities and usage of argument "weights"
cbpp> m2 <- lmer(incidence / size ~ period + (1 | herd), weights = size,
cbpp+ family = binomial, data = cbpp)
cbpp> ## Confirm that these are equivalent:
cbpp> stopifnot(all.equal(coef(m1), coef(m2)),
cbpp+ all.equal(ranef(m1), ranef(m2)))
> m3 <- lmer(incidence ~ period + (1|herd),
+ expand.bin(cbpp, "incidence", "size"), binomial)
> m3
Generalized linear mixed model fit using Laplace
Formula: incidence ~ period + (1 | herd)
Data: expand.bin(cbpp, "incidence", "size")
Family: binomial(logit link)
AIC BIC logLik deviance
565 588.7 -277.5 555
Random effects:
Groups Name Variance Std.Dev.
herd (Intercept) 0.41448 0.6438
number of obs: 842, groups: herd, 15
Estimated scale (compare to 1 ) 0.9832878
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3984 0.2282 -6.129 8.86e-10
period2 -0.9934 0.3054 -3.253 0.001144
period3 -1.1332 0.3264 -3.471 0.000518
period4 -1.5805 0.4287 -3.686 0.000227
Correlation of Fixed Effects:
(Intr) perid2 perid3
period2 -0.351
period3 -0.328 0.267
period4 -0.248 0.202 0.186
>
> proc.time()
user system elapsed
7.208 0.120 7.322
More information about the R-sig-mixed-models
mailing list