[R-sig-ME] Fwd: updating functions mangling original copies of glmer fits?

Fri Jan 2 17:07:06 CET 2009

I thought others might be interested in this exchange that Ben and I
had over New Year's.

---------- Forwarded message ----------
From: Douglas Bates <bates at stat.wisc.edu>
Date: Thu, Jan 1, 2009 at 4:26 PM
Subject: Re: updating functions mangling original copies of glmer fits?
To: Ben Bolker <bolker at ufl.edu>

Thanks for the note, Ben.

On Sun, Dec 28, 2008 at 11:08 AM, Ben Bolker <bolker at ufl.edu> wrote:
>
>  Dear Prof Bates,
>
>  in trying to replicate the deviance-profiling code in the
> Implementation vignette, I've run across a slightly scary side
> effect.  The code below generates some data from the model
>
>  eps_b ~ Normal(0,1)
>  x ~  Uniform(0,1)
>  Linpred = 1+2*x+eps_b
>  Y ~ Poisson(exp(Linpred))
>
> that is, a fairly standard GLMM with a single continuous
> covariate, a single random effect on the intercept,
> log link, Poisson, etc. (I don't think any of this matters
> that much but I'm trying to be careful).
>
>  Then I bundled the code used to generate Figure 2 in
> Implementation.Rnw, with the addition of a .Call("mer_update_dev")
> to make it work for a GLMM, into a function called varprof
> that evaluates the model for a series of different random-effects
> standard deviations.
>
>  The answers make sense so far, but ... the **original model
> fit object** (which has been passed into the function, so ought to have
> had an independent copy made???) has now ended up being modified
> by the profiling code, so that its standard deviation is
> now equal to the maximum sd tried in the profile ...
>
>  I would try to sort this out but it feels like very deep R magic ...

What you have observed is indeed a violation of the functional
semantics of the S language, briefly summarized as "Thou shalt not
modify the value of an argument without copying".  The magic to do
this is not deep - inside a C function called through .Call you
can do whatever you want, including modifying the value of an
argument.

My excuse for doing things this way is the usual: efficiency.  The mer
object encapsulates the model and its current state during the
optimization process.  It would be possible to copy and update that
object every time the deviance is evaluated during the optimization
but
 a) that object can be huge
 b) even with all the tricks that are used for optimization, it still
requires many, many evaluations

Even one copy of such an object can be expensive.  There was a recent
message to the R-SIG-Mixed-Models mailing list regarding being able to
fit the model but running out of memory when trying to summarize it.
The problem was that there is a copy made in the summary method.

In the currently released version of lme4 I finessed the copying issue
by modifying the object in place during the optimization and keeping
the whole thing behind closed doors, as it were.  I suppose that when
I wrote code such as that in the Implementation vignette I should have
added a "Don't try this at home, kids" caveat but instead I modified
(yet again!) the overall approach.  If you check the code in the
"allcoef" branch of the SVN repository you will see that the model is
represented by an environment through the optimization phase and only
then converted to an S4 object.  Environments are special composite
objects in that they are never copied.  If an environment is passed to
a function and modified during the evaluation of the function then the
modifications are visible outside the function.  As such they are a
natural way of representing "state" during an optimization.

There are a couple of other (well, many other) enhancements in the
allcoef branch.  It uses an explicit call to nlminb for the
optimization with all the information encapsulated in the environment
and there is provision for incorporating additional parameters in the
optimization.  The name "allcoef" comes from a switch in the role of
the fixed-effects parameters in a generalized linear mixed model or a
nonlinear mixed model.  Now, all the coefficients in the linear
predictor are optimized during the IRLS iterations (the "inner"
optimization, which is well-controlled) and not during the general,
constrained optimization (the "outer" optimization performed by
nlminb).

Would it be okay for me to copy this reply to the R-SIG-Mixed-Models email list?

Happy New Year,
Doug Bates

>  I can obviously work around this by re-fitting the object after
> computing the profile to restore it, but it does feel like a bug ...
>
>  Haven't tested yet whether this problem is specific to mer_update_dev
> or whether it happens with LMMs as well ...
>
>  cheers
>    Ben Bolker
>
>
> set.seed(1001)
> x <- runif(1000)
> f <- factor(rep(1:10,each=100))
> blockeff <- rnorm(10,sd=1)
> linpred <- exp(1+2*x+blockeff[f])
> y <- rpois(1000,linpred)
> dat <- data.frame(x,f,y)
>
> plot(x,1+y,type="n",log="y")
> text(x,1+y,as.character(f))
> library(lme4)
> mod <- glmer(y~x+(1|f),data=dat,family=poisson)
>
> varprof <- function(mm,lower=0,upper=20,n=101) {
>  sg <- seq(lower, upper, len = n)
>  dev <- mm at deviance
>  nc <- length(dev)
>  nms <- names(dev)
>  vals <- matrix(0, nrow = length(sg), ncol = nc, dimnames = list(NULL,
> nms))
>  for (i in seq(along = sg)) {
>    .Call("mer_ST_setPars", mm, sg[i], PACKAGE = "lme4")
>    .Call("mer_update_L", mm, PACKAGE = "lme4")
>    res <- try(.Call("mer_update_RX", mm, PACKAGE = "lme4"), silent = TRUE)
>    if (inherits(res, "try-error")) {
>        vals[i,] <- NA
>      } else {
>        .Call("mer_update_ranef", mm, PACKAGE = "lme4")
>        .Call("mer_update_dev", mm, PACKAGE = "lme4") ## added for glmmML
>        vals[i,] <- mm at deviance
>      }
>  }
>  vals
> }
>
> attr(VarCorr(mod)$f,"stddev") ## 1.633
> logLik(mod) ## -535
> vv <- varprof(mod)
> attr(VarCorr(mod)$f,"stddev") ## 20
> logLik(mod)  ## -556!
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] emdbook_1.1.1.1    MASS_7.2-45        glmmADMB_0.3
> glmmML_0.81-3
> [5] lme4_0.999375-28   Matrix_0.999375-17 lattice_0.17-17
>
> loaded via a namespace (and not attached):
> [1] coda_0.13-3   grid_2.8.1    rjags_1.0.3-4
>
> --
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bolker at ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>
>