[Rd] robust updating methods

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Mar 23 17:55:52 CET 2015


Dear Ben,

Last week I was struggling with incorporating lme4 into a package. I traced
the problem and made a reproducible example (
https://github.com/ThierryO/testlme4).  It looks very simular to the
problem you describe.

The 'tests' directory contains the reproducible examples. confint() of a
model as returned by a function fails. It even fails when I try to
calculate the confint() inside the same function as the glmer() call (see
the fit_model_ci function).

Best regards,

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> WARNING: this is long.  Sorry I couldn't find a way to compress it.
>
> Is there a reasonable way to design an update method so that it's
> robust to a variety of reasonable use cases of generating calls or
> data inside or outside a function?  Is it even possible?  Should I
> just tell users "don't do that"?
>
> * `update.default()` uses `eval(call, parent.frame())`; this fails
>   when the call depends on objects that were defined in a different
>   environment (e.g., when the data are generated and the model
>   initially fitted within a function scope)
>
> * an alternative is to store the original environment in which the
>   fitting is done in the environment of the formula and use
>   `eval(call, env=environment(formula(object)))`; this fails if the
>   user tries to update the model originally fitted outside a function
>   with data modified within a function ...
>
> * I think I've got a hack that works below, which first tries in the
>   environment of the formula and falls back to the parent frame if
>   that fails, but I wonder if I'm missing something much simpler ..
>
>   Thoughts?  My understanding of environments and frames is still,
> after all these years, not what it should be ...
>
> I've thought of some other workarounds, none entirely satisfactory:
>
> * force evaluation of all elements in the original call
>     * printing components of the call can get ugly (can save the
> original call before evaluating)
>     * large objects in the call get duplicated
> * don't use `eval(call)` for updates; instead try to store everything
> internally
>     * this works OK but has the same drawback of potentially storing
> large extra copies
>     * we could try to use the model frame (which is stored already),
> but there are issues with this (the basis of a whole separate rant)
> because the model frame stores something in between predictor
> variables and input variables. For example
>
>    d <- data.frame(y=1:10,x=runif(10))
>    names(model.frame(lm(y~log(x),data=d)))
>    ## "y" "log(x)"
>
> So if we wanted to do something like update to "y ~ sqrt(x)",
> it wouldn't work ...
>
> ==================
> update.envformula <- function(object,...) {
>     extras <- match.call(expand.dots = FALSE)$...
>     call <- getCall(object)
>     for (i in names(extras)) {
>         existing <- !is.na(match(names(extras), names(call)))
>         for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
>         if (any(!existing)) {
>             call <- c(as.list(call), extras[!existing])
>             call <- as.call(call)
>         }
>     }
>     eval(call, env=environment(formula(object)))
>     ## enclos=parent.frame() doesn't help
> }
>
> update.both <- function(object,...) {
>     extras <- match.call(expand.dots = FALSE)$...
>     call <- getCall(object)
>     for (i in names(extras)) {
>         existing <- !is.na(match(names(extras), names(call)))
>         for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
>         if (any(!existing)) {
>             call <- c(as.list(call), extras[!existing])
>             call <- as.call(call)
>         }
>     }
>     pf <- parent.frame()  ## save parent frame in case we need it
>     tryCatch(eval(call, env=environment(formula(object))),
>              error=function(e) {
>                  eval(call, pf)
>              })
> }
>
> ### TEST CASES
>
> set.seed(101)
> d <- data.frame(x=1:10,y=rnorm(10))
> m1 <- lm(y~x,data=d)
>
> ##' define data within function, return fitted model
> f1 <- function() {
>     d2 <- d
>     lm(y~x,data=d2)
>     return(lm(y~x,data=d2))
> }
> ##' define (and modify) data within function, try to update
> ##' model fitted elsewhere
> f2 <- function(m) {
>     d2 <- d; d2[1] <- d2[1]+0 ## force copy
>     update.default(m,data=d2)
> }
> ##' define (and modify) data within function, try to update
> ##' model fitted elsewhere (use envformula)
> f3 <- function(m) {
>     d2 <- d; d2[1] <- d2[1]+0 ## force copy
>     update.envformula(m,data=d2)
> }
>
> ##' hack: first try the formula, then the parent frame
> ##' if that doesn't work for any reason
> f4 <- function(m) {
>     d2 <- d; d2[1] <- d2[1]+0 ## force copy
>     update.both(m,data=d2)
> }
>
> ## Case 1: fit within function
> m2 <- f1()
> try(update.default(m2))       ## default: object 'd2' not found
> m3A <- update.envformula(m2)  ## envformula: works
> m3B <- update.both(m2)        ## works
>
> ## Case 2: update within function
> m4A <- f2(m1)  ## default: works
> try(f3(m1))    ## envformula: object 'd2' not found
> m4B <- f4(m1)  ## works
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G
> 3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS
> uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL
> K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC
> aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX
> Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY=
> =S7zP
> -----END PGP SIGNATURE-----
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list