[Rd] robust updating methods
Ben Bolker
bbolker at gmail.com
Wed Mar 25 00:55:43 CET 2015
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 15-03-23 12:55 PM, Thierry Onkelinx wrote:
> Dear Ben,
>
> Last week I was struggling with incorporating lme4 into a package.
> I traced the problem and made a reproducible example (
> https://github.com/ThierryO/testlme4). It looks very simular to
> the problem you describe.
>
> The 'tests' directory contains the reproducible examples. confint()
> of a model as returned by a function fails. It even fails when I
> try to calculate the confint() inside the same function as the
> glmer() call (see the fit_model_ci function).
>
> Best regards,
>
> Thierry
Ugh. I can get this to work if I also try searching up the call
stack, as follows (within update.merMod). This feels like "code
smell" to me though -- i.e., if I have to hack this hard I must be
doing something wrong/misunderstanding how the problem *should* be done.
if (evaluate) {
ff <- environment(formula(object))
pf <- parent.frame() ## save parent frame in case we need it
sf <- sys.frames()[[1]]
tryCatch(eval(call, env=ff),
error=function(e) {
tryCatch(eval(call, env=sf),
error=function(e) {
eval(call, pf)
})
})
} else call
Here is an adapted even-more-minimal version of your code, which
seems to work with the version of update.merMod I just pushed to
github, but fails for glm():
## https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R
fit_model_ci <- function(formula, dataset, mfun=glmer){
model <- mfun(
formula = formula,
data = dataset,
family = "poisson"
)
ci <- confint(model)
return(list(model = model, confint = ci))
}
library("lme4")
set.seed(101)
dd <- data.frame(f=factor(rep(1:10,each=100)),
y=rpois(2,1000))
fit_model_ci(y~(1|f),dataset=dd)
fit_model_ci(y~(1|f),dataset=dd,mfun=glm)
>
>
> ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek /
> Research Institute for Nature and Forest team Biometrie &
> Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat
> 25 1070 Anderlecht Belgium
>
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may
> be able to say what the experiment died of. ~ Sir Ronald Aylmer
> Fisher The plural of anecdote is not data. ~ Roger Brinner The
> combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given
> body of data. ~ John Tukey
>
> 2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:
>
> WARNING: this is long. Sorry I couldn't find a way to compress
> it.
>
> Is there a reasonable way to design an update method so that it's
> robust to a variety of reasonable use cases of generating calls or
> data inside or outside a function? Is it even possible? Should I
> just tell users "don't do that"?
>
> * `update.default()` uses `eval(call, parent.frame())`; this fails
> when the call depends on objects that were defined in a different
> environment (e.g., when the data are generated and the model
> initially fitted within a function scope)
>
> * an alternative is to store the original environment in which the
> fitting is done in the environment of the formula and use
> `eval(call, env=environment(formula(object)))`; this fails if the
> user tries to update the model originally fitted outside a
> function with data modified within a function ...
>
> * I think I've got a hack that works below, which first tries in
> the environment of the formula and falls back to the parent frame
> if that fails, but I wonder if I'm missing something much simpler
> ..
>
> Thoughts? My understanding of environments and frames is still,
> after all these years, not what it should be ...
>
> I've thought of some other workarounds, none entirely
> satisfactory:
>
> * force evaluation of all elements in the original call * printing
> components of the call can get ugly (can save the original call
> before evaluating) * large objects in the call get duplicated *
> don't use `eval(call)` for updates; instead try to store
> everything internally * this works OK but has the same drawback of
> potentially storing large extra copies * we could try to use the
> model frame (which is stored already), but there are issues with
> this (the basis of a whole separate rant) because the model frame
> stores something in between predictor variables and input
> variables. For example
>
> d <- data.frame(y=1:10,x=runif(10))
> names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)"
>
> So if we wanted to do something like update to "y ~ sqrt(x)", it
> wouldn't work ...
>
> ================== update.envformula <- function(object,...) {
> extras <- match.call(expand.dots = FALSE)$... call <-
> getCall(object) for (i in names(extras)) { existing <-
> !is.na(match(names(extras), names(call))) for (a in
> names(extras)[existing]) call[[a]] <- extras[[a]] if
> (any(!existing)) { call <- c(as.list(call), extras[!existing]) call
> <- as.call(call) } } eval(call, env=environment(formula(object)))
> ## enclos=parent.frame() doesn't help }
>
> update.both <- function(object,...) { extras <-
> match.call(expand.dots = FALSE)$... call <- getCall(object) for (i
> in names(extras)) { existing <- !is.na(match(names(extras),
> names(call))) for (a in names(extras)[existing]) call[[a]] <-
> extras[[a]] if (any(!existing)) { call <- c(as.list(call),
> extras[!existing]) call <- as.call(call) } } pf <- parent.frame()
> ## save parent frame in case we need it tryCatch(eval(call,
> env=environment(formula(object))), error=function(e) { eval(call,
> pf) }) }
>
> ### TEST CASES
>
> set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <-
> lm(y~x,data=d)
>
> ##' define data within function, return fitted model f1 <-
> function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) } ##'
> define (and modify) data within function, try to update ##' model
> fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ##
> force copy update.default(m,data=d2) } ##' define (and modify) data
> within function, try to update ##' model fitted elsewhere (use
> envformula) f3 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force
> copy update.envformula(m,data=d2) }
>
> ##' hack: first try the formula, then the parent frame ##' if that
> doesn't work for any reason f4 <- function(m) { d2 <- d; d2[1] <-
> d2[1]+0 ## force copy update.both(m,data=d2) }
>
> ## Case 1: fit within function m2 <- f1() try(update.default(m2))
> ## default: object 'd2' not found m3A <- update.envformula(m2) ##
> envformula: works m3B <- update.both(m2) ## works
>
> ## Case 2: update within function m4A <- f2(m1) ## default: works
> try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1)
> ## works
>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAEBAgAGBQJVEfl/AAoJEOCV5YRblxUHWdgH/AqLAhDqKV8aRg6jnX9rO96D
nwzqv0ClMIxVr2dzD4eSQTL2caWZnXVkws+lg9N7bc4BaWplcYxLNRBw5M8zHOPJ
E7JlhG3EecvmeAEt9OY0/q6I0D6vdoEjcH7wzzuyLLIqllu9OskxURi/azMs0XRo
tiN+oG5aOKsMYsEGjtiWySRDzhJh2TM40A1HHjAViqpxZcqilAZ6RiNEFe1t1JY0
IvDI8yesSuHnKtgAiqk9ivGw4BCCGoBSIHB3GrJIi11j06iYKw0ugVHIlKYO8cqf
AYTvEX2sSxsJgKWYTiG/1dr/kiFTntTDji03zRLVUdPKIZATJMczv+KB+0bpoVY=
=Z34K
-----END PGP SIGNATURE-----
More information about the R-devel
mailing list