[Rd] update.default: fall back on model.frame in case that the data frame is not in the parent environment
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Aug 2 15:41:17 CEST 2011
It looks to me as though your proposal would allow update to remove
variables, but would give erroneous results when adding them. For example:
mm <- function(datf) {
lm(y ~ x, data = datf)
}
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)), z =
rnorm(20))
l <- mm(mydatf)
update(l, . ~ . + z) # This fails, z is not found
z <- rnorm(20)
update(l, . ~ . + z) # This finds the wrong z, without a warning
I'd rather get the "datf not found" error than wrong results.
Duncan Murdoch
On 02/08/2011 7:48 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote:
> Dear all,
>
> Suppose the following code:
>
> --------------8<--------------
> mm<- function(datf) {
> lm(y ~ x, data = datf)
> }
> mydatf<- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)))
>
> l<- mm(mydatf)
> -------------->8--------------
>
> If I want to update l now without providing the data argument an error
> occurs:
>
> --------------8<--------------
> > update(l, . ~ .)
> Error in inherits(x, "data.frame") : object 'datf' not found
> -------------->8--------------
>
> and I've to provide the data argument explicitly:
> --------------8<--------------
> update(l, . ~ ., data = mydatf)
> update(l, . ~ ., data = model.frame(l))
> -------------->8--------------
>
> While the first work-around is additionally error prone (what if I
> change the name of mydatf earlier in the file? In the best case I just
> get an error if mydatf is not defined), both options are kind of
> semantically questionable (I do not want to _update_ the data argument
> of the lm object it should remain untouched).
>
> So my suggestion would be that update falls back on the data stored in
> model.frame in case that the data argument in the lm call cannot be
> resolved in the parent.frame of update, which can be easily achieved by
> adding just four lines to update.default:
>
> --------------8<--------------
> update.default<- function (object, formula., ..., evaluate = TRUE) {
> call<- object$call
> if (is.null(call))
> stop("need an object with call component")
> extras<- match.call(expand.dots = FALSE)$...
> if (!missing(formula.))
> call$formula<- update.formula(formula(object), formula.)
> if (length(extras)) {
> existing<- !is.na(match(names(extras), names(call)))
> for (a in names(extras)[existing]) call[[a]]<- extras[[a]]
> if (any(!existing)) {
> call<- c(as.list(call), extras[!existing])
> call<- as.call(call)
> }
> }
> if (!is.null(call$data)) {
> if (!exists(as.character(call$data), envir = parent.frame()))
> call$data<- model.frame(object)
> }
> if (evaluate)
> eval(call, parent.frame())
> else call
> }
> -------------->8--------------
>
> This is just a quick dirty hack which works fine here (with an ugly
> drawback that in the standard output of lm I now see the lengthy
> explicit data.frame statement) but I'm sure there are some cracks out
> there who could take it over from here and beautify this idea.
>
> I don't see any problems with this proposition regarding old code, but
> if I'm wrong and there are some reasons not to touch update.default in
> the way I was proposing please let me know. Any other feedback is highly
> appreciated too.
>
> Thanks for sharing your thoughts with me.
>
> KR,
>
> -Thorn
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list