[Rd] update.default: fall back on model.frame in case that the data frame is not in the parent environment
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Aug 2 21:06:16 CEST 2011
On 02/08/2011 10:48 AM, Thaler,Thorn,LAUSANNE,Applied Mathematics wrote:
> > mm<- function(datf) {
> > lm(y ~ x, data = datf)
> > }
> > mydatf<- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)), z = rnorm(20))
> >
> > l<- mm(mydatf)
> > update(l, . ~ . + z) # This fails, z is not found
>
> Good point. So let me rephrase the initial problem:
>
> 1.) An lm object is fitted somewhere with some data, which resides
> somewhere in the memory.
> 2.) An ideal update function would know where the original data is
> (rather than assuming that it is stored
> a.) in the parent frame
> b.) under the name given in the call slot of the lm object)
>
> While from my point of view assumption a.) seems to be reasonable,
> assumption b.) is awkward, as pointed out, because it makes it
> cumbersome to update models that were created inside a
> function (which should not be too rare a use case).
>
> Thus, I have two questions:
> 1.) Is it somehow possible to retrieve the original data.frame with
> which an lm is fitted just from the knowledge of the fit? I fear that
> model.frame is the best I have.
I don't think so. You can get the environment in which the formula was
created from the "terms" component of the result; that's the second
place lm() will look. The first place it will look is in the explicitly
specified data variable, and you can get its name, but I don't think the
result object necessarily stores the full "data" argument or the
environment in which to look it up. (In your example, you can look up
"datf" in environment(l$terms) and get it, but that wouldn't work if the
formula had also been specified as an argument to mm().)
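A rough sketch of that lookup, for illustration only: the names env, orig and l2 are made up here, and the sketch assumes the formula was written inside mm(), so that its environment is the call frame of mm() in which datf lives.

```r
mm <- function(datf) {
  lm(y ~ x, data = datf)
}

set.seed(1)  # only to make the illustration reproducible
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)), z = rnorm(20))
l <- mm(mydatf)

# Environment in which the formula y ~ x was created (here: the frame of mm())
env <- environment(l$terms)

# l$call$data is the symbol `datf`; evaluate it in that environment
orig <- eval(l$call$data, envir = env)

# Pass the recovered data explicitly, so update() does not have to find `datf`
l2 <- update(l, . ~ . + z, data = orig)
```

If the formula were instead passed to mm() as an argument, environment(l$terms) would be the caller's environment, and this lookup would most likely not find datf there.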
> 2.) Is there any other way of making update aware of where to look for
> the model building data?
>
> By the way, another work-around I was just thinking of is to use
>
> mm<- function(datf) {
> l<- lm(y ~ x, data = datf)
> call<- l$call
> call$data<- substitute(datf)
> l$call<- call
> l
> }
>
> which solves my issue (and which I can very well live with), but I
> was wondering whether you see any chance that update could be made
> smarter? Thanks for your input.
I would suggest something simpler: return a list containing both l and
datf, and pass datf to update. You can attach a class to that list to
hide some of the ugliness if you like.
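A minimal sketch of what that could look like. The class name "fit_with_data", the component names fit and data, and the update method are hypothetical choices for illustration, not anything that exists in R itself.

```r
# Return the fit together with the data it was fitted on
mm <- function(datf) {
  fit <- lm(y ~ x, data = datf)
  structure(list(fit = fit, data = datf), class = "fit_with_data")
}

# Hypothetical update method: always hands the stored data to update()
update.fit_with_data <- function(object, formula., ...) {
  dat <- object$data
  # update.default() re-evaluates the call in this frame, where `dat` exists
  new_fit <- update(object$fit, formula., data = dat)
  structure(list(fit = new_fit, data = dat), class = "fit_with_data")
}

set.seed(1)  # only to make the illustration reproducible
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)), z = rnorm(20))

res <- mm(mydatf)

# Without the class, passing the data by hand works as well
l2 <- update(res$fit, . ~ . + z, data = res$data)

# With the class, the data argument is hidden inside the method
res2 <- update(res, . ~ . + z)
```

The wrapper keeps a second reference to the data frame, which seems a reasonable price for not having to guess where the data lives.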
Duncan Murdoch