[Rd] update.default: fall back on model.frame in case that the data frame is not in the parent environment

Tue Aug 2 21:06:16 CEST 2011

On 02/08/2011 10:48 AM, Thaler,Thorn,LAUSANNE,Applied Mathematics wrote:
> >  mm<- function(datf) {
> >     lm(y ~ x, data = datf)
> >  }
> >  mydatf<- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)),
> >                      z = rnorm(20))
> >
> >  l<- mm(mydatf)
> >  update(l, . ~ . + z)   # This fails, z is not found
>
> Good point. So let me rephrase the initial problem:
>
> 1.) An lm object is fitted somewhere with some data, which resides
> somewhere in memory.
> 2.) An ideal update function would know where the original data is
> (rather than assuming that it is stored
>    a.) in the parent frame
>    b.) under the name given in the call slot of the lm object)
>
> While assumption a.) seems reasonable from my point of view,
> assumption b.) is awkward, as pointed out, because it makes it
> cumbersome to update models that were created inside a function
> (which should not be too rare a use case).
>
> Thus, I have two questions:
> 1.) Is it somehow possible to retrieve the original data.frame with
> which an lm was fitted, given only the fit? I fear that
> model.frame is the best I have.

I don't think so.  You can get the environment in which the formula was 
created from the "terms" component of the result; that's the second 
place lm() will look.  The first place it will look is in the explicitly 
specified data variable, and you can get its name, but I don't think the 
result object necessarily stores the full "data" argument or the 
environment in which to look it up.  (In your example, you can look up 
"datf" in environment(l$terms) and get it, but that wouldn't work if the 
formula had also been specified as an argument to mm().)
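
A minimal sketch of that lookup, assuming the formula is written inside mm() as in your example (the names env, orig and l2 are only illustrative):

```r
mm <- function(datf) {
  lm(y ~ x, data = datf)
}
mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)),
                     z = rnorm(20))
l <- mm(mydatf)

# y ~ x was created inside mm(), so the terms carry mm()'s frame
env <- environment(l$terms)

# l$call$data is the symbol `datf`; evaluate it in that frame
orig <- eval(l$call$data, env)

# Hand the recovered data to update() explicitly
l2 <- update(l, . ~ . + z, data = orig)
```

If the formula were created by the caller and passed in, environment(l$terms) would be the caller's environment instead, and datf would not be found there.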

> 2.) Is there any other way of making update aware of where to look for
> the model building data?
>
> By the way, another work-around I was just thinking of is to use
>
> mm<- function(datf) {
>     l<- lm(y ~ x, data = datf)
>     call<- l$call
>     call$data<- substitute(datf)
>     l$call<- call
>     l
> }
>
> which solves my issue (and which I can very well live with), but I
> was wondering whether you see any chance that update could be made
> smarter? Thanks for your input.

I would suggest something simpler:  return a list containing both l and 
datf, and pass datf to update.  You can attach a class to that list to 
hide some of the ugliness if you like.
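
Something along these lines, as a rough sketch only; the element names fit and data, the class name fit_with_data and the update method are made up for illustration:

```r
mm <- function(datf) {
  fit <- lm(y ~ x, data = datf)
  # Keep the data next to the fit so it can be handed to update() later
  structure(list(fit = fit, data = datf), class = "fit_with_data")
}

# Hypothetical update() method: always passes the stored data explicitly
update.fit_with_data <- function(object, formula., ...) {
  new_fit <- update(object$fit, formula., data = object$data, ...)
  structure(list(fit = new_fit, data = object$data), class = "fit_with_data")
}

mydatf <- data.frame(x = rep(1:2, 10), y = rnorm(20, rep(1:2, 10)),
                     z = rnorm(20))
res <- mm(mydatf)

# Without the class, pass the stored data by hand:
l2 <- update(res$fit, . ~ . + z, data = res$data)

# With the method, the data argument is hidden:
res2 <- update(res, . ~ . + z)
```

The call stored in the updated fit will then refer to the data through the wrapper rather than by its original name, so later updates should go through the wrapper as well.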

Duncan Murdoch