[Rd] help with eval()

peter dalgaard pdalgd at gmail.com
Tue Apr 19 09:37:33 CEST 2011


On Apr 19, 2011, at 07:16 , Prof Brian Ripley wrote:

> On Mon, 18 Apr 2011, Duncan Murdoch wrote:
> 
>> On 11-04-18 5:51 PM, Terry Therneau wrote:
>>> I've narrowed my scope problems with predict.coxph further.
>>> Here is a condensed example:
>>> fcall3<- as.formula("time ~ age")
>>> dfun3<- function(dcall) {
>>>     fit<- lm(dcall, data=lung, model=FALSE)
>>>     model.frame(fit)
>>> }
>>> dfun3(fcall3)
>>> [.....]
>>>   I don't understand the logic behind looking for variables in the place
>>> the formula was first typed (this is not a complaint).  The inability to
>>> look elsewhere, however, has stymied my efforts to fix the scoping problem
>>> in predict.coxph, unless I drop the env(formula) argument altogether.
>>> But I assume there must be good reasons for its inclusion and am
>>> reluctant to do so.
>> 
>> 
>> The reason is that when a formula is created, the variables in it are assumed to have meaning in that context.  Where you work with the formula after that should not be relevant:  that's why formulas carry environments with them. When you create the formula before the variables, things go wrong.
>> 
>> There's probably a way to associate the lung dataframe with the formula, or create the formula in such a way that things work, but I can't spot it.
> 
> This is why model=FALSE is not the default.  It avoids trying to find the data at a later date (and even if you can solve the scoping issues, the data may have been changed).

Yes, but there are other cases where a re-evaluation is triggered. The example I found earlier involved calling model.frame on a subset, in which case the length(nargs) branch in model.frame.lm gets taken.
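
For concreteness, a minimal sketch of that trigger (the names are made up, not taken from Terry's report):

fcall <- y ~ x                      # environment(fcall) is .GlobalEnv
f <- function() {
    d <- data.frame(x = 1:10, y = rnorm(10))
    fit <- lm(fcall, data = d, model = FALSE)
    ## The extra 'subset' argument makes length(nargs) non-zero, so
    ## model.frame.lm re-evaluates fit$call -- in environment(fcall),
    ## where 'd' does not exist.  (With model = FALSE it would
    ## re-evaluate even without the extra argument.)
    model.frame(fit, subset = 1:5)
}
f()   # fails: object 'd' not found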

So something is not right: either we should arrange that re-evaluations are never necessary, or there should be a mechanism to get them evaluated in the same scope as the original call.

An obvious way would be to add the evaluation environment as an attribute to the $call component, but what would the memory management and serialization consequences be?
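
Roughly along these lines (purely illustrative; the attribute name and the two wrappers are invented):

lm2 <- function(...) {
    fit <- lm(...)
    ## record where the original call was made, on the call object itself
    attr(fit$call, "eval_env") <- parent.frame()
    fit
}
model_frame2 <- function(fit) {
    fcall <- fit$call
    fcall$method <- "model.frame"   # lm(..., method = "model.frame") returns the frame
    ## re-evaluate in the recorded environment, not in environment(formula)
    eval(fcall, attr(fit$call, "eval_env"))
}

Note that the recorded environment keeps the whole calling frame reachable and (unless it happens to be the global environment) would get serialized along with the fitted object, which is the cost I am wondering about.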

One workaround, as Gabor points out, is effectively to substitute the values of the arguments to lm() at the point of the call, using do.call(lm, list(.....)) or some eval(substitute(.....)) construct to the same effect. However, the result of do.call() looks awkward in cases where the $call gets deparsed. E.g., in Gabor's example, if we modify it to show the actual fit, we get the result below (I'm sure you can imagine what would happen if a data frame with more than 7 rows were used!). On the other hand, NOT substituting such arguments leaves the scoping issues unresolved.
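
For reference, an eval(substitute(...)) version of the same idea might look like this (a sketch: only the formula value is spliced in, so the deparsed $call stays short, but anything left as a symbol must still be findable wherever the call gets re-evaluated):

dfun3 <- function(dcall) {
    ## substitute the *value* of dcall into the call; BOD stays a symbol
    fit <- eval(substitute(lm(dcall, data = BOD, model = FALSE),
                           list(dcall = dcall)))
    print(model.frame(fit))
    fit
}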

Another possible workaround is to make sure that functions that call modelling code internally do the evaluation in the frame of their caller (like the call to model.frame inside lm does). However, that seems to defeat the purpose of adding environments to formulas in the first place.
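
A rough sketch of that pattern (illustrative, not actual code from anywhere): a fitting function that rebuilds the model.frame call and evaluates it in its caller's frame, much as lm() itself does:

myfit <- function(formula, data, ...) {
    mf <- match.call(expand.dots = FALSE)
    mf <- mf[c(1L, match(c("formula", "data"), names(mf), 0L))]
    mf[[1L]] <- quote(stats::model.frame)
    ## evaluated in the caller's frame, so a data frame that exists only
    ## there is found, regardless of environment(formula)
    eval(mf, parent.frame())
}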

-pd

> dfun3 <- function(dcall) {
+     fit <- do.call("lm", list(dcall, data = BOD, model = FALSE))
+     print(model.frame(fit))
+     fit}
> dfun3(fcall3)
  demand Time
1    8.3    1
2   10.3    2
3   19.0    3
4   16.0    4
5   15.6    5
6   19.8    7

Call:
lm(formula = demand ~ Time, data = structure(list(Time = c(1, 
2, 3, 4, 5, 7), demand = c(8.3, 10.3, 19, 16, 15.6, 19.8)), .Names = c("Time", 
"demand"), row.names = c(NA, -6L), class = "data.frame", reference = "A1.4, p. 270"), 
    model = FALSE)

Coefficients:
(Intercept)         Time  
      8.521        1.721  

> 

 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


