[R] model.frame: how does one use it?

Marc Schwartz marc_schwartz at comcast.net
Sat Jun 16 01:21:33 CEST 2007


On Fri, 2007-06-15 at 15:34 -0500, Dirk Eddelbuettel wrote:
> Hi Marc,
> 
> Thanks for the reply.
> 
> On 15 June 2007 at 14:33, Marc Schwartz wrote:
> | On Fri, 2007-06-15 at 10:47 -0500, Dirk Eddelbuettel wrote: 
> | > Philipp Benner filed a Debian bug report against r-cran-rpart aka rpart.
> | > In short, the issue has to do with how rpart evaluates a formula and
> | > supporting arguments, in particular 'weights'.  
> | > 
> | > A simple contrived example is
> | > 
> | > -----------------------------------------------------------------------------
> | > library(rpart)
> | > 
> | > ## using data from help(rpart), set up simple example
> | > myformula <- formula(Kyphosis ~ Age + Number + Start)
> | > mydata <- kyphosis
> | > myweight <- abs(rnorm(nrow(mydata)))
> | > 
> | > goodFunction <- function(mydata, myformula, myweight) {
> | >   hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
> | >   prev <- hyp
> | > }
> | > goodFunction(mydata, myformula, myweight)
> | > cat("Ok\n")
> | > 

<snip>

> | 
> | However, now let's do this:
> | 
> | 
> | library(rpart)
> | myformula <- formula(Kyphosis ~ Age + Number + Start)
> | mydata <- kyphosis
> | myweight <- abs(rnorm(nrow(mydata)))
> | 
> | goodFunction <- function(mydata, myformula) {
> |                          hyp <- rpart(myformula, data=mydata,
> |                                       weights=myweight, method="class")
> |                          prev <- hyp
> |                         }
> | 
> | > goodFunction(mydata, myformula)
> | > 
> | 
> | It works, because 'myweight' is found in the global environment, which
> | is where the formula is created.
> 
> Well, yes, but doesn't this just recreate the working example I showed above?
> It works 'because we get lucky' with the data in the outer global env.

Technically, it is not the same, as I was trying to emphasize that there
was no need to pass 'myweight' as an argument to the function to
facilitate successful location/evaluation within the function.

We don't get lucky here. The behavior is by design and consistent with
the documentation, which is that 'myweight' in the call to rpart() is
evaluated within the environment of the formula in this case. The
formula is created in the global environment, so 'myweight' is found
there. Hence, no need to pass it as an argument.
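
As a minimal sketch of that lookup rule, here is the same mechanism
shown with model.frame() from the stats package instead of rpart(). The
assumption is that rpart() builds its model frame in the same way; the
'cars' data set and the names 'f_global', 'w', 'w_local', 'fit_inner'
and 'fit_local_w' are purely illustrative and not from the bug report.

```r
# Formula and weights are created side by side, so 'w' sits in the
# environment of the formula.
f_global <- dist ~ speed
w <- rep(2, nrow(cars))

fit_inner <- function(f) {
  # 'w' is not defined in this function; it is looked up in 'cars'
  # first and then in environment(f), where it is found.
  model.frame(f, data = cars, weights = w)
}
mf <- fit_inner(f_global)
head(mf[["(weights)"]])   # the weights end up in a "(weights)" column

fit_local_w <- function(f) {
  # 'w_local' exists only in this function's frame, which is neither
  # the data nor the environment of the formula.
  w_local <- rep(3, nrow(cars))
  model.frame(f, data = cars, weights = w_local)
}
res <- tryCatch(fit_local_w(f_global), error = function(e) "not found")
res
```

The second function mirrors the failing case: a weight vector that is
local to the calling function is not visible from a formula that was
created elsewhere.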

The source of rpart() handles the evaluation of the formula, its
associated args and the creation of the model frame with code similar
to that used in most R modeling functions.

One exception to the above is that in other modeling functions, one
could forgo passing the formula and just pass the entire data frame,
where the presumption is that the first column is the response variable
and the remaining columns would be the independent terms. I don't see
that supported in rpart().
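
A small sketch of that data-frame shortcut, using lm() and the 'cars'
data set as stand-ins (both are my choice for illustration, not part of
the original report):

```r
# formula() applied to a data frame takes the first column as the
# response and the remaining columns as terms.
f <- formula(cars)                        # speed ~ dist

# lm() accepts the data frame in place of a formula ...
fit_df <- lm(cars)
# ... which should match the explicit specification.
fit_f <- lm(speed ~ dist, data = cars)

all.equal(unname(coef(fit_df)), unname(coef(fit_f)))
```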

> 
> | Now, final example, try this:
> | 
> | 
> | library(rpart)
> | goodFunction <- function() {
> |                          myformula <- formula(Kyphosis ~ Age + Number +
> |                                               Start)
> |                          mydata <- kyphosis
> |                          myweight <- abs(rnorm(nrow(mydata)))
> | 
> |                          hyp <- rpart(myformula, data=mydata,
> |                                       weights=myweight, method="class")
> |                          prev <- hyp
> |                         }
> | 
> | > goodFunction()
> | > 
> | 
> | It works because the formula is created within the environment of the
> | function and hence, 'myweight', which is created there as well, is
> | found.
> 
> That works because we force it to be local. BDR claims that my 'badFunction'
> (derived from Philipp's original bug report) above can be made to work
> provided you use model.frame.  I asked about model.frame -- and you were kind
> enough to answer, but you dodged the question.
> 
> So let me try again:  How can rpart be called inside a function using a
> local weight variable as I do above ?   Either it can, and BDR is right
> and there is no bug, or one cannot, and then mere mortals like myself must
> consider rpart to be buggy as it does not support all its arguments in at
> least some conceivable calling situations. 
> 
> Is that a fair question?
> 
> Regards,  Dirk

Yep, entirely fair. 

Without knowing what specific approach Prof. Ripley had in mind, I am
envisioning a couple of possibilities, but here is one:

library(rpart)

myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis

badFunction <- function(mydata, myformula) {
  mydata$myweight <- abs(rnorm(nrow(mydata)))
  rpart(myformula, data = mydata, weights = myweight, method = "class")
}

badFunction(mydata, myformula)


Basically, there are three places in which 'myweight' could be found:

1. Formula environment

2. Data frame environment

3. Global environment


In this case, we add the weights as a new column within the function to
the 'mydata' data frame, so that it will be found in the call to
rpart(), based upon location number 2 above.
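
To see the lookup itself, here is a minimal sketch that calls
model.frame() from stats directly, on the assumption that rpart() goes
through the same mechanism. The 'cars' data set and the names
'fit_with_column', 'fit_with_env', 'w' and 'w_local' are illustrative
only. The second function shows a different possibility, resetting the
environment of the formula inside the function; I am not claiming that
this is what Prof. Ripley had in mind.

```r
f <- dist ~ speed

# Location 2: the weights are a column of the data frame.
fit_with_column <- function(d, f) {
  d$w <- seq_len(nrow(d))
  model.frame(f, data = d, weights = w)
}
mf_col <- fit_with_column(cars, f)

# Location 1: point the (local copy of the) formula at this function's
# frame, so that a purely local weight vector becomes visible.
fit_with_env <- function(d, f) {
  w_local <- rep(2, nrow(d))
  environment(f) <- environment()
  model.frame(f, data = d, weights = w_local)
}
mf_env <- fit_with_env(cars, f)

head(mf_col[["(weights)"]])
head(mf_env[["(weights)"]])
```

Note that resetting the environment also changes where any other
variable in the formula that is not in the data frame gets looked up,
so the data frame column is usually the less surprising choice.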

Does that help?

Regards,

Marc


