[R] model.frame: how does one use it?
Dirk Eddelbuettel
edd at debian.org
Fri Jun 15 22:34:48 CEST 2007
Hi Marc,
Thanks for the reply.
On 15 June 2007 at 14:33, Marc Schwartz wrote:
| On Fri, 2007-06-15 at 10:47 -0500, Dirk Eddelbuettel wrote:
| > Philipp Benner filed a Debian bug report against r-cran-rpart aka rpart.
| > In short, the issue has to do with how rpart evaluates a formula and
| > supporting arguments, in particular 'weights'.
| >
| > A simple contrived example is
| >
| > -----------------------------------------------------------------------------
| > library(rpart)
| >
| > ## using data from help(rpart), set up simple example
| > myformula <- formula(Kyphosis ~ Age + Number + Start)
| > mydata <- kyphosis
| > myweight <- abs(rnorm(nrow(mydata)))
| >
| > goodFunction <- function(mydata, myformula, myweight) {
| >   hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
| >   prev <- hyp
| > }
| > goodFunction(mydata, myformula, myweight)
| > cat("Ok\n")
| >
| > ## now remove myweight and try to compute it inside a function
| > rm(myweight)
| >
| > badFunction <- function(mydata, myformula) {
| >   myweight <- abs(rnorm(nrow(mydata)))
| >   mf <- model.frame(myformula, data=mydata, weights=myweight)
| >   print(head(mf))
| >   hyp <- rpart(myformula,
| >                data=mf,
| >                weights=myweight,
| >                method="class")
| >   prev <- hyp
| > }
| > badFunction(mydata, myformula)
| > cat("Done\n")
| > -----------------------------------------------------------------------------
| >
| > Here goodFunction works, but only because myweight (with useless random
| > weights, but that is not the point here) is found from the calling
| > environment.
| >
| > badFunction fails after we remove myweight from there:
| >
| > :~> cat /tmp/philipp.R | R --slave
| > Ok
| > Error in eval(expr, envir, enclos) : object "myweight" not found
| > Execution halted
| > :~>
| >
| > As I was able to replicate it, I reported this to the package maintainer. It
| > turns out that seemingly all is well, as this is supposed to work this way,
| > and I got a friendly pointer to study model.frame and its help page.
| >
| > Now I am stuck as I can't make sense of model.frame -- see badFunction
| > above. I would greatly appreciate any help in making rpart work with a local
| > argument weights so that I can tell Philipp that there is no bug. :)
| >
| > Regards, Dirk
|
|
| Dirk,
|
| As you note, the issue is the non-standard evaluation of the arguments
| in model.frame(). The key section of the Details in ?model.frame is:
|
|
| All the variables in formula, subset and in ... are looked for first in
| data and then in the environment of formula (see the help for formula()
| for further details) and collected into a data frame. Then the subset
| expression is evaluated, and it is used as a row index to the data
| frame. Then the na.action function is applied to the data frame (and may
| well add attributes). The levels of any factors in the data frame are
| adjusted according to the drop.unused.levels and xlev arguments.
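A minimal sketch of the lookup rule quoted above, using only the stats package (the names d, f, fitFrame and w are illustrative and do not come from the thread). A variable passed through '...', such as 'weights', is evaluated in 'data' and then in environment(formula), not in the frame of the function that calls model.frame(). Re-pointing the formula's environment at the local frame is one way, assumed here, to make a local variable visible:

```r
## Sketch: where model.frame() looks up variables passed via '...'
## ('weights' here). Names d, f, fitFrame, w are illustrative only.
d <- data.frame(y = c(1, 2, 3, 4), x = c(2, 4, 5, 8))
f <- y ~ x               # created at top level, so environment(f) is globalenv()

fitFrame <- function(data, formula) {
  w <- c(1, 2, 1, 2)     # local variable, not a column of 'data'
  ## 'weights' is evaluated in 'data' first, then in environment(formula);
  ## 'w' is in neither, so this call signals "object 'w' not found".
  failed <- tryCatch({ model.frame(formula, data = data, weights = w); FALSE },
                     error = function(e) TRUE)
  ## Assumed workaround: attach this function's frame to the formula.
  environment(formula) <- environment()
  mf <- model.frame(formula, data = data, weights = w)
  list(failed = failed, weights = model.weights(mf))
}

res <- fitFrame(d, f)
res$failed    # TRUE
res$weights   # 1 2 1 2
```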
|
|
| Note that even with your goodFunction(), if 'myweight' is created within
| the environment of the function and not in the global environment, it
| still fails:
|
| library(rpart)
| myformula <- formula(Kyphosis ~ Age + Number + Start)
| mydata <- kyphosis
|
| goodFunction <- function(mydata, myformula) {
|   myweight <- abs(rnorm(nrow(mydata)))
|   hyp <- rpart(myformula, data=mydata,
|                weights=myweight, method="class")
|   prev <- hyp
| }
|
|
| > goodFunction(mydata, myformula)
| Error in eval(expr, envir, enclos) : object "myweight" not found
|
|
| However, now let's do this:
|
|
| library(rpart)
| myformula <- formula(Kyphosis ~ Age + Number + Start)
| mydata <- kyphosis
| myweight <- abs(rnorm(nrow(mydata)))
|
| goodFunction <- function(mydata, myformula) {
|   hyp <- rpart(myformula, data=mydata,
|                weights=myweight, method="class")
|   prev <- hyp
| }
|
| > goodFunction(mydata, myformula)
| >
|
| It works, because 'myweight' is found in the global environment, which
| is where the formula is created.
Well, yes, but doesn't this just recreate the working example I showed above?
It works 'because we get lucky' with the data in the outer global env.
| Now, final example, try this:
|
|
| library(rpart)
| goodFunction <- function() {
|   myformula <- formula(Kyphosis ~ Age + Number + Start)
|   mydata <- kyphosis
|   myweight <- abs(rnorm(nrow(mydata)))
|
|   hyp <- rpart(myformula, data=mydata,
|                weights=myweight, method="class")
|   prev <- hyp
| }
|
| > goodFunction()
| >
|
| It works because the formula is created within the environment of the
| function and hence, 'myweight', which is created there as well, is
| found.
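Applied to rpart, the same lookup rule suggests two workarounds for a badFunction-style helper that receives a formula created elsewhere. This is a sketch assuming a current rpart, which forwards formula, data and weights to model.frame() and evaluates that call in its caller's frame; the 2007 release discussed here may behave differently, and the helper names fitWithColumn and fitWithEnvironment are hypothetical:

```r
## Sketch assuming current rpart: 'weights' is resolved as model.frame()
## does, first in 'data', then in environment(formula).
## fitWithColumn and fitWithEnvironment are hypothetical helper names.
library(rpart)   # provides rpart() and the 'kyphosis' data

myformula <- formula(Kyphosis ~ Age + Number + Start)  # environment: globalenv()

## Option A: make the weights a column of the data frame passed as 'data'.
fitWithColumn <- function(mydata, myformula) {
  mydata$myweight <- abs(rnorm(nrow(mydata)))
  rpart(myformula, data = mydata, weights = myweight, method = "class")
}

## Option B: keep the weights local, but point the formula at this frame.
fitWithEnvironment <- function(mydata, myformula) {
  myweight <- abs(rnorm(nrow(mydata)))
  environment(myformula) <- environment()
  rpart(myformula, data = mydata, weights = myweight, method = "class")
}

set.seed(1)
fitA <- fitWithColumn(kyphosis, myformula)
fitB <- fitWithEnvironment(kyphosis, myformula)
```

Option A is arguably the least surprising, since 'data' is always searched first. Option B keeps the weights out of the data frame, but it changes the environment attached to the formula, which then also governs how any other free variables in it are resolved.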
That works because we force it to be local. BDR claims that my 'badFunction'
(derived from Philipp's original bug report) above can be made to work
provided you use model.frame. I asked about model.frame -- and you were kind
enough to answer, but you dodged the question.
So let me try again: How can rpart be called inside a function using a
local weight variable as I do above? Either it can, and BDR is right
and there is no bug, or it cannot, and then mere mortals like myself must
consider rpart to be buggy, as it does not support all its arguments in at
least some conceivable calling situations.
Is that a fair question?
Regards, Dirk
| There was a (non) bug filed on a related matter dealing with the
| evaluation of 'subset':
|
| http://bugs.r-project.org/cgi-bin/R/feature%26FAQ?id=3671
|
| and you might find this document on Non-Standard Evaluation helpful:
|
| http://developer.r-project.org/nonstandard-eval.pdf
|
| HTH,
|
| Marc
|
|
--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison