[R] model.frame: how does one use it?
Deepayan Sarkar
deepayan.sarkar at gmail.com
Sat Jun 16 01:27:10 CEST 2007
On 6/15/07, Philipp Benner <pbenner at uos.de> wrote:
>
> Thanks for your explanation!
>
> > With this in mind, either of the following might do what you want:
> >
> > badFunction <- function(mydata, myformula) {
> >     mydata$myweight <- abs(rnorm(nrow(mydata)))
> >     hyp <-
> >         rpart(myformula,
> >               data=mydata,
> >               weights=myweight,
> >               method="class")
> >     prev <- hyp
> > }
> >
> >
> > badFunction <- function(mydata, myformula) {
> >     myweight <- abs(rnorm(nrow(mydata)))
> >     environment(myformula) <- environment()
> >     hyp <-
> >         rpart(myformula,
> >               data=mydata,
> >               weights=myweight,
> >               method="class")
> >     prev <- hyp
> > }
>
> OK, this is what I have now:
>
> adaboostBad <- function(formula, data) {
>     ## local definition of the weight vector (won't work because pima.formula is not defined within this function)
>     w <- abs(rnorm(nrow(data)))
>     rpart(formula, data=data, weights=w)
> }
>
> adaboostGood <- function(formula, data) {
>     ## create weight vector in the data object
>     data$w <- abs(rnorm(nrow(data)))
>     rpart(formula, data=data, weights=w)
> }
>
> adaboostBest <- function(formula, data) {
>     ## associate the current environment (this function's one) with the object `formula'
>     environment(formula) <- environment()
>     w <- abs(rnorm(nrow(data)))
>     rpart(formula, data=data, weights=w)
> }
>
> As far as I understand this non-standard evaluation stuff,
> adaboostGood() and adaboostBest() are the only two possibilities to
> call rpart() with weight vectors. Now suppose that I don't know what
> `data' contains and suppose further that it already contains a
> column called `w'. adaboostGood() would overwrite that column with
> new data which is then used as weight vector and as training data
> for rpart(). adaboostBest() would just use the wrong data as weight
> vector as it finds data$w before the real weight vector. So, in both
> cases I have to check for `names(data) == "w"` and stop if TRUE? Or
> is there a better way?

Well, that depends on what you want to happen when there is a column
called 'w' in data. I don't see a situation where it makes sense to
use data$w as weights ('w' is just a name you happen to choose inside
adaboostBest), so I would just go with adaboostGood.
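
For reference, the lookup order described in the question can be checked
without rpart. The sketch below assumes that rpart builds its model frame
through model.frame(), as the standard modeling functions do; the function
name `lookup` and the toy data are made up for illustration.

```r
f <- y ~ x
d <- data.frame(x = 1:4, y = c(1.2, 1.9, 3.1, 4.2), w = c(10, 20, 30, 40))

# Hypothetical stand-in for adaboostBest(): returns the weights that
# model.frame() actually picked up.
lookup <- function(formula, data) {
    environment(formula) <- environment()
    w <- rep(1, nrow(data))   # the weights we intend to use
    mf <- model.frame(formula, data = data, weights = w)
    model.weights(mf)
}

from_data  <- lookup(f, d)               # d has a column 'w', which wins
from_local <- lookup(f, d[c("x", "y")])  # no column 'w', the local w is used
from_data
from_local
```

The first call returns the column from the data frame, the second the
local vector, because 'w' is searched for in data before the formula's
environment.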

In case you are worried about overwriting the original data, that may
not be happening in the sense you are thinking. When you say

    data$w <- abs(rnorm(nrow(data)))

inside adaboostGood, that modifies a local copy of the data argument,
not the original (R argument semantics are call by value, not call by
reference). You are losing data$w in the local copy in your function,
but why would you care if you are not using it anyway?
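
A minimal sketch of the call-by-value behaviour, using only base R (the
function name `add_weights` and the toy data frame are made up for
illustration):

```r
add_weights <- function(data) {
    # This assignment changes the function's local copy only.
    data$w <- abs(rnorm(nrow(data)))
    names(data)
}

d <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8))
inside <- add_weights(d)
inside     # "x" "y" "w"
names(d)   # "x" "y"
```

The caller's data frame is unchanged after the call; only the copy inside
the function gained the extra column.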

Of course, if your formula contains a reference to 'w' then you will
get wrong results, so checking for a unique name is always safer.

In addition, use an obfuscated name like '.__myWeights' instead
of 'w', and the check will almost always be irrelevant.
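
One way such a check might look, as a rough sketch: the helper
`add_weight_column` is hypothetical (it is not part of rpart), and it
treats a name as taken if it appears either among the columns of data or
among the variables of the formula.

```r
# Hypothetical helper: refuse to proceed when the chosen weight column
# name is already used by the data or referenced in the formula.
add_weight_column <- function(formula, data, name = ".__myWeights") {
    if (name %in% c(names(data), all.vars(formula))) {
        stop("weight column name '", name, "' is already in use")
    }
    data[[name]] <- abs(rnorm(nrow(data)))
    data
}

d <- data.frame(x = 1:4, y = c(1.2, 1.9, 3.1, 4.2))
ok <- add_weight_column(y ~ x, d)
names(ok)

# A clash with an existing column is reported instead of being
# silently overwritten.
d$w <- c(10, 20, 30, 40)
clash <- tryCatch(add_weight_column(y ~ x, d, name = "w"),
                  error = function(e) e)
conditionMessage(clash)
```

With the obfuscated default name the stop() branch should rarely, if ever,
be reached.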

-Deepayan