[R] invalid variable type in model.frame within a function
Thomas Lumley
tlumley at u.washington.edu
Thu Mar 23 18:17:00 CET 2006
On Thu, 23 Mar 2006, Ingmar Visser wrote:
> Dear expeRts,
>
> I came across the following error in using model.frame:
>
> # make a data.frame
> jet=data.frame(y=rnorm(10),x1=rnorm(10),x2=rnorm(10),rvar=rnorm(10))
> # spec of formula
> mf1=y~x1+x2
> # make the model.frame
> mf=model.frame(formula=mf1,data=jet,weights=rvar)
>
> Which gives the desired output:
<output snipped>
> However, doing this inside another function like this:
>
> makemodelframe <- function(formula,data,weights) {
> mf=model.frame(formula=formula,data=data,weights=weights)
> mf
> }
>
> produces the following error:
>
>> makemodelframe(mf1,jet,weights=rvar)
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames, :
> invalid variable type
>
>
> Searching the R-help archives I came across bug-reports about this but
> couldn't figure out whether the bug was solved or whether there are
> work-arounds available.
It's not a bug. There have been bug reports about related issues (and also
about this issue, but they tend to be marked "not a bug").
If you think about it, how could
makemodelframe(mf1,jet,weights=rvar)
possibly work?
R passes variables by value, so rvar has to be evaluated before the
function is called. But rvar is not the name of any global
variable (it's just a column in the data frame), so how can R know where to
look?
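A small sketch of the lookup problem, assuming a fresh session with no global object called rvar. The data frame and wrapper are the ones from the question; the tryCatch() is only there so the failure can be inspected rather than stopping the script.

```r
jet <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), rvar = rnorm(10))

# rvar exists only as a column of jet, not as a variable the caller can see
rvar_is_global <- exists("rvar")

# Evaluating the name with the data frame as context does find it
rvar_in_data <- eval(quote(rvar), jet)

# The wrapper from the question: model.frame() never receives usable weights
makemodelframe <- function(formula, data, weights) {
  model.frame(formula = formula, data = data, weights = weights)
}
res <- tryCatch(makemodelframe(y ~ x1 + x2, jet, weights = rvar),
                error = function(e) e)
failed <- inherits(res, "error")
```

The exact wording of the error differs between R versions, so the sketch only records that the call fails.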
The reason that people think it might work is by analogy with model.frame
and the regression commands, where
model.frame(y~x, data=d, weights=w)
does somehow retrieve d$w as the weight. This analogy tends to override
programming commonsense and make people believe that R will somehow know
where to find the weights.
Now, since model.frame() *does* manage to find the weights, it must be
possible, and it is. That doesn't make it a good idea, though. Regression
commands and model.frame() do some fairly advanced trickery to make it
work. This is documented on developer.r-project.org.
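For the curious, the usual shape of that trickery is roughly the following. This is a sketch of the match.call() idiom used by the regression functions, not a recommendation; makemodelframe_nse is a hypothetical name chosen for illustration.

```r
jet <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), rvar = rnorm(10))

makemodelframe_nse <- function(formula, data, weights) {
  # Capture the call exactly as the user typed it, without evaluating anything
  mf <- match.call()
  # Turn it into a call to model.frame() with the same, still unevaluated, arguments
  mf[[1L]] <- quote(stats::model.frame)
  # Evaluate in the caller's frame; model.frame() then looks rvar up in data
  eval(mf, parent.frame())
}

mf_nse <- makemodelframe_nse(y ~ x1 + x2, jet, weights = rvar)
names(mf_nse)   # the weights arrive as a column named "(weights)"
```

The wrapper never touches the weights argument itself, which is why the lookup in data can succeed.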
I don't think it's a good idea for people to write code like this. I
should admit (especially since it's Lent at the moment, and so is an
appropriate time to repent one's past errors) that I lobbied Ross and
Robert to make model.frame() work compatibly with S-PLUS in its treatment
of weights= arguments (when porting the survival package, nearly ten
years ago). They were reluctant at the time, and I now think they were
right, although this level of S-PLUS compatibility might have been
unavoidable.
I would advise writing your code so that the call looks like
makemodelframe(mf1,jet,weights=~rvar)
That is, pass all the variables that are going to be evaluated in the
data= argument as formulas (or as quoted expressions). This is basically
what lme() does, where you supply two formulas and then various other bits
and pieces as objects. It is what my survey package does.
Then a user can do
makemodelframe(mf1,jet,weights=rvar)
if rvar is a variable in the current environment and
makemodelframe(mf1,jet,weights=~rvar)
if rvar is a variable in the data= argument, and both will work.
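One way such a wrapper might be written is sketched below. It assumes weights is either a one-sided formula or an ordinary numeric vector; the use of do.call() to hand the already-evaluated weights to model.frame() is my choice for this illustration, not the only way to do it.

```r
makemodelframe <- function(formula, data, weights = NULL) {
  if (inherits(weights, "formula")) {
    # ~rvar: evaluate the right-hand side in data, falling back to the
    # environment where the formula was created
    weights <- eval(weights[[2L]], data, environment(weights))
  }
  if (is.null(weights)) {
    return(stats::model.frame(formula = formula, data = data))
  }
  # Pass the evaluated vector, so model.frame() has nothing left to look up
  do.call(stats::model.frame,
          list(formula = formula, data = data, weights = weights))
}

jet <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), rvar = rnorm(10))
w <- runif(10)   # a weight vector living in the current environment

mf_formula <- makemodelframe(y ~ x1 + x2, jet, weights = ~rvar)
mf_vector  <- makemodelframe(y ~ x1 + x2, jet, weights = w)
```

In both cases the weights end up in the "(weights)" column of the model frame, and it is always clear where they were evaluated.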
There is some discussion of this in a note on "Nonstandard evaluation" on
the developer.r-project.org webpage, including a function that will
produce a single model frame from multiple formulas.
Now, I think there are some exceptions to this recommendation, and I don't
have a very clear definition of them. I think of them as "macro-like"
functions that evaluate a supplied expression in some special context.
Functions like this in base R include with() and capture.output(), and you
will find some more nice simple examples in the mitools package. For these
functions it really isn't ambiguous where the evaluation takes place. A
related issue is functions such as the plot() methods that use the
unevaluated forms of their arguments as labels. Again, the evaluation
of the labels isn't ambiguous, because it doesn't even happen.
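A minimal illustration of the labelling case, with a hypothetical helper label_of() that turns the expression it was given into text without ever needing its value:

```r
jet <- data.frame(rvar = rnorm(10))

# substitute() retrieves the unevaluated argument; deparse() turns it into text
label_of <- function(x) deparse(substitute(x))

lab <- label_of(jet$rvar)   # "jet$rvar"
```

with() is the other kind: with(jet, mean(rvar)) evaluates its second argument inside jet, and the call itself makes that obvious.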
With a few exceptions like these, though, I think it's a bad idea
to subvert the pass-by-value illusion in R. This was a lot more than you
probably wanted to know, but the alternative answer is the traditional
"Doctor, it hurts when I do this"
"Don't do that, then"
-thomas