[R] invalid variable type in model.frame within a function

Thomas Lumley tlumley at u.washington.edu
Thu Mar 23 18:17:00 CET 2006


On Thu, 23 Mar 2006, Ingmar Visser wrote:

> Dear expeRts,
>
> I came across the following error in using model.frame:
>
> # make a data.frame
> jet=data.frame(y=rnorm(10),x1=rnorm(10),x2=rnorm(10),rvar=rnorm(10))
> # spec of formula
> mf1=y~x1+x2
> # make the model.frame
> mf=model.frame(formula=mf1,data=jet,weights=rvar)
>
> Which gives the desired output:
> <output snipped>
> However, doing this inside another function like this:
>
> makemodelframe <- function(formula,data,weights) {
>    mf=model.frame(formula=formula,data=data,weights=weights)
>    mf
> }
>
> produces the following error:
>
>> makemodelframe(mf1,jet,weights=rvar)
> Error in model.frame(formula, rownames, variables, varnames, extras,
> extranames,  :
>    invalid variable type
>
>
> Searching the R-help archives I came across bug-reports about this but
> couldn't figure out whether the bug was solved or whether there are
> work-arounds available.

It's not a bug. There have been bug reports about related issues (and also 
about this issue, but they tend to be marked "not a bug").

If you think about it, how could
    makemodelframe(mf1,jet,weights=rvar)

possibly work?

R passes variables by value, so rvar has to be evaluated before the 
function is called. But rvar is not the name of any global 
variable (it's just a column in the data frame), so how can R know where to 
look?
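
For reference, the failure from the original post can be reproduced with the sketch below. It reuses the poster's names (jet, mf1, makemodelframe) and wraps the call in tryCatch() so the script keeps running; the exact wording of the error differs between R versions, so none is shown.

```r
# Data and formula as in the original post
jet <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), rvar = rnorm(10))
mf1 <- y ~ x1 + x2

# The wrapper from the original post
makemodelframe <- function(formula, data, weights) {
  model.frame(formula = formula, data = data, weights = weights)
}

# rvar is only a column of jet, so the wrapped call fails
result <- tryCatch(makemodelframe(mf1, jet, weights = rvar),
                   error = function(e) e)
failed <- inherits(result, "error")
failed
```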

The reason that people think it might work is by analogy with model.frame 
and the regression commands, where
   model.frame(y~x, data=d, weights=w)
does somehow retrieve d$w as the weight.  This analogy tends to override 
programming commonsense and make people believe that R will somehow know 
where to find the weights.

Now, since model.frame() *does* manage to find the weights, it must be 
possible, and it is.  That doesn't make it a good idea, though. Regression 
commands and model.frame() do some fairly advanced trickery to make it 
work. This is documented on developer.r-project.org.
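
As a rough illustration of the kind of trick involved, here is a much-simplified sketch. The helper find_weights() and the wrapper wrapped() are hypothetical names made up for this example, and this is not the actual model.frame() source: it only shows the general idea of capturing an argument unevaluated and then evaluating it with the data frame searched first.

```r
# Hypothetical helper: capture the argument as an expression, then
# evaluate it in the data frame, falling back to the caller's frame.
find_weights <- function(data, weights) {
  expr <- substitute(weights)   # the unevaluated argument, e.g. the symbol wt
  caller <- parent.frame()      # where to look if it is not a column
  eval(expr, data, caller)
}

d <- data.frame(y = c(1, 2, 3), wt = c(0.5, 1, 2))
direct <- find_weights(d, wt)   # finds the column d$wt

# One extra layer breaks it: the captured expression is now the symbol
# `weights`, which is not a column of d, and evaluating it in the wrapper
# forces a lookup of wt outside the data frame.
wrapped <- function(data, weights) find_weights(data, weights)
outcome <- tryCatch(wrapped(d, wt), error = function(e) "error")
```

The direct call works only because the helper goes out of its way to look inside d; nothing about that carries over to a function that calls it.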

I don't think it's a good idea for people to write code like this. I 
should admit (especially since it's Lent at the moment, and so is an 
appropriate time to repent one's past errors) that I lobbied Ross and 
Robert to make model.frame() work compatibly with S-PLUS in its treatment 
of weights= arguments (when porting the survival package, nearly ten 
years ago).  They were reluctant at the time, and I now think they were 
right, although this level of S-PLUS compatibility might have been 
unavoidable.

I would advise writing your code so that the call looks like
   makemodelframe(mf1,jet,weights=~rvar)
That is, pass all the variables that are going to be evaluated in the 
data= argument as formulas (or as quoted expressions).  This is basically 
what lme() does, where you supply two formulas and then various other bits 
and pieces as objects. It is what my survey package does.

Then a user can do
   makemodelframe(mf1,jet,weights=rvar)
if rvar is a variable in the current environment and
   makemodelframe(mf1,jet,weights=~rvar)
if rvar is a variable in the data= argument, and both will work.
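
One way such a wrapper might be written is sketched below. It is only an assumption about a possible implementation, not the code from the developer.r-project.org note or from the survey package, and the temporary column name .mmf_w is invented for the example. A one-sided formula is evaluated in the data frame (then in the formula's environment); anything else is used as an ordinary value.

```r
makemodelframe <- function(formula, data, weights = NULL) {
  if (inherits(weights, "formula")) {
    # One-sided formula such as ~rvar: evaluate its right-hand side in
    # the data frame first, then in the formula's own environment.
    w <- eval(weights[[2]], data, environment(weights))
  } else {
    w <- weights                # already an ordinary value
  }
  if (is.null(w)) {
    return(model.frame(formula, data = data))
  }
  # Store the weights as a column (hypothetical name .mmf_w) so that
  # model.frame() finds them in data and keeps them aligned with the rows.
  data[[".mmf_w"]] <- w
  model.frame(formula, data = data, weights = .mmf_w)
}

jet <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), rvar = rnorm(10))
mf1 <- y ~ x1 + x2
w <- runif(10)

mf_a <- makemodelframe(mf1, jet, weights = ~rvar)  # column of jet
mf_b <- makemodelframe(mf1, jet, weights = w)      # variable in the current environment
```

Because the weights travel inside data, any rows dropped by na.action take their weights with them.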

There is some discussion of this in a note on "Nonstandard evaluation" on 
the developer.r-project.org webpage, including a function that will 
produce a single model frame from multiple formulas.


Now, I think there are some exceptions to this recommendation, and I don't 
have a very clear definition of them. I think of them as "macro-like" 
functions that evaluate a supplied expression in some special context.
Functions like this in base R include with() and capture.output(), and you 
will find some more nice simple examples in the mitools package. For these 
functions it really isn't ambiguous where the evaluation takes place.  A 
related issue is functions such as the plot() methods that use the 
unevaluated forms of their arguments as labels. Again, the evaluation 
of the labels isn't ambiguous, because it doesn't even happen.
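
Two small examples of these unambiguous cases, as a sketch: with() evaluates its expression inside the data frame it is given, and a labelling function (label_of() is a made-up name for this example) only deparses its argument without ever evaluating it.

```r
d <- data.frame(a = c(1, 2, 3))

# with() evaluates the expression inside d; there is no doubt where.
total <- with(d, sum(a))

# Hypothetical labelling helper: the argument is turned into text and
# never evaluated, so the variables in it need not exist anywhere.
label_of <- function(x) deparse(substitute(x))
lab <- label_of(height + weight)
```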

With a few exceptions like these, though, I think it's a bad idea 
to subvert the pass-by-value illusion in R. This was a lot more than you 
probably wanted to know, but the alternative answer is the traditional

"Doctor, it hurts when I do this"
    "Don't do that, then"


 	-thomas



