[R] building formula objects

Thomas Lumley tlumley at u.washington.edu
Tue Jul 9 17:35:51 CEST 2002


On 9 Jul 2002, Russell Senior wrote:

>
> I want to write a function to take an argument as the response
> variable of a linear model, e.g. to do anova's across a list of
> variables, something like the following (except, of course, this
> doesn't work):
>
>   function(x) { anova(lm(x ~ my.factor,data=my.data)) }
>
> The x in lm() above is getting evaluated at the wrong level.  How
> can I make this work?
>

When I first tried this I thought it did work. It provides a nice example
of why R does things this way (as well as why it's useful to give an
example in help questions).

Consider a data frame
	my.data<-data.frame(a=rep(0:1,12),b=rep(0:2,8),y=rnorm(24))
and
      f<-function(x)  { anova(lm(x ~ a+b,data=my.data)) }

If we generate a new response vector (perhaps for simulations) and want to
regress it on a and b, then the function above works nicely:
> z<-rnorm(24)
> f(z)
Analysis of Variance Table

Response: x
          Df  Sum Sq Mean Sq F value Pr(>F)
a          1  0.0086  0.0086  0.0078 0.9305
b          1  0.1013  0.1013  0.0921 0.7645
Residuals 21 23.0914  1.0996

Here the response variable is the value of z. Presumably the reason it
`doesn't work' is that the question was different.  If we want to specify
a column in `my.data' as the response variable we need to pass in the name
of the variable and somehow get that name into the function.
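
A minimal sketch of the distinction, assuming a fresh session in which no
object called y exists outside my.data. The name f_ab is hypothetical; it
stands in for the function above with a and b as predictors, and the seed
is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
my.data <- data.frame(a = rep(0:1, 12), b = rep(0:2, 8), y = rnorm(24))

# Hypothetical stand-in for the function above, regressing x on a and b
f_ab <- function(x) anova(lm(x ~ a + b, data = my.data))

# Works: z exists in the calling environment, so its value is bound to x
z <- rnorm(24)
tab <- f_ab(z)

# Fails (assuming no object y exists outside my.data): the argument is
# evaluated in the caller's environment, not inside my.data
res <- tryCatch(f_ab(y), error = function(e) conditionMessage(e))
```

In the second call x is not a column of my.data, so lm() falls back to the
function's own environment, where x stands for the caller's expression y.
That expression is evaluated where f_ab was called from, which is why the
column my.data$y is never seen.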

This can be done with substitute(), as discussed in more detail in the R
Newsletter. This case is fairly simple and we can use

     g<-function(x) { ff<-eval(substitute(x~a+b))
                      anova(lm(ff,data=my.data)) }

> g(y)
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value Pr(>F)
a          1  2.1294  2.1294  2.3335 0.1415
b          1  0.1537  0.1537  0.1685 0.6856
Residuals 21 19.1634  0.9125

which is the correct answer as it matches

> anova(lm(y~a+b,data=my.data))
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value Pr(>F)
a          1  2.1294  2.1294  2.3335 0.1415
b          1  0.1537  0.1537  0.1685 0.6856
Residuals 21 19.1634  0.9125
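
To see what substitute() contributes here, the intermediate objects can be
inspected directly. The sketch below also shows one alternative that is not
part of the answer above: passing the column name as a character string and
building the formula with reformulate() from the stats package. The names
show_call and h are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
my.data <- data.frame(a = rep(0:1, 12), b = rep(0:2, 8), y = rnorm(24))

# Hypothetical helper: return what substitute() builds inside a function
show_call <- function(x) substitute(x ~ a + b)
cl <- show_call(y)   # the unevaluated call y ~ a + b
ff <- eval(cl)       # evaluating the call to `~` yields a formula object

# Alternative (an assumption, not the method above): take the response
# name as a string and let reformulate() assemble the formula
h <- function(response) {
  fm <- reformulate(c("a", "b"), response = response)
  anova(lm(fm, data = my.data))
}

tab_h      <- h("y")
tab_direct <- anova(lm(y ~ a + b, data = my.data))
```

The string form may be convenient for the original goal of running anovas
across a list of variables, e.g. lapply(c("y"), h), since a character vector
of column names can be looped over without any unevaluated expressions.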


	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
