[R] building formula objects
Thomas Lumley
tlumley at u.washington.edu
Tue Jul 9 17:35:51 CEST 2002
On 9 Jul 2002, Russell Senior wrote:
>
> I want to write a function to take an argument as the response
> variable of a linear model, e.g. to do anova's across a list of
> variables, something like the following (except, of course, this
> doesn't work):
>
> function(x) { anova(lm(x ~ my.factor,data=my.data)) }
>
> The x in lm() above is getting evaluated at the wrong level. How
> can I make this work?
>
When I first tried this I thought it did work. It provides a nice example
of why R does things this way (as well as of why it's useful to give an
example in help questions).
Consider a data frame
my.data<-data.frame(a=rep(0:1,12),b=rep(0:2,8),y=rnorm(24))
and
f<-function(x) { anova(lm(x ~ a+b,data=my.data)) }
If we generate a new response vector (perhaps for simulations) and want to
regress it on a and b then the function above works nicely
> z<-rnorm(24)
> f(z)
Analysis of Variance Table

Response: x
          Df  Sum Sq Mean Sq F value Pr(>F)
a          1  0.0086  0.0086  0.0078 0.9305
b          1  0.1013  0.1013  0.0921 0.7645
Residuals 21 23.0914  1.0996

Here the response variable is the value of z. Presumably the reason it
"doesn't work" is that the question was different. If we want to specify
a column in my.data as the response variable, we need to pass in the name
of the variable and somehow get that name into the formula.
This can be done with substitute(), as discussed in more detail in the R
Newsletter. This case is fairly simple and we can use
g<-function(x) {
    ff<-eval(substitute(x~a+b))
    anova(lm(ff,data=my.data))
}
> g(y)
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value Pr(>F)
a          1  2.1294  2.1294  2.3335 0.1415
b          1  0.1537  0.1537  0.1685 0.6856
Residuals 21 19.1634  0.9125

which is the correct answer as it matches
> anova(lm(y~a+b,data=my.data))
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value Pr(>F)
a          1  2.1294  2.1294  2.3335 0.1415
b          1  0.1537  0.1537  0.1685 0.6856
Residuals 21 19.1634  0.9125

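
To see what substitute() is doing in a case like this, it can help to look
at the unevaluated result before eval() turns it into a formula. The sketch
below is only an illustration under some assumptions: the seed, the extra
column y2, and the helpers show.formula() and g2() are made up for the
example and are not part of the original question. g2() is an alternative
that takes the column name as a character string and builds the formula
with reformulate(), which may be more convenient when looping over a list
of response variables.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
my.data <- data.frame(a = rep(0:1, 12), b = rep(0:2, 8), y = rnorm(24))
my.data$y2 <- rnorm(24)  # hypothetical second response column

# Hypothetical helper: return the unevaluated call that substitute() builds.
# The expression supplied for x is spliced into the template x ~ a + b.
show.formula <- function(x) substitute(x ~ a + b)
cl <- show.formula(y)
print(cl)         # the call, still unevaluated
print(class(cl))  # a call object, not yet a formula

# Evaluating the call gives an ordinary formula object.
ff <- eval(cl)
tab.sub <- anova(lm(ff, data = my.data))

# Hypothetical variant: pass the column name as a string and build the
# formula with reformulate() instead of substitute().
g2 <- function(response) {
    ff <- reformulate(c("a", "b"), response = response)
    anova(lm(ff, data = my.data))
}
tab.str <- g2("y")

# Both routes fit the same model, so the sums of squares should agree.
print(all.equal(tab.sub[["Sum Sq"]], tab.str[["Sum Sq"]]))

# The string version applies directly across a vector of column names.
responses <- c("y", "y2")
tabs <- lapply(setNames(responses, responses), g2)
print(names(tabs))
```

The substitute() version lets the caller type g(y) without quotes, at the
cost of being awkward to call from a loop; the string version is the other
way round.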
-thomas
Thomas Lumley Asst. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle