[R] pulling items out of a lm() call
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Mon May 1 13:37:21 CEST 2006
Andrew Gelman <gelman at stat.columbia.edu> writes:
> I want to write a function to standardize regression predictors, which
> will require me to do some character-string manipulation to parse the
> variables in a call to lm() or glm().
>
> For example, consider the call
> lm (y ~ female + I(age^2) + female:black + (age + education)*female).
>
> I want to be able to parse this to pick out the input variables
> ("female", "age", "black", "education"). Then I can transform these as
> appropriate (to get "z.female", "z.age", etc), feed them back into the
> lm() function, and go from there.
>
> Does anyone know an easy way to pull out the variables? I basically
> have to parse out the symbols "+", ":", "*", and " ", but there's also
> the problem of handling parentheses and the I() operator.
At which level of generality do you want this?
Consider
> attr(terms(y ~ female + I(age^2) + female:black + (age +
+ education)*female),"variables")
list(y, female, I(age^2), black, age, education)
> attr(delete.response(terms(y ~ female + I(age^2) + female:black +
+ (age + education)*female)),"variables")
list(female, I(age^2), black, age, education)
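If a character vector is handier than the list() call, one way (a sketch; the names f, vars, var_strings and pred_strings are just illustrative) is to deparse each element of the "variables" attribute. Note that it still returns composite entries such as "I(age^2)" as-is.

```r
## Illustrative only: f, vars, var_strings and pred_strings are made-up names.
f <- y ~ female + I(age^2) + female:black + (age + education)*female

## The "variables" attribute is a call, list(y, female, I(age^2), ...);
## as.list(...)[-1] drops the leading `list` symbol, and deparse() turns
## each remaining element into a string (assumes each fits on one line).
vars <- attr(terms(f), "variables")
var_strings <- vapply(as.list(vars)[-1], deparse, character(1))
## var_strings is c("y", "female", "I(age^2)", "black", "age", "education")

## Same thing without the response.
pred_vars <- attr(delete.response(terms(f)), "variables")
pred_strings <- vapply(as.list(pred_vars)[-1], deparse, character(1))
```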
This gets you some of the way. However, there are complications: You
can't just remove composite terms like "I(age^2)" because it is not
guaranteed that "age" is among the other terms:
> attr(terms( ~ I(speed^2)),"variables")
list(I(speed^2))
So you need some way to tease out the individual variables inside I().
Here's a first cut.
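l <- attr(delete.response(terms(y ~ female + I(age^2) + female:black
                                + (age + education)*female)), "variables")
## Recursively collect the symbols in an expression. The function
## position (I, ^, +, ...) is dropped via e[-1], and constants such as
## the 2 in age^2 fall through both branches and yield NULL.
getterms <- function(e) {
  if (is.name(e)) e
  else if (is.call(e)) lapply(e[-1], getterms)
}
## l[-1] drops the leading `list`; flatten the nested result, de-duplicate.
unique(c(lapply(l[-1], getterms), recursive = TRUE))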
and possibly throw in an as.character() to get a vector of strings,
rather than a list of symbols. Notice that since anything can go
inside I(), you can get in trouble if parts of the expression are not
intended as variables (e.g., y^lambda where lambda is a scalar). The
getterms function above pragmatically assumes that at least function
names need to be discarded.
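For the rest of the original question (standardize, rename to z.*, refit), here is a rough sketch under stated assumptions: predictor_names(), rename_vars() and standardize_and_refit() are hypothetical helpers, not existing functions; every predictor is assumed to be a numeric column of the data; and the scaling is plain (x - mean)/sd, since the question doesn't pin down a particular scaling. It uses base R's all.vars(), which, like getterms() above, skips function names and therefore shares the y^lambda caveat, so scalars have to be excluded by hand.

```r
## Hypothetical helpers sketching the standardize-and-refit workflow.
## Assumes every predictor is a numeric column of 'data'.

## Variable names on the right-hand side. all.vars() (base R) skips function
## names such as I and ^, like getterms(); 'exclude' drops names that are
## really scalars (e.g. lambda in I(x^lambda)), which must be given by hand.
predictor_names <- function(formula, exclude = character()) {
  setdiff(all.vars(delete.response(terms(formula))), exclude)
}

## Rename symbols according to 'map' (a named character vector, old = new),
## leaving the function position of every call untouched.
rename_vars <- function(e, map) {
  if (is.name(e)) {
    nm <- as.character(e)
    if (nm %in% names(map)) as.name(map[[nm]]) else e
  } else if (is.call(e)) {
    for (i in seq_along(e)[-1]) e[[i]] <- rename_vars(e[[i]], map)
    e
  } else {
    e  # constants such as the 2 in age^2
  }
}

## Add z.<name> columns scaled as (x - mean) / sd, then refit.
standardize_and_refit <- function(formula, data, exclude = character(),
                                  fit = lm) {
  vars <- predictor_names(formula, exclude)
  map <- setNames(paste0("z.", vars), vars)
  for (v in vars) {
    x <- data[[v]]
    data[[map[[v]]]] <- (x - mean(x)) / sd(x)
  }
  fit(rename_vars(formula, map), data = data)
}
```

For example, standardize_and_refit(y ~ female + I(age^2) + female:black + (age + education)*female, d) for a data frame d holding those columns should fit a model with terms like z.female and I(z.age^2). Rewriting the formula via the parse tree rather than with string substitution avoids accidentally matching "age" inside some other name.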
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help mailing list