[R] trouble automating formula edits when log or * are present; update trouble

Bert Gunter gunter.berton at gene.com
Tue May 29 19:01:23 CEST 2012


Michael:

m2 is a model fit, not a formula. So I don't think what you suggested will work.

However, I think your idea is a good one. The trick is to protect the
model specification from evaluation via quote(). e.g.

> z <-  deparse(quote(lm(y~x1)))
> z
[1] "lm(y ~ x1)"

Then you can apply your suggestion:

> w <-gsub("x1","log(x1)",z)
> w
[1] "lm(y ~ log(x1))"
> eval(parse(text=w))

Call:
lm(formula = y ~ log(x1))

Coefficients:
(Intercept)      log(x1)
   -0.04894      0.36484


The gsub() would make the substitution wherever "x1" appeared in the
model formula, thus fulfilling the OP's request.
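
One caveat: a plain gsub() matches the substring "x1" anywhere in the
text, so a variable named x10 would be rewritten as well. A minimal
sketch that anchors the pattern at word boundaries, assuming the
variable names contain no dots or regex metacharacters (the helper name
swapText is hypothetical):

```r
# Hypothetical helper: textual substitution of a variable name in a
# deparsed call, anchored at word boundaries so "x1" does not match "x10".
# Assumes `from` contains no dots or other regex metacharacters.
swapText <- function(txt, from, to) {
  gsub(paste0("\\b", from, "\\b"), to, txt, perl = TRUE)
}

z <- deparse(quote(lm(y ~ x1 + x10)))
w <- swapText(z, "x1", "log(x1)")
w
# [1] "lm(y ~ log(x1) + x10)"
```

The resulting string can then be passed to eval(parse(text = w)) as
above, provided y, x1 and x10 exist where the call is evaluated.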

Two comments:

1. update() behaves as documented. It is a formula update method, not
a macro substitution procedure.

2. I believe this illustrates a legitimate violation of the "avoid the
eval(parse()) construction" precept. However, I may be wrong about this
and would welcome being corrected and shown a better alternative.
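
One possible alternative that avoids eval(parse()) is to treat the
formula as a language object and walk it recursively, replacing the
target symbol wherever it occurs. The following is only a sketch under
that assumption, not a tested general solution; the helper name
replaceSym is hypothetical, and it expects an ordinary formula with no
empty arguments.

```r
# Hypothetical helper: walk a formula (or any call) recursively and
# replace each occurrence of the symbol `from` with the expression `to`.
replaceSym <- function(expr, from, to) {
  if (is.name(expr)) {
    # A bare symbol: swap it if it is the target, otherwise keep it.
    if (identical(expr, as.name(from))) to else expr
  } else if (is.call(expr)) {
    # A call such as log(x1) or x2 * x3: recurse into every element.
    # Attributes (e.g. the "formula" class) are dropped at this step.
    as.call(lapply(as.list(expr), replaceSym, from = from, to = to))
  } else {
    expr  # constants (numbers, strings) are left alone
  }
}

f  <- y ~ log(x1) + x2 * x3
# Evaluating the rebuilt `~` call in the original environment
# turns it back into a formula object.
f2 <- eval(replaceSym(f, "x1", quote(x1c)), environment(f))
f2
```

Because the match is on whole symbols, x1 inside log(x1) or x1:x2 is
replaced while a variable such as x10 is left untouched.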

Cheers,
Bert





On Tue, May 29, 2012 at 9:31 AM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
> Hi Paul,
>
> I haven't quite thought through this yet, but might it not be easier
> to convert your formula to a character and then use gsub et al on it
> directly?
>
> Something like this
>
> # Using m2 as you set up below
> m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
>
> f2 <- formula(m2)
>
> as.formula(paste(f2[2], f2[1],gsub("x1", "x1c", as.character(f2[3]))))
>
> It's admittedly unwieldy, but it seems pretty robust.
>
> Something like:
>
> changeFormula <- function(form, xIn, xOut){
>    as.formula(paste(form[2], form[1], gsub(xIn, xOut, as.character(form[3]))))
> }
>
> changeFormula(formula(m2), "x1", "x1c")
>
> I'm not sure if this will play nice with environments and what not so
> you might need to change those manually.
>
> Hope this gets you started,
> Michael
>
> On Tue, May 29, 2012 at 11:43 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>> Greetings
>>
>> I want to take a fitted regression and replace all uses of a variable
>> in a formula. For example, I'd like to take
>>
>> m1 <- lm(y ~ x1, data=dat)
>>
>> and replace x1 with something else, say x1c, so the formula would become
>>
>> m1 <- lm(y ~ x1c, data=dat)
>>
>> I have working code to finish that part of the problem, but it fails
>> when the formula is more complicated. If the formula has log(x1) or
>> x1:x2, the update code I'm testing doesn't get it right.
>>
>> Here's the test code:
>>
>> ##PJ
>> ## 2012-05-29
>> dat <- data.frame(x1=rnorm(100,mean=50), x2=rnorm(100,mean=50),
>> x3=rnorm(100,mean=50), y=rnorm(100))
>>
>> m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
>> m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
>>
>> suffixX <- function(fmla, x, s){
>>    upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
>>    update.formula(fmla, upform)
>> }
>>
>> newFmla <- formula(m2)
>> newFmla
>> suffixX(newFmla, "x2", "c")
>> suffixX(newFmla, "x1", "c")
>>
>> The last few lines of the output. See how the update misses x1 inside
>> log(x1) or in the interaction?
>>
>>
>>> newFmla <- formula(m2)
>>> newFmla
>> y ~ log(x1) + x2 * x3
>>> suffixX(newFmla, "x2", "c")
>> y ~ log(x1) + x3 + x2c + x2:x3
>>> suffixX(newFmla, "x1", "c")
>> y ~ log(x1) + x2 + x3 + x1c + x2:x3
>>
>> It gets the target if the target is all by itself, but not otherwise.
>>
>> After messing with this for quite a while, I conclude that update was
>> the wrong way to go because it is geared to replacement of individual
>> bits, not editing all instances of a thing.
>>
>> So I started studying the structure of formula objects.  I noticed
>> this really interesting thing. The newFmla object can be probed
>> recursively to eventually reveal all of the individual pieces:
>>
>>
>>> newFmla
>> y ~ log(x1) + x2 * x3
>>> newFmla[[3]]
>> log(x1) + x2 * x3
>>> newFmla[[3]][[2]]
>> log(x1)
>>> newFmla[[3]][[2]][[2]]
>> x1
>>
>> So, if you could tell me of a general way to "walk" through a formula
>> object, couldn't I use "gsub" or something like that to recognize each
>> instance of "x1" and replace with "x1c"??
>>
>> I just can't figure how to automate the checking of each possible
>> element in a formula, to get the right combination of [[]][[]][[]].
>> See what I mean? I need to avoid this:
>>
>>> newFmla[[3]][[2]][[3]]
>> Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
>>
>> pj
>>
>> --
>> Paul E. Johnson
>> Professor, Political Science    Assoc. Director
>> 1541 Lilac Lane, Room 504     Center for Research Methods
>> University of Kansas               University of Kansas
>> http://pj.freefaculty.org            http://quant.ku.edu
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


