[R] trouble automating formula edits when log or * are present; update trouble

Bert Gunter gunter.berton at gene.com
Tue May 29 19:16:21 CEST 2012


I should have added:

If the formula is just assigned to a name, quote() and
eval(parse(...)) are not needed:

fm1 <-  y ~ x1  ## a formula
w <- gsub( "x1","log(x1)", deparse(fm1))
fm2 <- formula(w)

This is probably the better way to do it.
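
One caveat, sketched here under my own assumptions: a plain
gsub("x1", ...) also rewrites any name that merely contains x1 (x10,
say), and deparse() can split a long formula over several strings.
The variable x10 and the names txt and txt2 below are made up for
illustration, and the \\b anchors assume the variable names contain no
dots (x1.sq would still be matched).

```r
fm1 <- y ~ log(x1) + x10 + x2 * x3   # x10 is a hypothetical decoy variable

# deparse() may return several strings for a long formula, so collapse them
txt <- paste(deparse(fm1), collapse = " ")

# \\b restricts the match to the whole name x1, so x10 is left alone
txt2 <- gsub("\\bx1\\b", "x1c", txt, perl = TRUE)

# rebuild the formula, keeping the environment of the original
fm2 <- as.formula(txt2, env = environment(fm1))
fm2
```

Passing env explicitly is a design choice on my part, so that the new
formula looks up its variables in the same place as the old one.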

-- Bert
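
P.S. A possible alternative that avoids text manipulation altogether,
offered as a sketch rather than a tested recipe: since a formula is a
language object, substitute() can replace the symbol x1 wherever it
occurs in the parse tree. The example formula and the name new_call
are my own; the do.call() wrapper is only there so that substitute()
sees the formula itself rather than the name fm1.

```r
fm1 <- y ~ log(x1) + x1:x2 + x10   # example formula, made up for illustration

# substitute() works on symbols, so x10 is untouched and no regex is needed
new_call <- do.call("substitute", list(fm1, list(x1 = quote(x1c))))

# evaluate the `~` call to get a formula again, then restore the environment
fm2 <- eval(new_call)
environment(fm2) <- environment(fm1)
fm2
```

The same call with list(x1 = quote(log(x1))) would wrap every x1 in
log() instead of renaming it.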



On Tue, May 29, 2012 at 10:01 AM, Bert Gunter <bgunter at gene.com> wrote:
> Michael:
>
> m2 is a model fit, not a formula. So I don't think what you suggested will work.
>
> However, I think your idea is a good one. The trick is to protect the
> model specification from evaluation via quote(). e.g.
>
>> z <-  deparse(quote(lm(y~x1)))
>> z
> [1] "lm(y ~ x1)"
>
> Then you can apply your suggestion:
>
>> w <-gsub("x1","log(x1)",z)
>> w
> [1] "lm(y ~ log(x1))"
>> eval(parse(text=w))
>
> Call:
> lm(formula = y ~ log(x1))
>
> Coefficients:
> (Intercept)      log(x1)
>   -0.04894      0.36484
>
>
> The gsub() would make the substitution wherever "x1" appeared in the
> model formula, thus fulfilling the OP's request.
>
> Two comments:
>
> 1. update() behaves as documented. It is a formula update method, not
> a macro substitution procedure.
>
> 2. I believe this illustrates a legitimate violation of the "avoid the
> eval(parse()) construction" precept. However, I may be wrong about this
> and would welcome being corrected and shown a better alternative.
>
> Cheers,
> Bert
>
>
>
>
>
> On Tue, May 29, 2012 at 9:31 AM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>> Hi Paul,
>>
>> I haven't quite thought through this yet, but might it not be easier
>> to convert your formula to a character and then use gsub et al on it
>> directly?
>>
>> Something like this
>>
>> # Using m2 as you set up below
>> m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
>>
>> f2 <- formula(m2)
>>
>> as.formula(paste(f2[2], f2[1],gsub("x1", "x1c", as.character(f2[3]))))
>>
>> It's admittedly unwieldy, but it seems pretty robust.
>>
>> Something like:
>>
>> changeFormula <- function(form, xIn, xOut){
>>    as.formula(paste(form[2], form[1], gsub(xIn, xOut, as.character(form[3]))))
>> }
>>
>> changeFormula(formula(m2), "x1", "x1c")
>>
>> I'm not sure if this will play nice with environments and what not so
>> you might need to change those manually.
>>
>> Hope this gets you started,
>> Michael
>>
>> On Tue, May 29, 2012 at 11:43 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>>> Greetings
>>>
>>> I want to take a fitted regression and replace all uses of a variable
>>> in a formula. For example, I'd like to take
>>>
>>> m1 <- lm(y ~ x1, data=dat)
>>>
>>> and replace x1 with something else, say x1c, so the formula would become
>>>
>>> m1 <- lm(y ~ x1c, data=dat)
>>>
>>> I have working code to finish that part of the problem, but it fails
>>> when the formula is more complicated. If the formula has log(x1) or
>>> x1:x2, the update code I'm testing doesn't get it right.
>>>
>>> Here's the test code:
>>>
>>> ##PJ
>>> ## 2012-05-29
>>> dat <- data.frame(x1=rnorm(100,mean=50), x2=rnorm(100,mean=50),
>>> x3=rnorm(100,mean=50), y=rnorm(100))
>>>
>>> m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
>>> m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
>>>
>>> suffixX <- function(fmla, x, s){
>>>    upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
>>>    update.formula(fmla, upform)
>>> }
>>>
>>> newFmla <- formula(m2)
>>> newFmla
>>> suffixX(newFmla, "x2", "c")
>>> suffixX(newFmla, "x1", "c")
>>>
>>> The last few lines of the output. See how the update misses x1 inside
>>> log(x1) or in the interaction?
>>>
>>>
>>>> newFmla <- formula(m2)
>>>> newFmla
>>> y ~ log(x1) + x2 * x3
>>>> suffixX(newFmla, "x2", "c")
>>> y ~ log(x1) + x3 + x2c + x2:x3
>>>> suffixX(newFmla, "x1", "c")
>>> y ~ log(x1) + x2 + x3 + x1c + x2:x3
>>>
>>> It gets the target if the target is all by itself, but not otherwise.
>>>
>>> After messing with this for quite a while, I conclude that update was
>>> the wrong way to go because it is geared to replacement of individual
>>> bits, not editing all instances of a thing.
>>>
>>> So I started studying the structure of formula objects.  I noticed
>>> this really interesting thing. The newFmla object can be probed
>>> recursively to eventually reveal all of the individual pieces:
>>>
>>>
>>>> newFmla
>>> y ~ log(x1) + x2 * x3
>>>> newFmla[[3]]
>>> log(x1) + x2 * x3
>>>> newFmla[[3]][[2]]
>>> log(x1)
>>>> newFmla[[3]][[2]][[2]]
>>> x1
>>>
>>> So, if you could tell me of a general way to "walk" through a formula
>>> object, couldn't I use "gsub" or something like that to recognize each
>>> instance of "x1" and replace with "x1c"??
>>>
>>> I just can't figure how to automate the checking of each possible
>>> element in a formula, to get the right combination of [[]][[]][[]].
>>> See what I mean? I need to avoid this:
>>>
>>>> newFmla[[3]][[2]][[3]]
>>> Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
>>>
>>> pj
>>>
>>> --
>>> Paul E. Johnson
>>> Professor, Political Science    Assoc. Director
>>> 1541 Lilac Lane, Room 504     Center for Research Methods
>>> University of Kansas               University of Kansas
>>> http://pj.freefaculty.org            http://quant.ku.edu
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


