[R] trouble automating formula edits when log or * are present; update trouble

Tue May 29 19:51:07 CEST 2012

Deparse... that's it -- I was disappointed at having to turn
as.character.formula inside out time and again. Merci!

But, as always, we all lose to Gabor ;-)

Michael

On Tue, May 29, 2012 at 1:16 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> I should have added:
>
> If the formula is just assigned to a name, quote() and
> eval(parse(...)) are not needed:
>
> fm1 <-  y ~ x1  ## a formula
> w <- gsub( "x1","log(x1)", deparse(fm1))
> fm2 <- formula(w)
>
> This is probably the better way to do it.
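A note on the gsub() route quoted here: the bare pattern "x1" also matches inside longer names such as x10, and deparse() can return several strings for a long formula. Here is a minimal sketch that guards against both. The helper name swapVar is made up for illustration, and it assumes that `from` contains no regex metacharacters.

```r
# Hypothetical helper (swapVar is not from any package).
swapVar <- function(fm, from, to) {
  # deparse() may split a long formula over several strings; glue them back.
  txt <- paste(deparse(fm), collapse = " ")
  # \\b limits the match to whole words, so "x1" does not hit "x10".
  # Caveat: "." is not a word character, so a name like x1.sq would still match.
  pat <- paste0("\\b", from, "\\b")
  out <- gsub(pat, to, txt, perl = TRUE)
  # Rebuild the formula and keep the environment of the original.
  as.formula(out, env = environment(fm))
}

fm1 <- y ~ log(x1) + x10 + x1:x2
fm2 <- swapVar(fm1, "x1", "x1c")
fm2
```

The x10 term is there only to show that the word boundary leaves it alone.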
>
> -- Bert
>
>
>
> On Tue, May 29, 2012 at 10:01 AM, Bert Gunter <bgunter at gene.com> wrote:
>> Michael:
>>
>> m2 is a model fit, not a formula. So I don't think what you suggested will work.
>>
>> However, I think your idea is a good one. The trick is to protect the
>> model specification from evaluation via quote(). e.g.
>>
>> > z <- deparse(quote(lm(y ~ x1)))
>> > z
>> [1] "lm(y ~ x1)"
>>
>> Then you can apply your suggestion:
>>
>> > w <- gsub("x1", "log(x1)", z)
>> > w
>> [1] "lm(y ~ log(x1))"
>> > eval(parse(text = w))
>>
>> Call:
>> lm(formula = y ~ log(x1))
>>
>> Coefficients:
>> (Intercept)      log(x1)
>>   -0.04894      0.36484
>>
>>
>> The gsub() would make the substitution wherever "x1" appeared in the
>> model formula, thus fulfilling the OP's request.
>>
>> Two comments:
>>
>> 1. update() behaves as documented. It is a formula update method, not
>> a macro substitution procedure.
>>
>> 2. I believe this illustrates a legitimate violation of the "avoid the
>> eval(parse()) construction" precept. However, I may be wrong about this
>> and would welcome being corrected and shown a better alternative.
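On the request for an alternative to eval(parse()) quoted here: one possibility, offered only as a sketch, is to stay with language objects and let substitute() do the replacement, so no text is ever parsed. The data are simulated just to make the example run, and dat, cl and new_cl are names chosen for this sketch.

```r
set.seed(1)
# Simulated data, for illustration only.
dat <- data.frame(x1 = rnorm(100, mean = 50), y = rnorm(100))

# Protect the call from evaluation, as with quote() in the quoted message.
cl <- quote(lm(y ~ x1, data = dat))

# substitute() does not evaluate its first argument, so do.call() is used
# to hand it the stored call; every symbol x1 is replaced by log(x1).
new_cl <- do.call(substitute, list(cl, list(x1 = quote(log(x1)))))
new_cl

# Evaluate the modified call to fit the model.
fit <- eval(new_cl)
```

Because the replacement works on symbols rather than on text, a variable named x10 would not be touched. eval() is still needed to run the call, but parse(text = ...) is not.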
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>>
>> On Tue, May 29, 2012 at 9:31 AM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>>> Hi Paul,
>>>
>>> I haven't quite thought through this yet, but might it not be easier
>>> to convert your formula to a character and then use gsub et al on it
>>> directly?
>>>
>>> Something like this
>>>
>>> # Using m2 as you set up below
>>> m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
>>>
>>> f2 <- formula(m2)
>>>
>>> as.formula(paste(f2[2], f2[1], gsub("x1", "x1c", as.character(f2[3]))))
>>>
>>> It's admittedly unwieldy, but it seems pretty robust.
>>>
>>> Something like:
>>>
>>> changeFormula <- function(form, xIn, xOut){
>>>    as.formula(paste(form[2], form[1], gsub(xIn, xOut, as.character(form[3]))))
>>> }
>>>
>>> changeFormula(formula(m2), "x1", "x1c")
>>>
>>> I'm not sure whether this will play nicely with environments and
>>> whatnot, so you might need to set those manually.
>>>
>>> Hope this gets you started,
>>> Michael
>>>
>>> On Tue, May 29, 2012 at 11:43 AM, Paul Johnson <pauljohn32 at gmail.com> wrote:
>>>> Greetings
>>>>
>>>> I want to take a fitted regression and replace all uses of a variable
>>>> in a formula. For example, I'd like to take
>>>>
>>>> m1 <- lm(y ~ x1, data=dat)
>>>>
>>>> and replace x1 with something else, say x1c, so the formula would become
>>>>
>>>> m1 <- lm(y ~ x1c, data=dat)
>>>>
>>>> I have working code to finish that part of the problem, but it fails
>>>> when the formula is more complicated. If the formula has log(x1) or
>>>> x1:x2, the update code I'm testing doesn't get it right.
>>>>
>>>> Here's the test code:
>>>>
>>>> ##PJ
>>>> ## 2012-05-29
>>>> dat <- data.frame(x1=rnorm(100, mean=50), x2=rnorm(100, mean=50),
>>>> x3=rnorm(100, mean=50), y=rnorm(100))
>>>>
>>>> m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
>>>> m2 <- lm(y ~ log(x1) + x2*x3, data=dat)
>>>>
>>>> suffixX <- function(fmla, x, s){
>>>>    upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
>>>>    update.formula(fmla, upform)
>>>> }
>>>>
>>>> newFmla <- formula(m2)
>>>> newFmla
>>>> suffixX(newFmla, "x2", "c")
>>>> suffixX(newFmla, "x1", "c")
>>>>
>>>> Here are the last few lines of the output. See how the update misses
>>>> x1 inside log(x1) or in the interaction?
>>>>
>>>>
>>>> > newFmla <- formula(m2)
>>>> > newFmla
>>>> y ~ log(x1) + x2 * x3
>>>> > suffixX(newFmla, "x2", "c")
>>>> y ~ log(x1) + x3 + x2c + x2:x3
>>>> > suffixX(newFmla, "x1", "c")
>>>> y ~ log(x1) + x2 + x3 + x1c + x2:x3
>>>>
>>>> It gets the target if the target is all by itself, but not otherwise.
>>>>
>>>> After messing with this for quite a while, I conclude that update was
>>>> the wrong way to go because it is geared to replacement of individual
>>>> bits, not editing all instances of a thing.
>>>>
>>>> So I started studying the structure of formula objects.  I noticed
>>>> this really interesting thing. The newFmla object can be probed
>>>> recursively to eventually reveal all of the individual pieces:
>>>>
>>>>
>>>> > newFmla
>>>> y ~ log(x1) + x2 * x3
>>>> > newFmla[[3]]
>>>> log(x1) + x2 * x3
>>>> > newFmla[[3]][[2]]
>>>> log(x1)
>>>> > newFmla[[3]][[2]][[2]]
>>>> x1
>>>>
>>>> So, if you could tell me of a general way to "walk" through a formula
>>>> object, couldn't I use "gsub" or something like that to recognize each
>>>> instance of "x1" and replace it with "x1c"?
>>>>
>>>> I just can't figure how to automate the checking of each possible
>>>> element in a formula, to get the right combination of [[]][[]][[]].
>>>> See what I mean? I need to avoid this:
>>>>
>>>> > newFmla[[3]][[2]][[3]]
>>>> Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
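For the "walk the formula" question quoted here, a recursive function can visit each piece without knowing the [[ ]] depth in advance, which avoids the subscript error. A minimal sketch, assuming a hypothetical helper named renameSym. It renames variables only, not function names, and it has not been tried on calls with empty arguments such as x[, 1].

```r
# Hypothetical helper: walk the parse tree and rename one symbol everywhere.
renameSym <- function(expr, from, to) {
  if (is.name(expr)) {
    # Leaf: a bare symbol such as x1; swap it if it matches.
    if (identical(as.character(expr), from)) as.name(to) else expr
  } else if (is.call(expr)) {
    # Node: a call such as log(x1) or x2 * x3. Element 1 is the function
    # (log, `*`, `~`), so only the arguments (elements 2..n) are visited.
    for (i in seq_along(expr)[-1]) {
      new <- renameSym(expr[[i]], from, to)
      # Assigning NULL would delete the element, so skip that case.
      if (!is.null(new)) expr[[i]] <- new
    }
    expr
  } else {
    # Constants (numbers, strings, NULL) are returned unchanged.
    expr
  }
}

newFmla <- y ~ log(x1) + x2 * x3 + x1:x2
renamed <- renameSym(newFmla, "x1", "x1c")
renamed
```

Since the elements are replaced in place, the result should still be a formula with the same environment as the input. The x1:x2 term is added here only to cover the interaction case.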
>>>>
>>>> pj
>>>>
>>>> --
>>>> Paul E. Johnson
>>>> Professor, Political Science    Assoc. Director
>>>> 1541 Lilac Lane, Room 504     Center for Research Methods
>>>> University of Kansas               University of Kansas
>>>> http://pj.freefaculty.org            http://quant.ku.edu
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm