[R] Substring replacement in string

Hervé Pagès hpages at fredhutch.org
Sun Mar 1 10:38:25 CET 2015


Hi Alrik,

On 02/28/2015 11:06 PM, Alrik Thiem wrote:
> Dear Hervé,
>
> Many thanks for your suggestion. Gabor Grothendieck proposed a simple
> one-liner that works perfectly for my purposes:
>
> gsub("(\\b[a-oq-z][a-z0-9]*)", "1-\\U\\1", x, perl = TRUE)
>
> where x is the respective string.

Sounds good. I didn't realize that you also wanted to prefix the
lower-case names with "1 - ", so my solution was not doing the right
thing anyway. Here is the corrected solution, just in case:

   library(Biostrings)

   funnyReplace <- function(x, protected_words)
   {
     x <- BString(x)

     ## Extract the substrings to modify (target substrings).
     protected_regions <- reduce(do.call("c",
         lapply(protected_words, matchPattern, x)))
     target_regions <- ranges(gaps(protected_regions))
     target_substrings <- extractAt(x, target_regions)

     ## Modify them (using a regexp almost like Gabor's, except
     ## that "p" is not treated as an exception).
     target_substrings <- gsub("(\\b[a-z][a-z0-9]*)", "1 - \\U\\1",
                               target_substrings, perl=TRUE)

     ## Replace in original string.
     x <- replaceAt(x, target_regions, target_substrings)
     as.character(x)
   }

Then:

   > x <- "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"
   > funnyReplace(x, c("pmin", "pmax"))
   [1] "pmin(pmax(pmin(1 - X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1 - Z1))"
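
For anyone unfamiliar with the replacement syntax, here is a small base-R
sketch of what the gsub() call inside funnyReplace() does on its own. The
input strings are made up for illustration and are not from the thread.
"\\b" anchors the match at a word boundary, "[a-z][a-z0-9]*" matches a name
starting with a lower-case letter, and with perl = TRUE the "\\U" in the
replacement upper-cases the back-reference that follows it.

```r
## Illustration only: the inputs below are made-up examples.
pattern <- "(\\b[a-z][a-z0-9]*)"
replacement <- "1 - \\U\\1"

## Lower-case names are prefixed and upper-cased; "X2" is left alone.
res1 <- gsub(pattern, replacement, "x1 + X2 + z1", perl = TRUE)
print(res1)   # [1] "1 - X1 + X2 + 1 - Z1"

## Applied to the whole expression, the same call would also rewrite
## "pmin", which is why the protected regions are cut out first.
res2 <- gsub(pattern, replacement, "pmin(x1)", perl = TRUE)
print(res2)   # [1] "1 - PMIN(1 - X1)"
```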

It works even if a variable name starts with a "p":

   > funnyReplace("pmin(p)", c("pmin", "pmax"))
   [1] "pmin(1 - P)"

and you can specify an arbitrary number of protected words.
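
If installing Bioconductor packages is not an option, roughly the same idea
can be sketched in base R with gregexpr() and regmatches(). This is only a
sketch under assumptions: the name funnyReplaceBase is hypothetical, and
unlike the Biostrings version it protects whole words only (it compares
each matched name against the protected words rather than locating them as
substrings), so the two can differ on names that merely contain a
protected word.

```r
## Hypothetical base-R variant; the function name is made up for this sketch.
funnyReplaceBase <- function(x, protected_words)
{
  ## Locate every name that starts with a lower-case letter.
  m <- gregexpr("\\b[a-z][a-z0-9]*", x, perl = TRUE)

  ## Rewrite each matched name unless it is a protected word.
  regmatches(x, m) <- lapply(regmatches(x, m), function(words) {
    keep <- words %in% protected_words
    words[!keep] <- sprintf("1 - %s", toupper(words[!keep]))
    words
  })
  x
}

x <- "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"
funnyReplaceBase(x, c("pmin", "pmax"))
## [1] "pmin(pmax(pmin(1 - X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1 - Z1))"

funnyReplaceBase("pmin(p)", c("pmin", "pmax"))
## [1] "pmin(1 - P)"
```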

Cheers,
H.

>
> Best wishes,
> Alrik
>
> -----Original Message-----
> From: Hervé Pagès [mailto:hpages at fredhutch.org]
> Sent: Saturday, 28 February 2015 23:29
> To: Alrik Thiem; r-help at r-project.org
> Subject: Re: [R] Substring replacement in string
>
> Hi Alrik,
>
> With the Biostrings/IRanges infrastructure (Bioconductor packages), you
> can do this with:
>
>     library(Biostrings)
>     x0 <- BString("pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))")
>     donttouch_words <- c("pmin", "pmax")
>
>     ## Extract the substrings to modify (target substrings).
>     donttouch_regions <- reduce(do.call("c",
>         lapply(donttouch_words, matchPattern, x0)))
>     target_regions <- ranges(gaps(donttouch_regions))
>     target_substrings <- extractAt(x0, target_regions)
>
>     ## Modify them.
>     old <- paste0(letters, collapse="")
>     new <- paste0(LETTERS, collapse="")
>     target_substrings <- chartr(old, new, target_substrings)
>
>     ## Replace in original string.
>     x1 <- replaceAt(x0, target_regions, target_substrings)
>
> Then:
>
>     > x1
>       57-letter "BString" instance
>     seq: pmin(pmax(pmin(X1, X2), pmin(X3, X4)) == Y, pmax(Z1, Z1))
>
>     > as.character(x1)
>     [1] "pmin(pmax(pmin(X1, X2), pmin(X3, X4)) == Y, pmax(Z1, Z1))"
>
> Hope this helps,
> H.
>
> On 02/27/2015 02:19 PM, Alrik Thiem wrote:
>> Dear R-help list,
>>
>> I would like to replace all lower-case letters in a string that are not part
>> of certain fixed expressions. For example, I have the string:
>>
>> "pmin(pmax(pmin(x1, X2), pmin(X3, X4)) == Y, pmax(Z1, z1))"
>>
>> Where I would like to replace all lower-case letters that do not belong to
>> the functions "pmin" and "pmax" by 1 - toupper(...) to get
>>
>> "pmin(pmax(pmin(1 - X1, X2), pmin(X3, X4)) == Y, pmax(Z1, 1 - Z1))"
>>
>> Any ideas on how I could achieve that?
>>
>> Many thanks and best wishes,
>>
>> Alrik
>>
>>
>> ********************************
>> Alrik Thiem
>> Post-Doctoral Researcher
>>
>> Department of Philosophy
>> University of Geneva
>> Rue de Candolle 2
>> CH-1211 Geneva
>>
>> +41 76 527 80 83
>>
>> http://www.alrik-thiem.net
>> http://www.compasss.org
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


