[R] gsub regex simplification

Gabor Grothendieck ggrothendieck at gmail.com
Wed May 8 12:51:22 CEST 2013


On Wed, May 8, 2013 at 5:08 AM, Thaler,Thorn,LAUSANNE,Applied
Mathematics <Thorn.Thaler at rdls.nestle.com> wrote:
> Dear all,
>
> I want to use gsub to change a vector of strings. Basically, I want to replace any dot by a space, remove the possibly appended ".f" and I want to capitalize each word. I did that by chaining multiple gsubs together, but I was wondering (for the sake of learning - maybe the current version is more readable) whether I could do that with a _single_ gsub call?
>
> Thanks for your help!
>
> txt <- c("example1", "example2.f", "another.example3.f", "yet.another.example4.f")
> gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2",
>        gsub("\\.", " ", gsub("\\.f", "", txt)),
>        perl = TRUE)
>

gsubfn() in the gsubfn package is like gsub() except that the
replacement can be a user function, list or proto object rather than
only a string. With a function as the replacement, a single call to
gsubfn() can do it. Note that the function's i-th argument receives
the i-th back reference (an empty string if that group did not take
part in the match) and the function's output replaces the matched
text, so:

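library(gsubfn)
# x1, x2, x3 hold the text captured by groups 1, 2 and 3;
# a group that did not take part in the match arrives as "".
f <- function(x1, x2, x3) {
   if (x1 != "") return("")           # trailing ".f": drop it
   if (x2 != "") return(" ")          # any other dot: becomes a space
   if (x3 != "") return(toupper(x3))  # first character of a word
}
# The dot in "\\.f$" is escaped so that only a literal ".f" is removed.
gsubfn("(\\.f$)|(\\.)|(\\b.)", f, txt, perl = TRUE)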

The last line gives:

[1] "Example1" "Example2" "Another Example3" "Yet Another Example4"

See http://gsubfn.googlecode.com
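
For comparison, here is a rough base R sketch of the same idea (match
once, then compute each replacement from the matched text) using
gregexpr() and regmatches<-(). It is not how gsubfn is implemented;
replace_piece is a hypothetical helper name, and the pattern uses
[[:alpha:]] instead of "." after \\b, which is an assumption that only
letters need capitalizing.

```r
txt <- c("example1", "example2.f", "another.example3.f",
         "yet.another.example4.f")

# Hypothetical helper: choose the replacement for one matched piece.
replace_piece <- function(piece) {
  if (piece == ".f") return("")   # only the trailing ".f" matches as ".f"
  if (piece == ".")  return(" ")  # any other dot becomes a space
  toupper(piece)                  # otherwise it is the first letter of a word
}

# Alternatives are tried left to right, so a final ".f" wins over ".".
pattern <- "\\.f$|\\.|\\b[[:alpha:]]"
m <- gregexpr(pattern, txt, perl = TRUE)

# Replace every match in every string with the helper's result.
regmatches(txt, m) <- lapply(
  regmatches(txt, m),
  function(p) vapply(p, replace_piece, character(1), USE.NAMES = FALSE)
)
txt
```

This needs no extra package, but it takes several steps where gsubfn()
takes a single call.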


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


