[R] vectorized sub, gsub, grep, etc.

Christos Hatzis christos at nuverabio.com
Thu Oct 9 06:11:25 CEST 2008


Hi John,

As I mentioned in our private exchange, this is well known in R: vectorized
versions are not always faster or more efficient than straight loops.  It is
a misconception that loops should be avoided at any cost.  See John Fox's
illuminating article in R News (p. 46) on this subject:
http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf

Personally, unless the application is very demanding, I'll take the
vectorized version over the loop any day: it is much simpler to write,
usually a one-liner, and therefore much easier to maintain in the long run.
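
For comparison, here is a rough sketch of the same element-wise substitution
written as a plain, preallocated for loop.  The name sub_loop is made up for
illustration and is not part of base R or of the code in this thread.  Keep
in mind that sapply and mapply are themselves loops under the hood, so the
timing differences between all three variants are mostly call overhead.

```r
# Hypothetical helper (not from the thread): element-wise sub() as an
# explicit for loop over a preallocated result vector.
sub_loop <- function(pattern, replacement, x) {
    n <- length(x)
    # Recycle pattern and replacement to the length of x
    # (sub2 in this thread only recycles arguments of length 1).
    pattern <- rep_len(pattern, n)
    replacement <- rep_len(replacement, n)
    out <- character(n)              # preallocate the result
    for (i in seq_len(n)) {
        out[i] <- sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    out
}

X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

res_loop <- sub_loop(patt, repl, X)

# USE.NAMES = FALSE drops the names that mapply would otherwise take
# from its first argument (the patterns).
res_mapply <- mapply(function(p, r, x) sub(p, r, x, fixed = TRUE),
                     patt, repl, X, USE.NAMES = FALSE)

# Both approaches should give the same unnamed character vector.
stopifnot(identical(res_loop, res_mapply))

# Timings depend on the machine and R version, so none are shown here.
system.time(for (k in 1:10000) sub_loop(patt, repl, X))
system.time(for (k in 1:10000)
    mapply(function(p, r, x) sub(p, r, x, fixed = TRUE),
           patt, repl, X, USE.NAMES = FALSE))
```

Whether the explicit loop wins on a given machine is something to measure
rather than assume; the one-liner remains the easier one to read.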

-Christos 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of john
> Sent: Thursday, October 09, 2008 12:38 AM
> To: r-help
> Subject: Re: [R] vectorized sub, gsub, grep, etc.
> 
> Hello Christos,
>   To my surprise, vectorization actually hurt processing speed!
> 
> #Example
> X <- c("ab", "cd", "ef")
> patt <- c("b", "cd", "a")
> repl <- c("B", "CD", "A")
> 
> sub2 <- function(pattern, replacement, x) {
>     len <- length(x)
>     if (length(pattern) == 1) 
>         pattern <- rep(pattern, len)
>     if (length(replacement) == 1) 
>         replacement <- rep(replacement, len)
>     FUN <- function(i) {
>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>     }
>     # seq_along() is safe when x is empty; 1:length(x) would give c(1, 0)
>     sapply(seq_along(x), FUN)
> }
>  
> system.time(  for(i in 1:10000)  sub2(patt, repl, X)  )
>    user  system elapsed 
>    1.18    0.07    1.26 
> 
> system.time(  for(i in 1:10000)  mapply(function(p, r, x) 
> sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)  )
>    user  system elapsed 
>    1.42    0.05    1.47 
>  
> So much for avoiding loops.
> John Thaden
> 
> ======= At 2008-10-07, 14:58:10 Christos wrote: =======
> 
> >John,
> >Try the following:
> >
> > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE),
> >        p=patt, r=repl, x=X)
> >   b   cd    a 
> >"aB" "CD" "ef"  
> >
> >-Christos
> 
> >> -----My Original Message-----
> >> R pattern-matching and replacement functions are
> >> vectorized: they can operate on vectors of targets.
> >> However, they can only use one pattern and replacement.
> >> Here is code to apply a different pattern and replacement for every
> >> target.  My question: can it be done better?
> >> 
> >> sub2 <- function(pattern, replacement, x) {
> >>     len <- length(x)
> >>     if (length(pattern) == 1) 
> >>         pattern <- rep(pattern, len)
> >>     if (length(replacement) == 1) 
> >>         replacement <- rep(replacement, len)
> >>     FUN <- function(i) {
> >>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
> >>     }
> >>     # seq_along() is safe when x is empty; 1:length(x) would give c(1, 0)
> >>     sapply(seq_along(x), FUN)
> >> }
> >> 
> >> #Example
> >> X <- c("ab", "cd", "ef")
> >> patt <- c("b", "cd", "a")
> >> repl <- c("B", "CD", "A")
> >> sub2(patt, repl, X)
> >> 
> >> -John
> 