[R] split strings

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed May 27 10:07:42 CEST 2009


Allan Engelhardt wrote:
> Immaterial, yes, but it is always good to test :) and your solution
> *is* faster and it is even faster if you can assume byte strings:

:)

indeed;  though if the speed is immaterial (and in this case it
supposedly was), it's probably not worth the risk: with fixed=TRUE the
pattern is not anchored, so sub removes the first '.tif' it finds, which
may sit in the middle of the name, however unlikely this might be (cf
murphy's laws).
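
to make the risk concrete, here is a minimal sketch; the file name is made up for illustration (it does not come from the original data), and the anchored pattern '[.]tif$' is just one way to write the safe variant:

```r
# hypothetical file name that contains '.tif' in the middle as well
x = 'f:/foo/bar//my.tiffany.tif'
b = basename(x)                              # "my.tiffany.tif"

# fixed, unanchored: removes the first literal '.tif'
by_fixed = sub('.tif', '', b, fixed=TRUE)    # "myfany.tif"

# anchored regex: only a trailing '.tif' is removed
by_regex = sub('[.]tif$', '', b)             # "my.tiffany"

# substr: drops the last four characters, whatever they are
by_substr = substr(b, 1, nchar(b) - 4)       # "my.tiffany"
```

only the fixed, unanchored variant mangles the name here; the other two agree.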

but if you can assume that each string ends with a '.tif' (or any other
\..{3} substring), then substr is marginally faster than sub, even as a
three-pass approach, while avoiding the risk of removing '.tif' from the
middle:

    strings = sprintf('f:/foo/bar//%s.tif',
       replicate(1000, paste(sample(letters, 10), collapse='')))
    library(rbenchmark)
    benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
       substr={basenames=basename(strings);
          substr(basenames, 1, nchar(basenames)-4)},
       sub=sub('.tif', '', basename(strings), fixed=TRUE, useBytes=TRUE))
    #     test elapsed
    # 1 substr   3.176
    # 2    sub   3.296

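as a sanity check (a sketch only, not part of the benchmark above), the two approaches can be compared on a small sample built the same way; the seed and the sample size of 5 are arbitrary choices, and 'scan.tiff' is a made-up name showing what happens when the four-character-suffix assumption does not hold:

```r
set.seed(1)  # arbitrary, only to make the sketch repeatable
strings = sprintf('f:/foo/bar//%s.tif',
   replicate(5, paste(sample(letters, 10), collapse='')))
basenames = basename(strings)

by_substr = substr(basenames, 1, nchar(basenames) - 4)
by_sub = sub('.tif', '', basenames, fixed=TRUE, useBytes=TRUE)

# the generated names contain letters only, so both give the same result
same = identical(by_substr, by_sub)          # TRUE

# hypothetical name with a longer extension: the assumption fails
odd = 'scan.tiff'
odd_substr = substr(odd, 1, nchar(odd) - 4)  # "scan."
odd_sub = sub('.tif', '', odd, fixed=TRUE)   # "scanf"
```

so the substr variant is only safe when every name really ends in a dot followed by three characters.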

vQ

>
> > strings = sprintf('f:/foo/bar//%s.tif',
>     replicate(1000, paste(sample(letters, 10), collapse='')))
> > library(rbenchmark)
> > benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
>   'one-pass, perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
>   'two-pass, perl'=sub('.tif$', '', basename(strings), perl=TRUE),
>   'one-pass, no perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=FALSE),
>   'two-pass, no perl'=sub('.tif$', '', basename(strings), perl=FALSE),
>   'fixed'=sub(".tif", "", basename(strings), fixed=TRUE),
>   'fixed, bytes'=sub(".tif", "", basename(strings), fixed=TRUE,
>     useBytes=TRUE))
>
>                test elapsed
> 1    one-pass, perl   2.946
> 2    two-pass, perl   3.858
> 3 one-pass, no perl  15.884
> 4 two-pass, no perl   3.788
> 5             fixed   2.264
> 6      fixed, bytes   1.813
>



