[R] split strings

Tue May 26 22:24:59 CEST 2009

Monica Pisica wrote:
> Hi everybody,
>  
> Thank you for the suggestions and especially the explanation Waclaw provided for his code. Maybe one day I will be able to wrap my head around this.
>  
> Thanks again,
>   

you're welcome.  note that if efficiency is an issue, you'd better set
perl=TRUE in the call:

    output = sub('.*//(.*)[.]tif$', '\\1', input, perl=TRUE)

with perl=TRUE, the one-pass solution is somewhat faster than gabor's
two-pass solution -- which, however, is probably easier to understand;
with perl=FALSE (the default), the one-pass version becomes markedly
slower, while the two-pass version hardly changes:

    strings = sprintf(
        'f:/foo/bar//%s.tif',
        replicate(1000, paste(sample(letters, 10), collapse='')))
    library(rbenchmark)
    benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
       'one-pass, perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
       'two-pass, perl'=sub('[.]tif$', '', basename(strings), perl=TRUE),
       'one-pass, no perl'=sub('.*//(.*)[.]tif$', '\\1', strings,
                               perl=FALSE),
       'two-pass, no perl'=sub('[.]tif$', '', basename(strings), perl=FALSE))
    # 1    one-pass, perl   3.391
    # 2    two-pass, perl   4.944
    # 3 one-pass, no perl  18.836
    # 4 two-pass, no perl   5.191
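
for reference, a minimal sketch of what the two approaches actually return, including one string that does not end in .tif.  the sample vector `input` is made up for illustration (modelled on the paths in the original question), and the two-pass pattern here escapes the dot so that only a literal '.tif' suffix is removed, which is an assumption about the intent:

```r
# hypothetical sample paths; the third one deliberately lacks a .tif suffix
input <- c("F:/Naval_Live_Oaks/2005/data//BE.tif",
           "F:/Naval_Live_Oaks/2005/data//HOME.tif",
           "F:/Naval_Live_Oaks/2005/data//notes.txt")

# one pass: capture what lies between the last '//' and the '.tif' suffix
one_pass <- sub('.*//(.*)[.]tif$', '\\1', input, perl=TRUE)

# two passes: basename() drops the directory part, sub() drops the extension
two_pass <- sub('[.]tif$', '', basename(input), perl=TRUE)

print(one_pass)
print(two_pass)
```

the one-pass version returns a non-matching string unchanged, whereas the two-pass version still strips its directory part, so the two are interchangeable only when every element ends in .tif.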

vQ

>  
> Monica
>
> ----------------------------------------
>   
>> Date: Tue, 26 May 2009 15:46:21 +0200
>> From: Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
>> To: pisicandru at hotmail.com
>> CC: r-help at r-project.org
>> Subject: Re: [R] split strings
>>
>> Monica Pisica wrote:
>>     
>>> Hi everybody,
>>>
>>> I have a character vector and I would like to extract certain parts of each element. My vector is named metr_list:
>>>
>>> [1] "F:/Naval_Live_Oaks/2005/data//BE.tif"
>>> [2] "F:/Naval_Live_Oaks/2005/data//CH.tif"
>>> [3] "F:/Naval_Live_Oaks/2005/data//CRR.tif"
>>> [4] "F:/Naval_Live_Oaks/2005/data//HOME.tif"
>>>
>>> And I would like to extract BE, CH, CRR, and HOME into a different vector named "names.id".
>>>       
>> one way that seems reasonable is to use sub:
>>
>> output = sub('.*//(.*)[.]tif$', '\\1', input)
>>
>> which says 'from each string remember the substring between the
>> rightmost double slash and the .tif extension, exclusive, and replace
>> the whole thing with the captured part'. if the pattern does not match,
>> you get the original input back. for a string that does match:
>>
>> sub('.*//(.*)[.]tif$', '\\1', 'f:/foo/bar//buz.tif')
>> # buz
>>
>> vQ
>>     