[Rd] [R] split strings

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu May 28 15:50:33 CEST 2009


William Dunlap wrote:
>
> Would your patched code affect the following
> use of regexpr's output as input to substr, to
> pull out the matched text from the string?
>    > x<-c("ooo","good food","bad")
>    > r<-regexpr("o+", x)
>    > substring(x,r,attr(r,"match.length")+r-1)
>    [1] "ooo" "oo"  ""   
>   

no; same output

>    > substr(x,r,attr(r,"match.length")+r-1)
>    [1] "ooo" "oo"  ""   
>   

no; same output

>    > r
>    [1]  1  2 -1
>    attr(,"match.length")
>    [1]  3  2 -1
>    > attr(r,"match.length")+r-1
>    [1]  3  3 -3
>    attr(,"match.length")
>    [1]  3  2 -1
>   

for the positive indices there is no change, as you might expect.

if i understand your concern, the issue is that regexpr returns -1 (with
the corresponding "match.length" attribute also -1) where there is no
match.  in this case, you expect "" as the substring.

if there is no match, we have:

    start = r = -1 (the start index you provide)
    stop = attr(r, "match.length") + r - 1 = -1 + -1 - 1 = -3 (the stop index you provide)

for a string of length n, my patch computes the final indices as follows:

    start' = n + start - 1
    stop' = n + stop - 1

whatever the value of n, stop' - start' = stop - start = -3 - (-1) = -2.
that is, stop' < start', hence an empty string is returned, by virtue of
the original code.  (see the sources for details.)
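
a minimal sketch of the arithmetic above, for the no-match element only. 
the helper shift() is hypothetical: it just mirrors the two formulas for
start' and stop' and is not the patched code itself.  the last line relies
on substring returning "" whenever stop < start, which is what unpatched R
already does.

```r
# hypothetical helper mirroring start' = n + start - 1 (not the actual patch)
shift <- function(idx, n) n + idx - 1

x <- c("ooo", "good food", "bad")
r <- regexpr("o+", x)

start <- as.integer(r)                          # 1  2 -1
stop  <- attr(r, "match.length") + start - 1    # 3  3 -3

nomatch <- start == -1                          # only "bad" has no match
n <- nchar(x[nomatch])

start2 <- shift(start[nomatch], n)
stop2  <- shift(stop[nomatch], n)

stop2 - start2                                  # -2, independent of n
out <- substring(x[nomatch], start2, stop2)     # "" since stop2 < start2
```

since n is added to both indices, it cancels in the difference, so the
result does not depend on the length of the string.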

does this answer your question?

vQ


