[Rd] [R] split strings
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu May 28 16:05:28 CEST 2009
Wacek Kusnierczyk wrote:
> William Dunlap wrote:
>
>> Would your patched code affect the following
>> use of regexpr's output as input to substr, to
>> pull out the matched text from the string?
>> > x<-c("ooo","good food","bad")
>> > r<-regexpr("o+", x)
>> > substring(x,r,attr(r,"match.length")+r-1)
>> [1] "ooo" "oo" ""
>>
>>
>
> no; same output
>
>
>> > substr(x,r,attr(r,"match.length")+r-1)
>> [1] "ooo" "oo" ""
>>
>>
>
> no; same output
>
>
>> > r
>> [1] 1 2 -1
>> attr(,"match.length")
>> [1] 3 2 -1
>> > attr(r,"match.length")+r-1
>> [1] 3 3 -3
>> attr(,"match.length")
>> [1] 3 2 -1
>>
>>
>
> for the positive indices there is no change, as you might expect.
>
> if i understand your concern, the issue is that regexpr returns -1 (with
> the corresponding attribute -1) where there is no match. in this case,
> you expect "" as the substring.
>
> if there is no match, we have:
>
> start = r = -1 (the start index you provide)
> stop = attr(r, "match.length") + r - 1 = -1 + -1 - 1 = -3 (the stop index you provide)
>
> for a string of length n, my patch computes the final indices as follows:
>
> start' = n + start - 1
> stop' = n + stop - 1
>
> whatever the value of n, stop' - start' = stop - start = -3 - 1 = -4.
>
except that stop - start = -3 - (-1) = -2, not -4. that's still negative,
i.e., stop' < start', so the conclusion is unchanged.
silly me, sorry.
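
for anyone who wants to check the arithmetic, here is a small sketch in R. it only reproduces the index computation from the example above using stock regexpr/substring; the names first, last, first2 and last2 are made up for illustration, and the shift formulas are copied from the quoted message, so they are an assumption about the patch rather than the patch itself.

```r
x <- c("ooo", "good food", "bad")
r <- regexpr("o+", x)

# start and stop indices as passed to substring(); as.integer() drops
# the match.length attribute so the results print cleanly
first <- as.integer(r)
last  <- first + attr(r, "match.length") - 1L

# per-element difference stop - start; negative where nothing matched
last - first

# shifting both indices by the same amount (assumed formulas from the
# quoted message, with n the string length) keeps that difference
n      <- nchar(x)
first2 <- n + first - 1L
last2  <- n + last - 1L
identical(last2 - first2, last - first)

# unpatched substring() returns "" whenever stop < start
substring(x, first, last)
```

the point is only that a shift by a constant cannot turn a negative stop - start into a non-negative one, so the no-match case still yields "".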
vQ
> that is, stop' < start', hence an empty string is returned, by virtue of
> the original code. (see the sources for details.)
>
> does this answer your question?
>
>