[Rd] [R] split strings
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu May 28 15:50:33 CEST 2009
William Dunlap wrote:
>
> Would your patched code affect the following
> use of regexpr's output as input to substr, to
> pull out the matched text from the string?
> > x<-c("ooo","good food","bad")
> > r<-regexpr("o+", x)
> > substring(x,r,attr(r,"match.length")+r-1)
> [1] "ooo" "oo" ""
>
no; same output
> > substr(x,r,attr(r,"match.length")+r-1)
> [1] "ooo" "oo" ""
>
no; same output
> > r
> [1] 1 2 -1
> attr(,"match.length")
> [1] 3 2 -1
> > attr(r,"match.length")+r-1
> [1] 3 3 -3
> attr(,"match.length")
> [1] 3 2 -1
>
for the positive indices there is no change, as you might expect.
if i understand your concern, the issue is that regexpr returns -1 (with
the corresponding match.length attribute also -1) where there is no
match. in this case, you expect "" as the substring.
if there is no match, we have:
start = r = -1 (the start index you provide)
stop = attr(r, "match.length") + r - 1 = -1 + -1 - 1 = -3 (the stop index you provide)
for a string of length n, my patch computes the final indices as follows:
start' = n + start - 1
stop' = n + stop - 1
whatever the value of n, stop' - start' = stop - start = -3 - (-1) = -2.
that is, stop' < start', hence an empty string is returned, by virtue of
the original code. (see the sources for details.)
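here is a minimal sketch of that arithmetic in plain r, assuming the index translation is exactly the one stated above. `translate` is a hypothetical helper written for illustration only; it is not the patch itself, and the final `substring` call just relies on the existing behaviour that a stop index below the start index yields "".

```r
# hypothetical helper (not part of the patch): maps a negative index
# to the final index, using the formula n + index - 1 quoted above
translate <- function(n, index) n + index - 1

x <- "bad"                       # no match for "o+"
r <- regexpr("o+", x)            # -1, with match.length -1

first <- as.integer(r)                          # start index as provided: -1
last  <- first + attr(r, "match.length") - 1    # stop index as provided: -3

n <- nchar(x)
first2 <- translate(n, first)    # start'
last2  <- translate(n, last)     # stop'

# n cancels out, so the difference is the same for any string length
print(last2 - first2)            # negative, i.e. stop' < start'

# existing behaviour: stop below start gives the empty string
print(substring(x, first2, last2))
```

the same holds for any n, since n drops out of the difference.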
does this answer your question?
vQ