[R] [Rd] gregexpr - match overlap mishandled (PR#13391)
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sun Dec 14 13:39:12 CET 2008
Greg Snow wrote:
> Controlling the pointer is going to be very different from perl since the R functions are vectorized rather than focusing on a single string.
>
> Here is one approach that will give all the matches and lengths (for the original problem at least):
>
>
>> mystr <- paste(rep("1122", 10), collapse="")
>> n <- nchar(mystr)
>>
>> mystr2 <- substr(rep(mystr,n), 1:n, n)
>>
>> tmp <- regexpr("^11221122", mystr2)
>> (tmp + 1:n - 1)[tmp>0]
>>
> [1] 1 5 9 13 17 21 25 29 33
>
>> attr(tmp,"match.length")[tmp>0]
>>
> [1] 8 8 8 8 8 8 8 8 8
>
>
while not exactly what i meant, this is an implementation of one of the
approaches mentioned below, with care taken not to report duplicate matches:
>> sequentially perform single matches on successive substrings of the
>> input string (which can give you the same match more than once,
>> though).
one issue with your solution is that it allocates n substrings at the
same time, of lengths n, n-1, ..., 1, which requires O(n^2) space in
total (with n the length of the original string), but it may be faster
than a for loop matching one substring at a time.
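
for comparison, here is a minimal sketch of the for-loop alternative, which
holds only one substring at a time. the helper name overlapping_matches is
hypothetical (it is not from this thread or from base R), and wrapping the
pattern in "^(?:...)" with perl=TRUE is an assumption made so that each
match is anchored at the current start position. each call to substr still
copies up to n characters, so the running time stays quadratic; only the
peak memory use should drop.

```r
# hypothetical helper, not part of base R: report all (possibly
# overlapping) matches by testing the anchored pattern at every
# start position, keeping only one substring alive at a time
overlapping_matches <- function(pattern, text) {
  n <- nchar(text)
  # anchor the pattern so it can only match at the start of the substring
  anchored <- paste0("^(?:", pattern, ")")
  starts <- integer(0)
  lengths <- integer(0)
  for (i in seq_len(n)) {
    m <- regexpr(anchored, substr(text, i, n), perl = TRUE)
    if (m > 0) {
      # position i in the original string is the start of a match
      starts <- c(starts, i)
      lengths <- c(lengths, attr(m, "match.length"))
    }
  }
  list(start = starts, length = lengths)
}

mystr <- paste(rep("1122", 10), collapse = "")
res <- overlapping_matches("11221122", mystr)
res$start
res$length
```

on the example string this should report the same start positions and match
lengths as the vectorized version quoted above.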
vQ