[R] [Rd] gregexpr - match overlap mishandled (PR#13391)

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sun Dec 14 13:39:12 CET 2008


Greg Snow wrote:
> Controlling the pointer is going to be very different from perl since the R functions are vectorized rather than focusing on a single string.
>
> Here is one approach that will give all the matches and lengths (for the original problem at least):
>
>   
>> mystr <- paste(rep("1122", 10), collapse="")
>> n <- nchar(mystr)
>>
>> mystr2 <- substr(rep(mystr,n), 1:n, n)
>>
>> tmp <- regexpr("^11221122", mystr2)
>> (tmp + 1:n - 1)[tmp>0]
>>     
> [1]  1  5  9 13 17 21 25 29 33
>   
>> attr(tmp,"match.length")[tmp>0]
>>     
> [1] 8 8 8 8 8 8 8 8 8
>
>   

while not exactly what i meant, this is an implementation of one of the
approaches mentioned below, with care taken not to report duplicate matches:

>> sequentially perform single matches on successive substrings of the
>> input string (which can give you the same match more than once,
>> though).  

one issue with your solution is that it allocates n substrings at the
same time, which requires O(n^2) space (with n the length of the
original string), but it may be faster than a for loop matching one
substring at a time.
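
for comparison, here is a minimal sketch of the for-loop variant, matching
one suffix at a time so that only a single substring is held at once.  the
helper name overlapping_matches is hypothetical (it is not from this thread),
and wrapping the pattern in "^(...)" assumes the pattern contains no
backreferences; it has not been timed against the vectorized version.

```r
# Hypothetical helper: report every (possibly overlapping) match of
# `pattern` in `text` by anchoring the pattern at each start position.
overlapping_matches <- function(pattern, text) {
  n <- nchar(text)
  starts <- integer(0)
  lengths <- integer(0)
  # Group the pattern so that "^" applies to the whole of it,
  # even if it contains alternation.
  anchored <- paste0("^(", pattern, ")")
  for (i in seq_len(n)) {
    # Only the suffix starting at position i exists at this point.
    m <- regexpr(anchored, substr(text, i, n))
    if (m > 0) {
      starts <- c(starts, i)
      lengths <- c(lengths, attr(m, "match.length"))
    }
  }
  list(start = starts, length = lengths)
}

mystr <- paste(rep("1122", 10), collapse = "")
res <- overlapping_matches("11221122", mystr)
res$start   # same positions as in the quoted output: 1 5 9 ... 33
res$length  # 8 for each match
```

each start position is tested exactly once, so no duplicates are reported,
and the extra space at any time is one suffix of at most n characters.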

vQ


