[R] how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?

Gabor Grothendieck ggrothendieck at gmail.com
Sun Dec 20 13:02:47 CET 2009


On Sun, Dec 20, 2009 at 5:33 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Use a zero lookaround expression.  It will not consume its match.  See ?regexp

That should be lookahead, not lookaround.

>
>> gregexpr("a(?=a)", "aaa", perl = TRUE)
> [[1]]
> [1] 1 2
> attr(,"match.length")
> [1] 1 1
>
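A minimal sketch that generalizes the lookahead trick to any literal substring, not just 'aa'. The helper name `count_overlapping()` and its `literal` argument are hypothetical, not part of base R. Wrapping the whole pattern in `(?=...)` produces a zero-length match at every starting position, so nothing is consumed and overlapping hits are all found. `\Q...\E` is PCRE's syntax for quoting literal text. The helper counts positive positions instead of using `length()`, because `gregexpr()` returns -1 when there is no match, so `length()` would report 1 in that case.

```r
# Hypothetical helper: count (possibly overlapping) occurrences of `pattern`
# in each element of `x`. Requires perl = TRUE for the lookahead syntax.
count_overlapping <- function(pattern, x, literal = TRUE) {
  if (literal) {
    # \Q...\E tells PCRE to treat the pattern as literal text
    # (assumption: `pattern` itself does not contain "\E")
    pattern <- paste0("\\Q", pattern, "\\E")
  }
  # Zero-width lookahead: matches an empty string wherever `pattern` starts
  matches <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)
  # gregexpr() gives -1 for "no match", so count only positive positions
  vapply(matches, function(m) sum(m > 0), integer(1))
}

count_overlapping("aa", "aaa")                   # 2
count_overlapping("cus", "hocus pocus")          # 2
count_overlapping("aa", c("aaa", "aaaa", "b"))   # 2 3 0
```

With `literal = FALSE` the pattern is passed through as a PCRE regular expression, so regex metacharacters keep their meaning.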
>
> On Sun, Dec 20, 2009 at 1:43 AM, Jonathan <jonsleepy at gmail.com> wrote:
>> Last one for you guys:
>>
>> The command:
>>
>> length(gregexpr('cus','hocus pocus')[[1]])
>> [1] 2
>>
>> returns the number of times the substring 'cus' appears in 'hocus pocus'
>> (which is two).
>>
>> It's returning the number of **disjoint** matches.  So:
>>
>> length(gregexpr('aa','aaa')[[1]])
>>  [1] 1
>>
>> returns 1.
>>
>> **What I want to do:**
>> I'm looking for a way to count all occurrences of the substring, including
>> overlapping sets (so 'aa' would be found in 'aaa' two times, because the
>> middle 'a' gets counted twice).
>>
>> Any ideas would be much appreciated!!
>>
>> Signing off and thanks for all the great assistance,
>> Jonathan
>
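
If you would rather avoid regular expressions altogether, a different technique (not the one proposed in this thread) is to compare every window of the string whose length is `nchar(sub)` against the substring, using `substring()`. This is a sketch for a single string; the name `count_substring()` is hypothetical, and `sub` is always treated literally.

```r
# Hypothetical helper: count overlapping occurrences of the literal
# substring `sub` in the single string `x` by checking every window.
count_substring <- function(sub, x) {
  n <- nchar(sub)
  # An empty substring, or one longer than `x`, has no windows to compare
  if (n == 0 || nchar(x) < n) return(0L)
  # Every possible start position of a window of length n
  starts <- seq_len(nchar(x) - n + 1)
  # substring() is vectorized over start/stop, giving all windows at once
  sum(substring(x, starts, starts + n - 1) == sub)
}

count_substring("aa", "aaa")           # 2
count_substring("cus", "hocus pocus")  # 2
```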
