[R] how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?
Gabor Grothendieck
ggrothendieck at gmail.com
Sun Dec 20 16:04:20 CET 2009
Try this:
> findall("(.).\\1", "ababacababab")
[1] 1 2 3 5 7 8 9 10
> gregexpr("(.)(?=.\\1)", "ababacababab", perl = TRUE)
[[1]]
[1] 1 2 3 5 7 8 9 10
attr(,"match.length")
[1] 1 1 1 1 1 1 1 1
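Since the question asks for the total number of occurrences, here is a minimal sketch of turning the gregexpr() result into a count. The helper name count_matches is hypothetical (it is not from this thread). It assumes, as in the call above, that the pattern consumes only the first character of each occurrence and checks the rest with a lookahead. One caveat: gregexpr() reports "no match" as a single -1, so length() alone would return 1 instead of 0.

```r
# Hypothetical helper: count all (possibly overlapping) matches of a PCRE
# pattern in a single string. Assumes the pattern consumes only the first
# character of each occurrence and checks the rest inside a lookahead,
# so matching can resume at the very next character.
count_matches <- function(pattern, text) {
  stopifnot(length(pattern) == 1, length(text) == 1)
  pos <- gregexpr(pattern, text, perl = TRUE)[[1]]
  # gregexpr() signals "no match" with a single -1, not an empty vector
  if (pos[1] == -1) 0L else length(pos)
}

count_matches("(.)(?=.\\1)", "ababacababab")  # 8
count_matches("a(?=ba)", "ababacababab")      # 4
count_matches("x(?=yz)", "ababacababab")      # 0
```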
On Sun, Dec 20, 2009 at 9:33 AM, Hans W Borchers
<hwborchers at googlemail.com> wrote:
> Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
>>
>> Try this:
>>
>> > findall("aba", "ababacababab")
>> [1] 1 3 7 9
>> > gregexpr("a(?=ba)", "ababacababab", perl = TRUE)
>> [[1]]
>> [1] 1 3 7 9
>> attr(,"match.length")
>> [1] 1 1 1 1
>>
>> > findall("a.a", "ababacababab")
>> [1] 1 3 5 7 9
>> > gregexpr("a(?=.a)", "ababacababab", perl = TRUE)
>> [[1]]
>> [1] 1 3 5 7 9
>> attr(,"match.length")
>> [1] 1 1 1 1 1
>
>
> Thanks --- somehow I did not realize that the expression in "?=..."
> can itself be a regular expression.
>
> My original problem was to find all three-character matches where the
> first and the last character are the same. With findall() it works like:
>
> findall("(.).\\1", "ababacababab")
> # [1] 1 2 3 5 7 8 9 10
>
> I am still not able to reproduce this with lookahead. Attempts with
>
> gregexpr("(.)?=.\\1", "ababacababab", perl = TRUE)
>
> do not work as the lookahead expression apparently does not know about
> the captured group from before.
>
> Regards
> Hans Werner
>
> Correction: I meant the '\G' metacharacter in Perl, not a modifier.
>
>
>> On Sun, Dec 20, 2009 at 7:22 AM, Hans W Borchers
>> <hwborchers <at> googlemail.com> wrote:
>> > Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
>> >
>> > [Sorry; Gmane forces me to delete "more quoted text".]
>> >
>> > ----
>> > findall <- function(apat, atxt) {
>> >   stopifnot(length(apat) == 1, length(atxt) == 1)
>> >   pos <- c()  # positions of matches
>> >   i <- 1; n <- nchar(atxt)
>> >   found <- regexpr(apat, substr(atxt, i, n), perl=TRUE)
>> >   while (found > 0) {
>> >     pos <- c(pos, i + found - 1)  # position in the full string
>> >     i <- i + found  # restart one character after the match start, so overlaps are found
>> >     found <- regexpr(apat, substr(atxt, i, n), perl=TRUE)
>> >   }
>> >   return(pos)
>> > }
>> > ----
>> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
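For a fixed substring, or any condition on fixed-width windows, a regex-free cross-check is also possible: enumerate every window of width k with substring() and test each one. window_starts() is a hypothetical helper written only for illustration, assuming a single input string; it should reproduce the positions reported in this thread for "aba" and for the first-equals-last problem.

```r
# Hypothetical helper: return the start positions of all width-k windows of
# `text` for which `keep` (a vectorized predicate on the windows) is TRUE.
# Overlapping occurrences are counted naturally, because every starting
# position is examined independently.
window_starts <- function(text, k, keep) {
  stopifnot(length(text) == 1, k >= 1)
  n <- nchar(text)
  if (n < k) return(integer(0))  # no complete window fits
  starts <- seq_len(n - k + 1)
  windows <- substring(text, starts, starts + k - 1)
  starts[keep(windows)]
}

txt <- "ababacababab"

# Fixed substring "aba", overlaps included
window_starts(txt, 3, function(w) w == "aba")
# [1] 1 3 7 9

# Three-character windows whose first and last characters agree
window_starts(txt, 3, function(w) substr(w, 1, 1) == substr(w, 3, 3))
# [1]  1  2  3  5  7  8  9 10
```

The total count is then just length() of the result.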