[R] matching a sequence in a vector?

Martin Morgan mtmorgan at fhcrc.org
Wed Feb 15 15:27:01 CET 2012


On 02/14/2012 11:45 PM, Petr Savicky wrote:
> On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
>> Hi All,
>>
>>
>> I've been trawling through the documentation and listserv archives on this topic -- but
>> as yet have not found a solution.  I'm sure this is pretty simple with R, but I cannot work out how without
>> resorting to ugly nested loops.
>>
>> As far as I can tell, grep, match, and %in% are not the correct tools.
>>
>> Question:
>> given these vectors --
>> patrn <- c(1, 2, 3, 4)
>> exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
>>
>> how do I get the desired answer by finding the occurrences of the pattern and returning the starting indices:
>> 6, 13, 23

match(exmpl, patrn) returns, for each element of exmpl, its position in
patrn (NA if it is absent), so wherever patrn occurs these indexes are
consecutive and differ by 1:

   n = length(patrn)
   r = rle(diff(match(exmpl, patrn)) == 1)
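
To make that concrete, here is a quick look at the intermediate vectors for the example data (idx and steps are just illustrative names):

```r
patrn <- c(1, 2, 3, 4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)

idx <- match(exmpl, patrn)  # position of each exmpl value in patrn; NA if absent
idx[6:9]                    # the first occurrence: 1 2 3 4
steps <- diff(idx) == 1     # TRUE where an element is followed by the next pattern element
steps[6:8]                  # TRUE TRUE TRUE: n - 1 = 3 consecutive steps
```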

We're looking for a run of TRUEs of length n - 1 (3 here). cumsum(r$lengths)
gives the position in the diff vector where each run ends, and subtracting
n - 2 gives the index in exmpl where the occurrence starts:

   cumsum(r$lengths)[r$values & r$lengths == (n - 1)] - (n - 2)
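
Wrapped up as a function, this might look like the following minimal sketch (find_seq_rle is a hypothetical name). It assumes patrn has at least two elements and no repeated values, and it uses %in% TRUE so that the single-element NA runs coming from values absent from patrn count as non-matches, which matters when n - 1 is 1:

```r
# Hypothetical helper wrapping the match()/diff()/rle() approach above.
# Assumes length(patrn) >= 2 and no repeated values in patrn.
find_seq_rle <- function(patrn, exmpl) {
  n <- length(patrn)
  stopifnot(n >= 2, !anyDuplicated(patrn))
  idx <- match(exmpl, patrn)   # position in patrn, NA if absent
  r <- rle(diff(idx) == 1)     # runs of "next element is the next pattern element"
  # runs of TRUE with exactly n - 1 steps; NA runs are treated as FALSE
  hit <- r$values %in% TRUE & r$lengths == (n - 1)
  cumsum(r$lengths)[hit] - (n - 2)  # run end in diff vector -> start in exmpl
}

find_seq_rle(c(1, 2, 3, 4), c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4))
# [1]  6 13 23
```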

Seems like there could be edge cases that I'm missing...
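
One such edge case: match() only reports the first position of a value in patrn, so a pattern with repeated values, such as c(1, 1, 2), never produces a run of differences equal to 1, even where it does occur:

```r
# match() returns the position of the *first* matching value in the table
idx <- match(c(1, 1, 2), c(1, 1, 2))
idx        # 1 1 3, not 1 2 3
diff(idx)  # 0 2, so no run of steps equal to 1 is found
```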

Martin

>
> Hi.
>
> If the pattern is not too long, try
>
>    m <- length(patrn)
>    n <- length(exmpl)
>    ind <- seq.int(length = n - m + 1)
>    occur <- rep(TRUE, times = n - m + 1)
>    for (i in seq.int(length = m)) {
>        occur <- occur & (patrn[i] == exmpl[ind + i - 1])
>    }
>    which(occur)
>
>    [1]  6 13 23
>
> Hope this helps.
>
> Petr Savicky.
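
Petr's loop can also be wrapped as a function; a minimal sketch, where the name find_seq_loop is hypothetical and the early return for an empty pattern or a pattern longer than exmpl is an added assumption. Because it compares values directly rather than going through match(), it also handles a patrn with repeated values:

```r
# Hypothetical helper wrapping the position-by-position comparison above.
find_seq_loop <- function(patrn, exmpl) {
  m <- length(patrn)
  n <- length(exmpl)
  if (m == 0 || n < m) return(integer(0))  # assumption: no possible start positions
  ind <- seq_len(n - m + 1)                # candidate start positions
  occur <- rep(TRUE, n - m + 1)
  for (i in seq_len(m)) {
    # keep only starts whose i-th element equals patrn[i]
    occur <- occur & (patrn[i] == exmpl[ind + i - 1])
  }
  which(occur)  # NA comparisons are dropped by which()
}

find_seq_loop(c(1, 1, 2), c(1, 1, 2, 1, 1, 1, 2))
# [1] 1 5
```

The cost is one vectorized comparison per pattern element, which is why it suits patterns that are not too long.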


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


