[R] matching a sequence in a vector?

Petr Savicky savicky at cs.cas.cz
Wed Feb 15 16:26:04 CET 2012


On Wed, Feb 15, 2012 at 06:27:01AM -0800, Martin Morgan wrote:
> On 02/14/2012 11:45 PM, Petr Savicky wrote:
> >On Wed, Feb 15, 2012 at 02:17:35PM +1000, Redding, Matthew wrote:
> >>Hi All,
> >>
> >>
> >>I've been trawling through the documentation and listserv archives on
> >>this topic, but as yet have not found a solution. I'm sure this is
> >>pretty simple with R, but I cannot work out how without resorting to
> >>ugly nested loops.
> >>
> >>As far as I can tell, grep, match, and %in% are not the correct tools.
> >>
> >>Question:
> >>given these vectors --
> >>patrn<- c(1,2,3,4)
> >>exmpl<- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
> >>
> >>how do I get the desired answer by finding the occurrences of the
> >>pattern and returning the starting indices:
> >>6, 13, 23
> 
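> match(exmpl, patrn) returns, for each element of exmpl, its position in
> patrn (or NA if it is not there). Wherever the sequence patrn occurs in
> exmpl, consecutive positions differ by 1: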
> 
>   n = length(patrn)
>   r = rle(diff(match(exmpl, patrn)) == 1)
> 
> We're looking for runs of TRUEs of length n - 1 (here 3), and can find
> where each run ends (in the vector of diffs) as cumsum(r$lengths):
> 
>   cumsum(r$lengths)[r$values & r$lengths == (n - 1)] - (n - 2)
> 
> Seems like there could be edge cases that I'm missing...

Hi Martin:

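This is a nice solution. As far as I can see, it works whenever "patrn"
does not contain duplicates. With duplicates, match() returns only the
first position of a repeated value, so the differences between
consecutive positions are no longer all equal to 1.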
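
To illustrate the caveat, here is a minimal sketch that is not part of the
thread above. rle_starts() only wraps Martin's expression (with an added
which(), my assumption, so that NA runs cannot end up in the index), and
find_pattern() is a hypothetical alternative that compares every shifted
window of the vector with the pattern elementwise, so it should also
tolerate duplicates. Both function names and the test vectors p_dup and
x_dup are made up for this example.

```r
# Martin's rle() approach wrapped in a function.
# Assumes 'pattern' has at least 2 elements and no duplicates.
rle_starts <- function(x, pattern) {
  n <- length(pattern)
  r <- rle(diff(match(x, pattern)) == 1)
  # which() drops NA runs (elements of x that are not in pattern)
  hit <- which(r$values & r$lengths == (n - 1L))
  cumsum(r$lengths)[hit] - (n - 2L)
}

# Alternative sketch: compare each window of length(pattern) directly.
find_pattern <- function(x, pattern) {
  n <- length(pattern)
  m <- length(x)
  if (n == 0L || n > m) return(integer(0))
  starts <- seq_len(m - n + 1L)          # candidate starting indices
  ok <- rep(TRUE, length(starts))
  for (k in seq_len(n)) {
    # k-th element of every window must equal the k-th pattern element
    ok <- ok & (x[starts + k - 1L] == pattern[k])
  }
  which(ok)                              # which() also drops NA
}

patrn <- c(1, 2, 3, 4)
exmpl <- c(3, 3, 4, 2, 3, 1, 2, 3, 4, 8, 8, 23, 1, 2, 3, 4, 4, 34,
           4, 3, 2, 1, 1, 2, 3, 4)
print(rle_starts(exmpl, patrn))    # 6 13 23
print(find_pattern(exmpl, patrn))  # 6 13 23

# A pattern with a duplicated value: match() maps both 1s to position 1
p_dup <- c(1, 1, 2)
x_dup <- c(1, 1, 2, 1, 1, 1, 2)
print(rle_starts(x_dup, p_dup))    # empty: the occurrences are missed
print(find_pattern(x_dup, p_dup))  # 1 5
```

The window comparison does length(pattern) vectorized passes over the
vector, which should be fine for short patterns; for long patterns a
different approach may be preferable.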

Petr.


