[R] matching a sequence in a vector?

Petr Savicky savicky at cs.cas.cz
Wed Feb 15 11:21:45 CET 2012


On Wed, Feb 15, 2012 at 10:26:44AM +0100, Berend Hasselman wrote:
> 
> On 15-02-2012, at 05:17, Redding, Matthew wrote:
> 
> > Hi All,
> > 
> > 
> > I've been trawling through the documentation and listserv archives on this topic -- but
> > as yet have not found a solution.  I'm sure this is pretty simple with R, but I cannot work out how without
> > resorting to ugly nested loops.
> > 
> > As far as I can tell, grep, match, and %in% are not the correct tools.
> > 
> > Question:
> > given these vectors --
> > patrn <- c(1,2,3,4)
> > exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
> > 
> > how do I get the desired answer by finding the occurrences of the pattern and returning the starting indices:
> > 6, 13, 23
> > 
> 
> patrn.rev <- rev(patrn)
> w <- embed(exmpl,length(patrn))
> w.pos <- apply(w,1,function(r) all(r == patrn.rev))
> which(w.pos)

Hi.

If speed is an issue and exmpl is long, the
following modification may be faster.

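  patrn.rev <- rev(patrn)            # embed() stores each window reversed
  w <- embed(exmpl,length(patrn))    # one row per candidate start index
  # a row matches when all ncol(w) entries agree with the reversed pattern
  which(rowSums(w == rep(patrn.rev, each=nrow(w))) == ncol(w))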

  [1]  6 13 23
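
The same idea can be wrapped in a small function. This is only a sketch:
the name find_pattern and the guard for empty or too-long patterns are my
assumptions, not something from the thread. It relies on row i of
embed(x, k) holding x[i+k-1], ..., x[i], which is why the pattern is
reversed before the comparison.

```r
# find_pattern is a hypothetical helper name, not a base R function.
find_pattern <- function(x, pattern) {
  k <- length(pattern)
  # embed() fails for a dimension of 0 or one longer than x, so guard first.
  if (k == 0 || k > length(x)) return(integer(0))
  # Row i of w is x[i + k - 1], ..., x[i]: each window in reverse order.
  w <- embed(x, k)
  # rep(..., each = nrow(w)) lines up element j of the reversed pattern
  # with column j of w, because matrices are stored column by column.
  hits <- w == rep(rev(pattern), each = nrow(w))
  # A start index matches when all k positions agree.
  which(rowSums(hits) == k)
}

patrn <- c(1, 2, 3, 4)
exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
res <- find_pattern(exmpl, patrn)
print(res)
```

Rows containing NA never count as a match here, since which() drops
NA values.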

For length(patrn) = 11 and length(exmpl) = 10000, I obtained
a speedup by a factor of 10 over the apply() version.
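
A rough way to repeat such a comparison is sketched below. The test data
(a random vector over 1:3 with an arbitrary seed, and a pattern cut out of
it) are my assumptions for illustration, not the data behind the figure
above, and the timings will differ between machines and R versions.

```r
set.seed(1)                                  # arbitrary seed (assumption)
exmpl.long <- sample(1:3, 10000, replace = TRUE)
patrn.long <- exmpl.long[101:111]            # length 11, occurs at index 101

p.rev <- rev(patrn.long)
w <- embed(exmpl.long, length(patrn.long))

# apply() version: one R function call per row of w
t.apply <- system.time(
  r1 <- which(apply(w, 1, function(r) all(r == p.rev)))
)
# rowSums() version: a single vectorized comparison over the whole matrix
t.rowsums <- system.time(
  r2 <- which(rowSums(w == rep(p.rev, each = nrow(w))) == ncol(w))
)

# Both versions must return the same starting indices.
stopifnot(identical(r1, r2))
print(c(apply = t.apply[["elapsed"]], rowSums = t.rowsums[["elapsed"]]))
```

For timings this short, repeating each call several times and comparing
the totals gives a steadier estimate than a single run.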

Hope this helps.

How large are the vectors "patrn" and "exmpl" in your application?

Petr Savicky.


