# [R] matching a sequence in a vector?

Petr Savicky savicky at cs.cas.cz
Thu Feb 16 09:01:49 CET 2012

```On Wed, Feb 15, 2012 at 08:12:32PM -0500, Gabor Grothendieck wrote:
> On Tue, Feb 14, 2012 at 11:17 PM, Redding, Matthew
> <Matthew.Redding at deedi.qld.gov.au> wrote:
> > I've been trawling through the documentation and listserv archives on this topic -- but
> > as yet have not found a solution.  I'm sure this is pretty simple with R, but I cannot work out how without
> > resorting to ugly nested loops.
> >
> > As far as I can tell, grep, match, and %in% are not the correct tools.
> >
> > Question:
> > given these vectors --
> > patrn <- c(1,2,3,4)
> > exmpl <- c(3,3,4,2,3,1,2,3,4,8,8,23,1,2,3,4,4,34,4,3,2,1,1,2,3,4)
> >
> > how do I get the desired answer by finding the occurence of the pattern and returning the starting indices:
> > 6, 13, 23
> >
>
> Here is a one-liner:
>
> library(zoo)
> which(rollapply(exmpl, 4, identical, patrn, fill = FALSE, align = "left"))

Hi.

There were several solutions in this thread. Their speed differs
quite significantly. Here is a comparison.

patrn <- 1:4
exmpl <- sample(1:4, 10000, replace=TRUE)

occur1 <- function(patrn, exmpl)
{
m <- length(patrn)
n <- length(exmpl)
candidate <- seq.int(length=n-m+1)
for (i in seq.int(length=m)) {
candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
}
candidate
}

occur2 <- function(patrn, exmpl)
{
patrn.rev <- rev(patrn)
w <- embed(exmpl,length(patrn))
which(apply(w,1,function(r) all(r == patrn.rev)))
}

occur3 <- function(patrn, exmpl)
{
patrn.rev <- rev(patrn)
w <- embed(exmpl,length(patrn))
which(rowSums(w == rep(rev(patrn), each=nrow(w))) == ncol(w))
}

occur4 <- function(patrn, exmpl)
{
# requires patrn without duplicates
n = length(patrn)
r = rle(diff(match(exmpl, patrn)) == 1L)
cumsum(r\$length)[r\$values & r\$length == (n - 1L)] - (n - 2L)
}

occur5 <- function(patrn, exmpl)
{
which( sapply( 1:(length(exmpl)-length(patrn)+1), function(i) isTRUE( all.equal( patrn, exmpl[i + 0:(length(patrn)-1) ] ) ) ) )
}

occur6 <- function(patrn, exmpl)
{
indx <- embed(rev(seq_along(exmpl)), length(patrn))
matches <- apply(indx, 1, function(.indx){
all(exmpl[.indx] == patrn)
})
rev(indx[matches, 1L])
}

occur7 <- function(patrn, exmpl)
{
which(rollapply(exmpl, length(patrn), identical, patrn, fill = FALSE, align = "left"))
}

library(zoo)

t1 <- system.time( out1 <- occur1(patrn, exmpl) )
t2 <- system.time( out2 <- occur2(patrn, exmpl) )
t3 <- system.time( out3 <- occur3(patrn, exmpl) )
t4 <- system.time( out4 <- occur4(patrn, exmpl) )
t5 <- system.time( out5 <- occur5(patrn, exmpl) )
t6 <- system.time( out6 <- occur6(patrn, exmpl) )
t7 <- system.time( out7 <- occur7(patrn, exmpl) )

print(identical(out1, out2))
print(identical(out1, out3))
print(identical(out1, out4))
print(identical(out1, out5))
print(identical(out1, out6))
print(identical(out1, out7))
print(rbind(t1, t2, t3, t4, t5, t6, t7))

The output was

[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
user.self sys.self elapsed user.child sys.child
t1     0.001        0   0.001          0         0
t2     0.062        0   0.061          0         0
t3     0.002        0   0.002          0         0
t4     0.001        0   0.001          0         0
t5     1.749        0   1.749          0         0
t6     0.068        0   0.068          0         0
t7     0.172        0   0.172          0         0

Petr Savicky.

```