[R] Searching for a pattern within a vector
Petr Savicky
savicky at cs.cas.cz
Fri Feb 24 09:19:43 CET 2012
On Fri, Feb 24, 2012 at 01:00:00PM +0530, Apoorva Gupta wrote:
> Dear R users,
>
> I have a data.frame as follows
>
>       a b c d e
>  [1,] 1 1 1 0 0
>  [2,] 1 1 0 0 0
>  [3,] 1 1 0 0 0
>  [4,] 0 1 1 1 1
>  [5,] 0 1 1 1 1
>  [6,] 1 1 1 1 1
>  [7,] 1 1 1 0 1
>  [8,] 1 1 1 0 1
>  [9,] 1 1 1 0 0
> [10,] 1 1 1 0 0
>
> Within these 4 vectors, I want to choose those vectors for which I
> have the pattern (0,0,1,1,1,1) occurring anywhere in the vector.
> This means I want vectors a,c,e and not b and d.
Hi.
A related thread,
[R] matching a sequence in a vector?
started at
https://stat.ethz.ch/pipermail/r-help/2012-February/303608.html
https://stat.ethz.ch/pipermail/r-help/attachments/20120215/989a2e88/attachment.pl
and a summary of suggested solutions was at
https://stat.ethz.ch/pipermail/r-help/2012-February/303756.html
Try the following, where any of the occur* functions described there
may be used instead of occur1. The original function returned the
vector "candidate" of the indices at which an occurrence of "patrn"
in "exmpl" starts. For your purposes, the function has to be modified
in two ways.
1. The output is the condition length(candidate) != 0 instead of "candidate".
2. The argument "exmpl" becomes the first argument, so that lapply() and
   apply() can pass each column to it directly.
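For comparison, here is a minimal sketch of the index-returning form
described above. The name occur_idx and the details are my assumptions
for illustration; the exact code of occur1 is in the linked summary and
may differ.

```r
# Hypothetical index-returning variant (illustrative, not copied from
# the linked thread): returns the start positions of patrn in exmpl.
occur_idx <- function(patrn, exmpl)
{
    m <- length(patrn)
    n <- length(exmpl)
    # all start positions at which the pattern could still fit
    candidate <- seq_len(max(n - m + 1, 0))
    for (i in seq_len(m)) {
        # keep only the starts whose i-th element matches the pattern
        candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    candidate
}

x <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1)   # column "a" from the question
occur_idx(c(0, 0, 1, 1, 1, 1), x)      # 4
occur_idx(c(1, 1), x)                  # 1 2 6 7 8 9
```

Testing length() of this result for being nonzero gives the TRUE/FALSE
answer needed for selecting columns.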
# your data frame
df <- structure(list(a = c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
b = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), c = c(1L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), d = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
e = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L)), .Names = c("a", "b", "c",
"d", "e"), class = "data.frame", row.names = c(NA, -10L))
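# modified function occur1
testoccur1 <- function(exmpl, patrn)
{
    m <- length(patrn)
    n <- length(exmpl)
    # possible start positions; none if the pattern is longer than the vector
    candidate <- seq_len(max(n - m + 1, 0))
    for (i in seq_len(m)) {
        # keep the starts whose i-th element matches the pattern
        candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    length(candidate) != 0
}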
selection <- unlist(lapply(df, testoccur1, patrn=c(0,0,1,1,1,1)))
selection
    a     b     c     d     e
 TRUE FALSE  TRUE FALSE  TRUE
df[, selection]
   a c e
1  1 1 0
2  1 0 0
3  1 0 0
4  0 1 1
5  0 1 1
6  1 1 1
7  1 1 1
8  1 1 1
9  1 1 0
10 1 1 0
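One caveat with the selection step: if only one column matches,
df[, selection] returns a plain vector instead of a data frame. A
possible variant, sketched here with the data and function repeated so
that it runs on its own, uses vapply() and drop = FALSE. The second
pattern is only an example I chose because it matches a single column.

```r
# Same data as in the question, typed in again for a standalone example.
df <- data.frame(
    a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
    b = rep(1, 10),
    c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
    d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
    e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

testoccur1 <- function(exmpl, patrn)
{
    m <- length(patrn)
    n <- length(exmpl)
    candidate <- seq_len(max(n - m + 1, 0))
    for (i in seq_len(m)) {
        candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    length(candidate) != 0
}

# vapply() checks that every column yields exactly one logical value
selection <- vapply(df, testoccur1, logical(1), patrn = c(0, 0, 1, 1, 1, 1))
names(df)[selection]            # "a" "c" "e"

# example pattern that occurs only in column "a"
one <- vapply(df, testoccur1, logical(1), patrn = c(1, 1, 1, 0, 0, 1))
# drop = FALSE keeps the result a data frame even with a single column
res <- df[, one, drop = FALSE]
```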
What you printed in your post is a matrix, not a data frame. If your
structure is a matrix, try the following.
# your matrix
mat <- as.matrix(df)
mat
      a b c d e
 [1,] 1 1 1 0 0
 [2,] 1 1 0 0 0
 [3,] 1 1 0 0 0
 [4,] 0 1 1 1 1
 [5,] 0 1 1 1 1
 [6,] 1 1 1 1 1
 [7,] 1 1 1 0 1
 [8,] 1 1 1 0 1
 [9,] 1 1 1 0 0
[10,] 1 1 1 0 0
# selection of columns
sel <- apply(mat, 2, testoccur1, patrn=c(0,0,1,1,1,1))
mat[, sel]
      a c e
 [1,] 1 1 0
 [2,] 1 0 0
 [3,] 1 0 0
 [4,] 0 1 1
 [5,] 0 1 1
 [6,] 1 1 1
 [7,] 1 1 1
 [8,] 1 1 1
 [9,] 1 1 0
[10,] 1 1 0
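If you prefer to avoid the explicit loop, the same test can probably be
written with embed(), which lists all windows of a given length. This
is a different technique from testoccur1, and the name has_pattern is
only an illustrative choice of mine.

```r
# Hypothetical helper: TRUE if patrn occurs as a contiguous run in exmpl.
has_pattern <- function(exmpl, patrn)
{
    m <- length(patrn)
    # embed() needs 1 <= m <= length(exmpl); treating an empty pattern
    # as "no match" is an arbitrary choice here
    if (m == 0 || m > length(exmpl)) return(FALSE)
    # one window per row, with the elements of each window reversed
    w <- embed(exmpl, m)
    # a window matches if all m positions agree with the reversed pattern
    any(colSums(t(w) == rev(patrn)) == m)
}

# the matrix from the question, typed in again for a standalone example
mat <- cbind(
    a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
    b = rep(1, 10),
    c = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
    d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
    e = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

sel <- apply(mat, 2, has_pattern, patrn = c(0, 0, 1, 1, 1, 1))
colnames(mat)[sel]                     # "a" "c" "e"
```

This builds a matrix with one row per window, so for very long vectors
the loop version above may use less memory.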
Hope this helps.
Petr Savicky.