[R] Maximum number of patterns and speed in grep

mdvaan mathijsdevaan at gmail.com
Fri Jul 6 16:45:48 CEST 2012


Hi,

I am using R's grep function to find patterns in a vector of strings. I
would like to match 7,700 patterns (of varying lengths). I get an error
message when I do the following:

# one count per element of x (0 or 1, since x[j] is a single string)
data <- integer(length(x))
for (j in seq_along(x))
{
data[j] <- length(grep(paste(patterns[1:7700], collapse = "|"),  x[j],
value = TRUE))
}

When I break the patterns up into 4 chunks, it works:

# a list with one vector of counts per chunk of patterns
data <- list()
for (j in seq_along(x))
{
data$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"),
x[j], value = TRUE))
data$chunk2[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"),
x[j], value = TRUE))
data$chunk3[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"),
x[j], value = TRUE))
data$chunk4[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"),
x[j], value = TRUE))
}
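
The chunking idea can be generalised so that the chunk boundaries are not hard-coded. The following is only a sketch under some assumptions: `match_any` is a hypothetical helper name, the default chunk size of 2500 simply mirrors the chunks used above and is not a documented limit, and the toy `patterns` and `x` stand in for the real data. Unlike the loop above, it combines the chunks into a single result per string instead of keeping one count per chunk, and it relies on grepl() being vectorised over x so that each chunk needs one call rather than one call per string.

```r
# Hypothetical helper: TRUE for each element of x that matches at least one pattern.
# chunk_size = 2500 mirrors the chunks used above; it is not a documented limit.
match_any <- function(patterns, x, chunk_size = 2500) {
  # Split the patterns into consecutive groups of at most chunk_size.
  groups <- split(patterns, ceiling(seq_along(patterns) / chunk_size))
  hit <- logical(length(x))
  for (g in groups) {
    # grepl() is vectorised over x: one call per chunk, not one per string.
    hit <- hit | grepl(paste(g, collapse = "|"), x)
  }
  hit
}

# Toy data standing in for the real patterns and x.
patterns <- c("apple", "ban+a", "cherry", "kiwi", "plum")
x <- c("green apple", "bannna split", "nothing here", "plum tart")

# 1 where a string matches any pattern, 0 otherwise.
data <- as.integer(match_any(patterns, x, chunk_size = 2))
```

Whether this is fast enough for 7,700 patterns would have to be checked on the real data, for example with system.time().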

My questions: what's the maximum size of the pattern argument in grep? Is
there a faster way to do this? The loop is very slow.

Thanks.

Math

Sorry for not providing a reproducible example. It's a size issue which
makes it difficult to provide an example.

 

--
View this message in context: http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613.html
Sent from the R help mailing list archive at Nabble.com.


