[R] Maximum number of patterns and speed in grep

Sarah Goslee sarah.goslee at gmail.com
Fri Jul 6 17:39:25 CEST 2012


Hi,

Given that you can't provide a full example, please at least provide
str() on your data, more complete information on the problem, and
ideally a small toy example that demonstrates precisely what you are
doing.
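Something as small as the sketch below would do. The `patterns` and `x` vectors are made-up stand-ins for your real data. `str()` summarizes an object's structure without printing all of it, and `dput()` prints an object as R code that others can paste straight into their session. The loop reproduces the operation from your message on the toy data.

```r
# Hypothetical toy data standing in for the real 7,700 patterns and strings
patterns <- c("apple", "banana", "cherry")
x <- c("apple pie", "grape juice", "banana split", "plain water")

# Describe the structure of the data without posting all of it
str(patterns)
str(x)

# Print the toy objects as R code others can paste and run
dput(patterns)
dput(x)

# The operation in question, applied to the toy data:
# 1 if x[j] matches any pattern, 0 otherwise
counts <- integer(length(x))
for (j in seq_along(x)) {
  counts[j] <- length(grep(paste(patterns, collapse = "|"), x[j], value = TRUE))
}
counts
```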

For instance, you tell us that you "get an error message" but you
never tell us what it is. Don't you think we might need to know what
the error is to be able to diagnose and fix it?

Also, note that your "working" example simply overwrites
array$chunk1[j] four times.
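If the goal is to record, for each string, whether it matches any of the patterns, one possible sketch is shown below. It assumes `patterns` and `x` are plain character vectors, and `chunk_size` is a hypothetical parameter (2,500 in your example, 2 here for the toy data). Each chunk is tested against all of `x` at once with `grepl()`, which is vectorized over its `x` argument, so the loop over strings goes away. The per-chunk results are then combined with `|` instead of overwriting one with the next.

```r
# Hypothetical toy inputs; substitute the real patterns and strings
patterns <- c("apple", "banana", "cherry", "grape", "kiwi")
x <- c("apple pie", "plain water", "kiwi tart", "cherry cola")

chunk_size <- 2  # 2500 in the original post; small here for the toy data

# Split the pattern indices into consecutive chunks of at most chunk_size
chunks <- split(seq_along(patterns), ceiling(seq_along(patterns) / chunk_size))

# One logical vector per chunk: TRUE where an element of x matches any
# pattern in that chunk. grepl() handles the whole vector x in one call.
per_chunk <- lapply(chunks, function(idx) {
  grepl(paste(patterns[idx], collapse = "|"), x)
})

# A string matches overall if it matches in at least one chunk
matched <- Reduce(`|`, per_chunk)

# 0/1 per string, like length(grep(..., x[j])) for a single x[j]
counts <- as.integer(matched)
counts
```

If the per-chunk results are wanted separately, `do.call(cbind, per_chunk)` gives a logical matrix with one column per chunk. If the patterns are literal strings rather than regular expressions, another option is to test each pattern on its own with `grepl(p, x, fixed = TRUE)` and combine the results the same way (alternation with `|` is not available under `fixed = TRUE`). Whether either approach is faster on data of this size would need to be timed.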

Sarah

On Fri, Jul 6, 2012 at 10:45 AM, mdvaan <mathijsdevaan at gmail.com> wrote:
> Hi,
>
> I am using R's grep function to find patterns in vectors of strings. The
> number of patterns I would like to match is 7,700 (of different sizes). I
> noticed that I get an error message when I do the following:
>
> data <- array()
> for (j in 1:length(x))
> {
> array[j] <- length(grep(paste(patterns[1:7700], collapse = "|"),  x[j],
> value = T))
> }
>
> When I break this up into 4 chunks of patterns it works:
>
> data <- array()
> for (j in 1:length(x))
> {
> array$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"),
> x[j], value = T))
> array$chunk1[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"),
> x[j], value = T))
> array$chunk1[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"),
> x[j], value = T))
> array$chunk1[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"),
> x[j], value = T))
> }
>
> My questions: what's the maximum size of the patterns argument in grep? Is
> there a way to do this faster? It is very slow.
>
> Thanks.
>
> Math
>
> Sorry for not providing a reproducible example. It's a size issue which
> makes it difficult to provide an example.
>


-- 
Sarah Goslee
http://www.functionaldiversity.org


