[R] gregexpr slow and increases exponentially with string length --> how to speed it up?
Emmanuel Levy
emmanuel.levy at gmail.com
Fri Oct 31 03:16:48 CET 2008
Hi Chuck,
Thanks a lot for your suggestion.
> You can find all such matches (not just the disjoint ones that gregexpr
> finds) using something like this:
>
> twomatch <-function(x,y) intersect(x+1,y)
> match.list <-
> list(
> which( vec %in% c(3,6,7) ),
> which( vec == 2 ),
> which( vec %in% 1:9 ),
> which( vec %in% c(1,2,9) ) )
> res <- Reduce( twomatch, match.list ) - length(match.list) + 1
>
I should have made explicit that I have many of these "motifs" to
match, and their structures vary quite a bit. This means that I'd need
a function to translate each motif into the solution you proposed,
which would be feasible but a bit painful.
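For what it's worth, such a translator might stay fairly short as long as
the motifs are as simple as this one. Below is a minimal sketch, assuming
each motif contains only literal digits and bracketed classes of digits or
digit ranges, and that vec is an integer vector of digits as in your code.
parse_motif and find_motif are hypothetical names I made up, not existing
functions.

```r
# Hypothetical helper: split a simple motif such as "[367]2[1-9][129]"
# into one set of allowed digits per position.
# Assumes only literal digits and bracketed classes of digits/ranges.
parse_motif <- function(motif) {
  tokens <- regmatches(motif, gregexpr("\\[[0-9-]+\\]|[0-9]", motif))[[1]]
  lapply(tokens, function(tok) {
    body <- gsub("\\[|\\]", "", tok)        # strip the brackets, if any
    chars <- strsplit(body, "")[[1]]
    out <- integer(0)
    i <- 1
    while (i <= length(chars)) {
      if (i + 2 <= length(chars) && chars[i + 1] == "-") {
        # a range such as 1-9
        out <- c(out, seq(as.integer(chars[i]), as.integer(chars[i + 2])))
        i <- i + 3
      } else {
        out <- c(out, as.integer(chars[i]))
        i <- i + 1
      }
    }
    out
  })
}

# Hypothetical wrapper: apply the intersect/Reduce idea to any parsed motif.
find_motif <- function(vec, motif) {
  sets <- parse_motif(motif)
  match.list <- lapply(sets, function(s) which(vec %in% s))
  twomatch <- function(x, y) intersect(x + 1, y)
  Reduce(twomatch, match.list) - length(match.list) + 1
}

vec <- as.integer(strsplit("325162297213", "")[[1]])
find_motif(vec, "[367]2[1-9][129]")   # matches start at positions 1 and 5
```

Like your version, this returns the start of every match, including
overlapping ones, so it would still need filtering to reproduce gregexpr.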
In the meantime, the best solution I found is to cut the big string
into smaller strings. That actually speeds things up a lot.
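Roughly, the chunking looks like the sketch below. It assumes a non-empty
string and a known maximum match length (4 for this motif);
chunked_gregexpr, chunk_size and max_len are names and defaults I picked
for illustration. Consecutive chunks overlap by max_len - 1 characters so
that a match straddling a boundary is not lost. For motifs that can
overlap themselves, the matches reported near chunk boundaries may differ
from what a single gregexpr call on the whole string would return.

```r
# Hypothetical helper: run gregexpr() on fixed-size chunks of one long string
# and map the hits back to positions in the full string.
chunked_gregexpr <- function(pattern, x, chunk_size = 10000, max_len = 4) {
  n <- nchar(x)
  starts <- seq(1, n, by = chunk_size)
  # extend each chunk by max_len - 1 characters into the next one
  ends <- pmin(starts + chunk_size + max_len - 2, n)
  chunks <- substring(x, starts, ends)
  hits <- gregexpr(pattern, chunks)
  pos <- unlist(Map(function(h, s) {
    if (h[1] == -1) integer(0) else as.integer(h) + s - 1
  }, hits, starts))
  sort(unique(pos))   # drop hits found twice in an overlap region
}

aa <- paste(sample(1:9, 100000, replace = TRUE), collapse = "")
system.time(res <- chunked_gregexpr("[367]2[1-9][129]", aa))
```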
Best,
E
> If you want to precisely match the gregexpr results, you'll need to filter
> out the overlapping matches.
>
> HTH,
>
> Chuck
>
>>
>> Best,
>>
>> Emmanuel
>>
>>
>>> for (i in c(10000, 50000, 100000, 500000)){
>> + aa = as.character(sample(1:9, i, replace=T))
>> + aa = paste(aa, collapse='')
>> + print(i)
>> + print(system.time(gregexpr("[367]2[1-9][129]",aa)))
>> + }
>> [1] 10000
>> user system elapsed
>> 0.004 0.000 0.003
>> [1] 50000
>> user system elapsed
>> 0.060 0.000 0.061
>> [1] 1e+05
>> user system elapsed
>> 0.240 0.000 0.238
>> [1] 5e+05
>> user system elapsed
>> 5.733 0.000 5.732
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
>
>
>