[R] what is the faster way to search for a pattern in a few million entries data frame ?

Jim Lemon drjimlemon at gmail.com
Mon Apr 11 00:51:51 CEST 2016


Hi Fabien,
I was going to send this last night, but I thought it was too simple.
It runs in about one millisecond on this 1000-row example.

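# Build a 1000-row test data frame: random frequencies plus random
# 10-letter strings (no seed is set, so the matches below will vary).
df<-data.frame(freq=runif(1000),
 strings=apply(matrix(sample(LETTERS,10000,TRUE),ncol=10),
 1,paste,collapse=""))
# Indices of the rows whose string contains "DF" anywhere
match.ind<-grep("DF",df$strings)
match.ind
 [1]   2  11  91 133 169 444 547 605 734 943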
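
Since the question is about strings that start with a given sequence of
words, the same idea can probably be anchored to the start of the string.
What follows is only a minimal sketch: the data frame `ngrams`, its
contents and the variable names are made up for illustration, and
startsWith() needs R 3.3.0 or later.

```r
# Hypothetical toy data: a few n-grams with made-up frequencies
ngrams <- data.frame(
  freq    = c(10, 4, 7, 2, 5),
  strings = c("i am", "i am here", "you are", "i am not sure", "am i"),
  stringsAsFactors = FALSE
)
prefix <- "i am"

# Option 1: regular expression anchored at the start of the string with "^"
hits.regex <- grep(paste0("^", prefix), ngrams$strings)

# Option 2: startsWith() compares literal prefixes, no regex involved
hits.prefix <- which(startsWith(ngrams$strings, prefix))

hits.regex    # rows 1, 2 and 4
hits.prefix   # same rows

# Total frequency of the n-grams that start with the prefix
total.freq <- sum(ngrams$freq[hits.prefix])
total.freq    # 10 + 4 + 2 = 16
```

If the prefix can contain regex metacharacters, Option 1 would need them
escaped, which is one reason to prefer the literal comparison. Both
options match on characters rather than whole words, so "i am" would also
match "i amble"; appending a space to the prefix is one way around that.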

Jim


On Mon, Apr 11, 2016 at 5:27 AM, Fabien Tarrade
<fabien.tarrade at gmail.com> wrote:
> Hi Duncan,
>>
>> Didn't you post the same question yesterday?  Perhaps nobody answered
>> because your question is unanswerable.
>
> sorry, I got an email saying that my message was waiting for approval, and
> when I looked at the forum I didn't see my message. This is why I sent it
> again, and this time I checked that the format of my message was text only.
> Sorry for the noise.
>>
>> You need to describe what the strings are like and what the patterns are
>> like if you want advice on speeding things up.
>
> my strings are 1-grams up to 5-grams (sequences of 1 word up to 5 words)
> and I am searching for the frequency in my DF of the strings starting with
> a sequence of a few words.
>
> I guess these days it is standard to use DFs with millions of entries, so I
> was wondering how people do that in the fastest way.
>
> Thanks
> Cheers
> Fabien


