[R] speed issue: gsub on large data frame
Simon Pickert
simon.pickert at t-online.de
Tue Nov 5 09:13:12 CET 2013
How’s that not reproducible?
1. Data frame, one column with text strings
2. Size of data frame = 4 million observations
3. A bunch of gsubs in a row: gsub(patternvector, "[token]", dataframe$text_column) (see the sketch below)
4. General question: How to speed up string operations on 'large' data sets?
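
For concreteness, here is a minimal sketch of what step 3 expands to, with made-up data and a made-up pattern vector (the real frame has ~4 million rows and pattern vectors of up to 500 elements). Note that gsub() only uses the first element of a vector pattern (with a warning), so the calls have to run one pattern at a time:

  # Made-up stand-in for the real 4-million-row data frame
  dataframe <- data.frame(
    text_column = c("visit http://example.com today",
                    "nothing to replace here",
                    "see http://www.r-project.org"),
    stringsAsFactors = FALSE
  )

  # Hypothetical pattern vector; the real ones hold up to 500 elements
  patternvector <- c("http://[^ ]+", "www\\.[^ ]+")

  # "A bunch of gsubs in a row": one gsub() per pattern for each token
  for (p in patternvector) {
    dataframe$text_column <- gsub(p, "[URL]", dataframe$text_column)
  }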
Please let me know what additional information you need in order to reproduce this example.
It's more of a general question, though I think the description above gives you a specific picture of what I'm doing right now.
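
For what it's worth, one commonly suggested speed-up for this kind of job (a sketch under the assumption that the patterns are independent regular expressions; not benchmarked on data of this size) is to collapse each pattern vector into a single alternation, so each token costs one pass over the 4 million strings instead of up to 500:

  # One pass per token instead of one pass per pattern
  big_pattern <- paste(patternvector, collapse = "|")
  dataframe$text_column <- gsub(big_pattern, "[URL]",
                                dataframe$text_column, perl = TRUE)

  # If the patterns are literal strings rather than regexes, fixed = TRUE
  # skips the regex engine and is usually much faster per call
  # ("literal text" below is a hypothetical placeholder pattern)
  dataframe$text_column <- gsub("literal text", "[TOKEN]",
                                dataframe$text_column, fixed = TRUE)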
On 05.11.2013, at 06:59, Jeff Newmiller <jdnewmil at dcn.davis.CA.us> wrote:
> Example not reproducible. Communication fail. Please refer to Posting Guide.
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN: <jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Simon Pickert <simon.pickert at t-online.de> wrote:
>> Hi R’lers,
>>
>> I'm running into speed issues, performing a bunch of
>>
>> gsub(patternvector, "[token]", dataframe$text_column)
>>
>> calls on a data frame containing >4 million entries.
>>
>> (The "patternvectors" contain up to 500 elements.)
>>
>> Is there any better/faster way than running some 20 gsub commands in
>> a row?
>>
>>
>> Thanks!
>> Simon
>>