[R] Tip for performance improvement while handling huge data?

Suresh_FSFM suresh.ghalsasi at gmail.com
Sun Feb 8 21:09:47 CET 2009


OK, thank you.
For now, the vectorization option looks feasible. I was not sure whether to
handle it that way; I will try it.

Regards,
Suresh 


Philipp Pagel-5 wrote:
> 
>> For certain calculations, I have to handle a data frame with, say, 10
>> million rows and multiple columns of different data types.
>> When I try to perform calculations on certain elements in each row, the
>> program just goes into "busy" mode for a really long time.
>> To avoid this "busy" mode, I split the data frame into subsets of 10000
>> rows. The calculations then finished quickly, within a reasonable time.
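>> In outline, it looks like this (compute() is just a stand-in for the
>> real per-chunk calculation):
>> 
>>   ## split the row indices into chunks of 10000 and process each chunk
>>   chunks  <- split(seq_len(nrow(df)), ceiling(seq_len(nrow(df)) / 10000))
>>   results <- lapply(chunks, function(i) compute(df[i, ]))
>>   out     <- do.call(rbind, results)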
>> 
>> Is there any other tip to improve performance?
> 
> Depending on what exactly you are doing and what is causing the slowdown,
> a number of strategies may be useful:
> 
>  - Buy RAM (lots of it) - it's cheap
>  - Vectorize whatever you are doing (see the sketch after this list)
>  - Don't use all the data you have, but draw a random sample of
>    reasonable size (also sketched below)
>  - ...
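> 
> As a rough illustration of vectorization (the column names and the
> computation itself are made up - adapt them to your data):
> 
>   ## slow: explicit loop over 10 million rows
>   res <- numeric(nrow(df))
>   for (i in seq_len(nrow(df))) {
>       res[i] <- df$a[i] * df$b[i] + log(df$c[i])
>   }
> 
>   ## fast: let R operate on whole columns at once
>   res <- df$a * df$b + log(df$c)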
> 
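> And for the sampling idea, something along these lines (the sample size
> of 100000 is arbitrary):
> 
>   ## work on a random subset instead of all 10 million rows
>   idx <- sample(nrow(df), 100000)
>   sub <- df[idx, ]
> 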
> To be more helpful, we would need to know:
> 
>  - what are the computations involved?
>  - how are they implemented at the moment?
>   -> example code
>  - what is the range of "really long time"?
> 
> cu
> 	Philipp
> 
> -- 
> Dr. Philipp Pagel
> Lehrstuhl für Genomorientierte Bioinformatik
> Technische Universität München
> Wissenschaftszentrum Weihenstephan
> 85350 Freising, Germany
> http://mips.gsf.de/staff/pagel
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

