[R] Tip for performance improvement while handling huge data?
    Philipp Pagel 
    p.pagel at wzw.tum.de
       
    Sun Feb  8 20:28:53 CET 2009
    
    
  
> For certain calculations, I have to handle a data frame with, say, 10
> million rows and multiple columns of different data types.
> When I try to perform calculations on certain elements in each row, the
> program just goes into "busy" mode for a really long time.
> To avoid this "busy" mode, I split the data frame into subsets of 10,000
> rows. Then the calculation finished very quickly, within a reasonable time.
> 
> Are there any other tips to improve performance?
Depending on what exactly you are doing and what causes the slowdown,
there may be a number of useful strategies:
 - Buy RAM (lots of it) - it's cheap
 - Vectorize whatever you are doing (see the first sketch below)
 - Don't use all the data you have but draw a random sample of reasonable
   size (see the second sketch below)
 - ...
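
To illustrate the vectorization point, here is a rough sketch: replacing
a row-by-row loop with a single whole-column operation usually makes the
biggest difference. The data frame 'df' and its columns 'x' and 'y' are
made up for illustration.

  n  <- 1e6
  df <- data.frame(x = runif(n), y = runif(n))

  ## Slow: compute element by element in an explicit loop
  res <- numeric(n)
  for (i in seq_len(n)) res[i] <- df$x[i] + df$y[i]

  ## Fast: the same computation as one vectorized expression
  res <- df$x + df$y

And a rough sketch of the sampling point, again assuming the made-up
'df' from above: draw, say, 100,000 random rows and work on those
instead of the full data.

  set.seed(42)                    # make the sample reproducible
  idx <- sample(nrow(df), 1e5)    # 100,000 random row indices
  sub <- df[idx, ]                # a much smaller data frame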
To be more helpful, we'd have to know:
 - what are the computations involved?
 - how are they implemented at the moment?
  -> example code
 - what is the range of "really long time"?
cu
	Philipp
-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
    
    