[R] Practical Data Limitations with R

Philipp Pagel p.pagel at wzw.tum.de
Tue Apr 8 17:20:11 CEST 2008


On Tue, Apr 08, 2008 at 09:26:22AM -0500, Jeff Royce wrote:
> We are new to R and evaluating if we can use it for a project we need to
> do.  We have read that R is not well suited to handle very large data
> sets.  Assuming we have the data prepped and stored in an RDBMS (Oracle,
> Teradata, SQL Server), what can R reasonably handle from a volume
> perspective?   Are there some guidelines on memory/machine sizing based
> on data volume?  We need to be able to handle Millions of Rows from
> several sources.

As so often, the answer is "it depends". R does not have an inherent
maximum number of rows it can handle - the limit is how much of the
dataset you can fit into the available RAM. So the answer is often
simply "yes - just buy more RAM".
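For a rough back-of-envelope estimate (assuming plain numeric columns,
which take 8 bytes per value - plug in your own row and column counts,
of course):

    rows <- 10e6             # 10 million rows
    cols <- 20               # 20 numeric columns
    rows * cols * 8 / 2^30   # ~1.5 GB for the raw data alone

Keep in mind that R frequently copies objects during manipulation, so
you want a comfortable multiple of that figure as physical RAM.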

A couple of million rows are no problem at all if you don't have too
many columns (done that). If you really have a very large set of data
which you cannot fit into memory, you may still be able to use R: do
you really need ALL the data in memory at the same time? Often, very
large datasets actually contain many different subsets which you want
to analyze separately anyway. In that case, storing the full data in
an RDBMS and selecting the required subsets as needed is usually the
best approach.
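
A minimal sketch of that approach via the DBI interface (the driver
package - ROracle here, RODBC or others depending on your RDBMS - and
all connection details and table/column names are just placeholders):

    library(DBI)
    library(ROracle)                    # or whichever driver fits your database
    con <- dbConnect(Oracle(), username = "user", password = "pass",
                     dbname = "mydb")
    # pull only the subset needed for the current analysis
    dat <- dbGetQuery(con,
                      "SELECT id, x, y FROM measurements WHERE batch = 42")
    dbDisconnect(con)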

In your situation, I would simply load the full dataset into R and see
what happens.
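
If it loads, you can check how much memory the data actually occupies,
for example (file name is obviously just an example):

    df <- read.table("mydata.txt", header = TRUE)   # or dbGetQuery(...) as above
    print(object.size(df), units = "Mb")            # RAM used by the data frame
    gc()                                            # overall memory use of the session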

cu
	Philipp

-- 
Dr. Philipp Pagel                              Tel.  +49-8161-71 2131
Lehrstuhl für Genomorientierte Bioinformatik   Fax.  +49-8161-71 2186
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
 
 and
 
Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München -
Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel


