[R] Practical Data Limitations with R

Sankalp Upadhyay sankalp.upadhyay at gmail.com
Tue Apr 8 18:52:32 CEST 2008


Millions of rows can be a problem if everything is loaded into memory, 
depending on the type of data. Numeric columns should be fine, but if you 
have string columns and want to process based on them (string comparisons 
etc.), it will be slow. For example:
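A rough sketch of the per-column memory cost for a million rows 
(illustrative only; the column contents here are made up):

    n <- 1e6
    num <- rnorm(n)                                   # doubles: about 8 bytes each
    chr <- sample(c("EMEA", "APAC", "AMER"), n, replace = TRUE)
    object.size(num)    # roughly 8 million bytes
    object.size(chr)    # depends on the platform and the number of distinct strings
    fac <- factor(chr)  # stores one small integer code per row plus the distinct levels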
You may want to combine the sources outside R (stored procedures, perhaps) 
and then load the result into R. Joining data within R code can be costly 
if you are selecting from a data frame based on a string column, so it 
often pays to let the database do the join.
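Something along these lines (a sketch only; it assumes the RODBC package, 
and the DSN, table and column names below are placeholders):

    library(RODBC)
    ch <- odbcConnect("my_dsn")    # "my_dsn" is a made-up ODBC data source name
    ## Let the database perform the join and pull only the result into R:
    dat <- sqlQuery(ch, "SELECT a.id, a.amount, b.region
                           FROM sales a JOIN customers b
                             ON a.cust_id = b.cust_id")
    close(ch)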
Personally, I have run into 'out of memory' problems only beyond about 
1 GB of data, on a 32-bit Windows system with 3 GB of RAM; that happens 
with C++ as well.
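On Windows you can check how much memory R is using and what the current 
limit is (a sketch; the 3000 Mb figure is just an example):

    memory.size()               # Mb currently in use by R (Windows only)
    memory.limit()              # current limit in Mb (Windows only)
    memory.limit(size = 3000)   # request a higher limit, up to what the OS allows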
Regarding speed, I find MATLAB faster than R for matrix operations; in 
other areas they are in the same range. R is much nicer to program in, as 
it has a much more complete programming language.
R can use multiple cores/CPUs with a suitable multi-threaded linear 
algebra (BLAS) library, though this only helps for linear algebra operations.
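For instance (a sketch; the matrix size is chosen arbitrarily), timing a 
large matrix product shows where a multi-threaded BLAS would use the 
extra cores:

    m <- matrix(rnorm(2000 * 2000), 2000, 2000)
    system.time(m %*% m)        # a multi-threaded BLAS spreads this across cores
    system.time(crossprod(m))   # t(m) %*% m, usually a bit faster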
A 64-bit binary of R is not available for Windows.

Sankalp


Jeff Royce wrote:
> We are new to R and evaluating if we can use it for a project we need to
> do.  We have read that R is not well suited to handle very large data
> sets.  Assuming we have the data prepped and stored in an RDBMS (Oracle,
> Teradata, SQL Server), what can R reasonably handle from a volume
> perspective?   Are there some guidelines on memory/machine sizing based
> on data volume?  We need to be able to handle Millions of Rows from
> several sources.  Any advice is much appreciated.  Thanks.  


