[R] amount of data R can handle in a single file

Claudia Beleites cbeleites at units.it
Thu Feb 17 10:59:48 CET 2011


On 02/17/2011 10:16 AM, Nasila, Mark wrote:
> Dear Sir/Madam,
>
> I would like to know the maximum number of observations a single
> file can have when using R. I am asking because I am trying to do
> research on banking transactions and I have around 49 million
> records. Can R handle this? Please advise.

Dear Mark,

I think R can address vectors up to a length of 2^31 - 1 ≈ 2.1e9 elements.
2^31 elements (numeric) = 16 GB per vector (matrix, array).
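
For 49 million records, a rough estimate of the size in memory may be more useful than the addressing limit. A minimal sketch of the arithmetic, assuming every column is stored as numeric (8 bytes per element); the column count of 10 is a made-up placeholder, not something from the question:

```r
# Rough memory estimate for a purely numeric data set.
bytes_per_numeric <- 8   # a double takes 8 bytes

n_records <- 49e6        # number of records given in the question
n_columns <- 10          # hypothetical: replace with the real number of variables

# Total size in GiB (2^30 bytes)
size_gib <- n_records * n_columns * bytes_per_numeric / 2^30
size_gib

# Cross-check the 8 bytes per element on a small vector
# (object.size() also counts a small fixed header).
bytes_per_element <- as.numeric(object.size(numeric(1e6))) / 1e6
bytes_per_element
```

Integer columns need 4 bytes per element instead, and character or factor columns behave differently, so treat the result only as an order of magnitude.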

For me, the available RAM is the more important limit:
I work without problems with (numeric) matrices of size 2e5 x 250 = 5e7 elements
(380 MB) that were produced from 5e4 x 2500 = 1.25e8 elements (≈ 1 GB) of raw data.
The raw data is the practical limit on my 8 GB (64-bit Linux) machine:
during processing it becomes complex-valued, thus ≈ 2 GB, and with that I had to be
very careful not to copy the matrix too often. This and a bunch of gc() calls
let me process the data without swapping. :-)
Note that 2 GB corresponds quite nicely to the rule of thumb that the end of fun
is reached with variable sizes of 1/3 of the RAM.
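
The sizes above follow from 8 bytes per numeric element and 16 bytes per complex element. A small sketch that reproduces them; the helper name mib() is made up for this illustration:

```r
# Size in MiB (2^20 bytes) of n elements of a given width in bytes.
mib <- function(n_elements, bytes_per_element) {
  n_elements * bytes_per_element / 2^20
}

n_processed <- 2e5 * 250    # 5e7 elements
n_raw       <- 5e4 * 2500   # 1.25e8 elements

processed_mib   <- mib(n_processed, 8)   # numeric: about 381 MiB
raw_mib         <- mib(n_raw, 8)         # numeric: about 954 MiB
raw_complex_mib <- mib(n_raw, 16)        # complex: about 1907 MiB

# The factor of two can be checked directly on small vectors.
ratio <- as.numeric(object.size(complex(1e5))) /
         as.numeric(object.size(numeric(1e5)))
ratio
```

After removing a large intermediate object with rm(), calling gc() prints the memory currently in use, which is one way to watch for unwanted copies.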

If you are concerned about your data set, I'd recommend reading a fraction of
the data set and having a look at its object.size() and also at how much RAM is
used during data analysis of that partial data set. Then extrapolate to the
complete data set.
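
A minimal sketch of that approach. The file here is synthetic, and its columns (account, amount) are invented stand-ins for the real transaction data; with the real file only the read.csv() call and the column types would change:

```r
# Create a small synthetic CSV as a stand-in for the real file (hypothetical columns).
set.seed(1)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(account = sample.int(1e5, 2e4, replace = TRUE),
                     amount  = round(rnorm(2e4, mean = 100, sd = 30), 2)),
          tmp, row.names = FALSE)

n_sample <- 1e4    # rows to read for the trial
n_total  <- 49e6   # rows in the complete data set (from the question)

# Read only the first n_sample rows.
part <- read.csv(tmp, nrows = n_sample)

# Size of the partial object, then linear extrapolation to all rows.
bytes_sample  <- as.numeric(object.size(part))
bytes_per_row <- bytes_sample / nrow(part)
est_gib       <- bytes_per_row * n_total / 2^30

format(object.size(part), units = "Mb")
est_gib

unlink(tmp)   # remove the temporary file
```

The extrapolation only covers the stored object. Intermediate copies made during the analysis come on top of it, so it is worth watching the gc() output while running the analysis on the partial data as well.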

HTH Claudia





-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it


