[R] FW: Large datasets in R

Roger D. Peng rdpeng at gmail.com
Tue Jul 18 15:40:27 CEST 2006

In my experience, the OS's use of virtual memory is only relevant in the rough 
sense that the OS can store *other* running applications in virtual memory so 
that R can use as much of the physical memory as possible.  Once R itself 
overflows into virtual memory it quickly becomes unusable.
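A quick way to keep an eye on this is to check how much memory R itself is using: object.size() reports per-object usage and gc() reports session totals. A minimal sketch (the matrix here is just an illustrative object):

```r
## Gauge R's memory footprint before it starts paging to disk.
## object.size() gives bytes for one object; gc() gives session totals.
x <- matrix(rnorm(1e6), ncol = 10)   # 1e6 doubles, about 8 MB
print(object.size(x), units = "Mb")  # roughly 7.6 Mb
gc()                                 # the "used" columns show session totals
```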

I'm not sure I understand your second question.  As R is available in source 
code form, it can be compiled for many 64-bit operating systems.


Marshall Feldman wrote:
> Hi,
> I have two further comments/questions about large datasets in R.
> 1. Does R's ability to handle large datasets depend on the operating
> system's use of virtual memory? In theory, at least, VM should make the
> difference between installed RAM and virtual memory on a hard drive
> primarily a determinant of how fast R will calculate rather than whether or
> not it can do the calculations. However, if R has some low-level routines
> that have to be memory resident and use more memory as the amount of data
> increases, this may not hold. Can someone shed light on this?
> 2. What 64-bit versions of R are available at present?
> 	Marsh Feldman
> 	The University of Rhode Island
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu] 
> Sent: Monday, July 17, 2006 3:21 PM
> To: Deepankar Basu
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Large datasets in R
> On Mon, 17 Jul 2006, Deepankar Basu wrote:
>> Hi!
>> I am a student of economics and currently do most of my statistical work
>> using STATA. For various reasons (not least of which is an aversion for
>> proprietary software), I am thinking of shifting to R. At the current
>> juncture my concern is the following: would I be able to work on
>> relatively large data-sets using R? For instance, I am currently working
>> on a data-set which is about 350MB in size. Would it be possible to work
>> with data-sets of such sizes using R?
> The answer depends on a lot of things, but most importantly
> 1) What you are going to do with the data
> 2) Whether you have a 32-bit or 64-bit version of R
> 3) How much memory your computer has.
> In a 32-bit version of R (where R will not be allowed to address more than 
> 2-3Gb of memory) an object of size 350Mb is large enough to cause problems 
> (see e.g. the R Installation and Administration Guide).
> If your 350Mb data set has lots of variables and you only use a few at a 
> time then you may not have any trouble even on a 32-bit system once you 
> have read in the data.
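A common trick for reading in only the variables you need is to mark the unwanted columns "NULL" in read.table()'s colClasses argument, so they are never stored in memory at all. A minimal sketch; the file layout below is invented for illustration:

```r
## Keep only the 'id' and 'y' columns of a whitespace-delimited file by
## marking the others "NULL" in colClasses; skipped columns are parsed
## but never stored. The file contents here are made up for illustration.
tf <- tempfile()
writeLines(c("id x y z",
             "1 0.5 a 10",
             "2 1.5 b 20"), tf)
dat <- read.table(tf, header = TRUE,
                  colClasses = c("integer", "NULL", "character", "NULL"))
dat   # a two-column data frame: id and y
```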
> If you have a 64-bit version of R and a few Gb of memory then there should 
> be no real difficulty in working with that size of data set for most 
> analyses.  You might come across some analyses (eg some cluster analysis 
> functions) that use n^2 memory for n observations and so break down.
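To see why that n^2 growth bites so quickly: dist(), for example, stores the lower triangle of the distance matrix, i.e. n*(n-1)/2 doubles at 8 bytes each. A back-of-the-envelope sketch:

```r
## Memory needed just to hold a distance object for n observations:
## n*(n-1)/2 pairwise distances, 8 bytes per double.
n <- 1e5                        # 100,000 observations
bytes <- n * (n - 1) / 2 * 8
bytes / 2^30                    # ~37 Gb for the distance object alone
```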
>  	-thomas
> Thomas Lumley			Assoc. Professor, Biostatistics
> tlumley at u.washington.edu	University of Washington, Seattle
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
