[R] FW: Large datasets in R
Roger D. Peng
rdpeng at gmail.com
Tue Jul 18 15:40:27 CEST 2006
In my experience, the OS's use of virtual memory is only relevant in the rough
sense that the OS can store *other* running applications in virtual memory so
that R can use as much of the physical memory as possible. Once R itself
overflows into virtual memory it quickly becomes unusable.
I'm not sure I understand your second question. As R is available in source
code form, it can be compiled for many 64-bit operating systems.
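For example, you can get a rough sense of how close R is to that point by
checking object sizes and the garbage collector's report (a minimal sketch;
'x' here is just a stand-in for one of your large objects):

  x <- rnorm(1e6)        # ~8 Mb of doubles
  print(object.size(x))  # bytes used by this one object
  gc()                   # totals currently used and held by R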
-roger
Marshall Feldman wrote:
> Hi,
>
> I have two further comments/questions about large datasets in R.
>
> 1. Does R's ability to handle large datasets depend on the operating
> system's use of virtual memory? In theory, at least, VM should mean that
> the split between installed RAM and swap space on a hard drive primarily
> determines how fast R calculates, not whether it can do the calculations
> at all. However, if R has some low-level routines that must be memory
> resident and that use more memory as the amount of data grows, this may
> not hold. Can someone shed light on this?
>
> 2. What 64-bit versions of R are available at present?
>
> Marsh Feldman
> The University of Rhode Island
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Monday, July 17, 2006 3:21 PM
> To: Deepankar Basu
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Large datasets in R
>
> On Mon, 17 Jul 2006, Deepankar Basu wrote:
>
>> Hi!
>>
>> I am a student of economics and currently do most of my statistical work
>> using STATA. For various reasons (not least of which is an aversion to
>> proprietary software), I am thinking of shifting to R. At the current
>> juncture my concern is the following: would I be able to work on
>> relatively large data-sets using R? For instance, I am currently working
>> on a data-set which is about 350MB in size. Would it be possible to work
>> with data-sets of such sizes using R?
>
>
> The answer depends on a lot of things, but most importantly
> 1) What you are going to do with the data
> 2) Whether you have a 32-bit or 64-bit version of R
> 3) How much memory your computer has.
>
> In a 32-bit version of R (where R will not be allowed to address more than
> 2-3Gb of memory) an object of size 350Mb is large enough to cause problems
> (see e.g. the R Installation and Administration Guide).
>
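> As a back-of-the-envelope illustration (the numbers are illustrative, not
> exact): 350Mb of doubles is about 350*2^20/8, or roughly 46 million values,
> and R routinely makes transient copies of objects during manipulation:
>
>   n <- 350 * 2^20 / 8     # doubles that fit in 350 Mb
>   n * 8 / 2^20            # one copy: 350 Mb
>   3 * n * 8 / 2^30        # three transient copies: ~1 Gb, near the ceiling
>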
> If your 350Mb data set has lots of variables and you only use a few at a
> time then you may not have any trouble even on a 32-bit system once you
> have read in the data.
>
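> For example, read.table() can skip columns entirely if you mark them as
> "NULL" in colClasses (a sketch only; the file name and five-column layout
> are made up):
>
>   ## keep columns 1 and 3, drop the other three
>   cc <- c("numeric", "NULL", "numeric", "NULL", "NULL")
>   d <- read.table("mydata.txt", header = TRUE, colClasses = cc)
>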
> If you have a 64-bit version of R and a few Gb of memory then there should
> be no real difficulty in working with that size of data set for most
> analyses. You might come across some analyses (e.g. some cluster analysis
> functions) that use n^2 memory for n observations and so break down.
>
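> For instance, hclust() works from the full set of pairwise distances
> computed by dist(), which stores n*(n-1)/2 doubles; a quick estimate of
> the cost (illustrative numbers only):
>
>   n <- 1e5                      # 100,000 observations
>   n * (n - 1) / 2 * 8 / 2^30    # the distance matrix alone: ~37 Gb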
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> tlumley at u.washington.edu University of Washington, Seattle
>
--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/