[R] Need advice on using R with large datasets

Roger D. Peng rpeng at jhsph.edu
Tue Apr 13 17:54:25 CEST 2004


I've been running R on 64-bit SuSE Linux on Opterons for a few 
months now and it certainly runs fine in what I would call 
standard situations.  In particular there seems to be no problem 
with workspaces > 4GB.  But I seldom handle single objects (like 
matrices, vectors) that are > 4GB.  The only exception is lists, 
but I think those are okay since they are composed of various 
sub-objects (like Peter mentioned).

-roger

Liaw, Andy wrote:
> I was under the impression that R has been run on 64-bit Solaris (and other
> 64-bit Unices) for quite a while (as a 64-bit app).  We've been running 64-bit
> R on amd64 for a few months (and had quite a few opportunities to get the R
> processes using over 8GB of RAM).  Not much problem as far as I can see...
> 
> Best,
> Andy
> 
> 
>>From: Roger D. Peng
>>
>>As far as I know, R does compile on AMD Opterons and runs as a 
>>64-bit application.  So it can store objects larger than 4GB. 
>>However, I don't think R gets tested very often on 64-bit 
>>machines with such large objects, so there may be as-yet-undiscovered 
>>bugs.
>>
>>-roger
>>
>>Sunny Ho wrote:
>>
>>
>>>Hello everyone,
>>>
>>>I would like to get some advice on using R with some really large
>>>datasets.
>>>
>>>I'm using R 1.8.1 on RH9 Linux for research with a lot of numerical
>>>data. The datasets total around 200MB (as shown by memory.size).
>>>During my data manipulation, the system memory usage grew to 1.5GB,
>>>and this caused a lot of swapping on my 1GB PC. This is just a
>>>small-scale experiment; the full-scale one will use data 30 times as
>>>large (on a 4GB machine). I can see that I'll need to deal with
>>>memory usage problems very soon.
>>>
>>>I notice that R keeps all datasets in memory at all times. I wonder
>>>whether there is any way to instruct R to push some of the
>>>less-frequently-used data tables out of main memory, so as to free
>>>up memory for those that are actively in use. It would be even
>>>better if R could keep only part of a table in memory, and only when
>>>that part is needed. Using save & load could help, but I wonder
>>>whether R is intelligent enough to do this by itself, so that I
>>>don't need to keep track of memory usage at all times.
>>>
>>>Another thought is to use a 64-bit machine (AMD64). I find there is
>>>a pre-compiled R for Fedora Linux on AMD64. Does anyone know whether
>>>this version of R runs as a 64-bit application? If so, will R be able
>>>to go beyond the 32-bit 4GB memory limit?
>>>
>>>Also, from the manual, I find that the RPgSQL package (for the
>>>PostgreSQL database) supports a feature called "proxy data frame".
>>>Does anyone have experience with this? Can a "proxy data frame"
>>>handle memory efficiently for very large datasets? Say, if I have a
>>>6GB database table defined as a proxy data frame, will R & RPgSQL be
>>>able to handle it with just 4GB of memory?
>>>
>>>Any comments will be useful. Many thanks.
>>>
>>>Sunny Ho
>>>(Hong Kong University of Science & Technology)
>>>
>>>______________________________________________
>>>R-help at stat.math.ethz.ch mailing list
>>>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
>>
> 
> 
> 



