[R] memory once again
murdoch at stats.uwo.ca
Fri Mar 3 21:31:14 CET 2006
On 3/3/2006 2:42 PM, Berton Gunter wrote:
> What you propose is not really a solution, as even if your data set didn't
> break the modified precision, another would. And of course, there is a price
> to be paid for reduced numerical precision.
> The real issue is that R's current design is incapable of dealing with data
> sets larger than what can fit in physical memory (expert
It can deal with big data sets, just not nearly as conveniently as it
deals with ones that fit in memory. The most straightforward way is
probably to put them in a database, and use RODBC or one of the
database-specific packages to read the data in blocks. (You could also
leave the data in a flat file and read it a block at a time from there,
but the database is probably worth the trouble: other people have done
the work involved in sorting, selecting, etc.)
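Something along these lines would do it with RODBC (untested; the data
source name "mydsn", the table name "mytable" and the block size are just
placeholders for whatever you set up):

  library(RODBC)

  ch <- odbcConnect("mydsn")     # ODBC data source holding your table

  # pull the first block, then keep fetching further blocks of 10000 rows
  block <- sqlFetch(ch, "mytable", max = 10000)
  while (is.data.frame(block) && nrow(block) > 0) {
      # ... do something with this block ...
      block <- sqlFetchMore(ch, max = 10000)
  }

  odbcClose(ch)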
The main problem you'll run into is that almost none of the R functions
know about databases, so you'll end up doing a lot of work to rewrite
the algorithms to work one block at a time, or on a random sample of
data, or whatever.
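For example, even something as simple as a mean has to be rewritten as a
pair of running totals accumulated over the blocks (continuing the sketch
above; the column name x is made up):

  total <- 0
  n <- 0
  block <- sqlFetch(ch, "mytable", max = 10000)
  while (is.data.frame(block) && nrow(block) > 0) {
      total <- total + sum(block$x, na.rm = TRUE)
      n     <- n + sum(!is.na(block$x))
      block <- sqlFetchMore(ch, max = 10000)
  }
  total / n     # mean of x over all rows, without ever holding them all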
The original poster didn't say what he wanted to do with his data, but
if he only needs to work with a few variables at a time, he can easily
fit an 820,000 x N dataframe in memory, for small values of N. Reading
it in from a database would be easy.
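For instance, a query that pulls only the variables he needs keeps N small
(column names here are again placeholders):

  few <- sqlQuery(ch, "select id, income, age from mytable")
  dim(few)   # 820000 x 3: roughly 20 MB as doubles, no problem in 1 GB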
> My understanding is that there is no way to change
> this without a fundamental redesign of R. This means that you must either
> live with R's limitations or use other software for "large" data sets.
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
> "The business of the statistician is to catalyze the scientific learning
> process." - George E. P. Box
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dimitri Joe
>> Sent: Friday, March 03, 2006 11:28 AM
>> To: R-Help
>> Subject: [R] memory once again
>> Dear all,
>> A few weeks ago, I asked this list why small Stata files became huge
>> R files. Thomas Lumley said it was because "Stata uses single-precision
>> floating point by default and can use 1-byte and 2-byte integers. R uses
>> double precision floating point and four-byte integers." And it seemed I
>> couldn't do anything about it.
>> Is it true? I mean, isn't there a (more or less simple) way to change
>> how R stores data (maybe by changing the source code and
>> compiling it)?
>> The reason why I insist on this point is that I am trying to work
>> with a data frame with more than 820,000 observations and 80 variables.
>> The Stata file is 150 MB. With my Pentium IV 2 GHz and 1 GB of RAM,
>> Windows XP, I couldn't do the import using the read.dta() function
>> from package foreign. With Stat Transfer I managed to convert the
>> Stata file to an S file of 350 MB, but my machine still didn't manage
>> to import it. I even tried to "increase" my memory with
>> memory.limit(4000), but it still didn't work.
>> Regardless of the answer to my question, I'd appreciate hearing about
>> your experience/suggestions for working with big files in R.
>> Thank you for youR-Help,
>> Dimitri Szerman