[R] Performance & capacity characteristics of R?

Prof Brian D Ripley ripley at stats.ox.ac.uk
Tue Aug 3 10:14:56 CEST 1999


On Tue, 3 Aug 1999, Karsten M. Self wrote:

> I hope this is merely a FAQ, and not an AFAQ (annoyingly....).
> 
> I'm a SAS programmer, with several years' experience of the system,
> evaluating alternatives.  See the SAS for Linux website (URL in sig) for
> more info.
> 
> I'm exploring R's capabilities and limitations.  I'd be very interested
> in having a deeper understanding of its capacity and performance
> limitations in dealing with very large datasets, which I would classify
> as tables with 1 million to 100s of millions of rows and two - 100+
> fields (variables) generally of 8 bytes -- call it a 16 - 800 byte
> record length.

Can you tell us what statistical procedures need 1 million to 100s of
millions of rows (observations)?  Some of us have doubted that there are
even datasets of 100,000 examples that are homogeneous and for which a
small subsample would not give all the statistical information. (If they
are not homogeneous, one could/should analyse homogeneous subsets and do a
meta-analysis.)
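
To illustrate the subsampling point, here is a minimal sketch in Python
(used only for convenience). The dataset is simulated and hypothetical:
one million draws from a single normal distribution stand in for a
homogeneous dataset, and the 1% subsample size is an arbitrary choice.

```python
import random
import statistics

random.seed(1999)  # fixed seed so the run is repeatable

# Hypothetical homogeneous dataset: one million draws from one distribution.
population = [random.gauss(50.0, 10.0) for _ in range(1_000_000)]

# A 1% simple random subsample, drawn without replacement.
subsample = random.sample(population, 10_000)

full_mean = statistics.fmean(population)
sub_mean = statistics.fmean(subsample)

# Estimated standard error of the subsample mean: sd / sqrt(n).
std_error = statistics.stdev(subsample) / len(subsample) ** 0.5

print(f"mean of all rows:     {full_mean:.3f}")
print(f"mean of 1% subsample: {sub_mean:.3f}")
print(f"standard error:       {std_error:.3f}")
```

Under these assumptions the subsample mean should land within a few
standard errors (about 0.1 each) of the full-data mean. This says
nothing about non-homogeneous data or about estimating rare events.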

Your datasets appear to be (taking a mid-range value) around 1Gbyte
in size.
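
A rough check of that figure, as a sketch only. Reading "100s of
millions" as 100 million rows and taking the geometric midpoint of the
range are both assumptions, and storage overhead is ignored.

```python
import math

BYTES_PER_FIELD = 8  # from the question: fields are generally 8 bytes


def table_bytes(rows, fields, bytes_per_field=BYTES_PER_FIELD):
    """Raw size of a rows x fields table, ignoring storage overhead."""
    return rows * fields * bytes_per_field


smallest = table_bytes(1_000_000, 2)       # 1 million rows, 2 fields
largest = table_bytes(100_000_000, 100)    # assumes "100s of millions" = 1e8
midpoint = math.sqrt(smallest * largest)   # geometric midpoint (an assumption)

for label, size in [("smallest", smallest),
                    ("largest", largest),
                    ("midpoint", midpoint)]:
    print(f"{label}: {size / 1e9:.2f} GB")
```

The range runs from 16 million bytes to 80 thousand million bytes, so a
mid-range value of the order of 1Gbyte is plausible.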

> Can R handle such large datasets (tables)?  What are the general

R has a workspace size limit of 2048Mb, and on 32-bit machines this cannot
be raised more than a tiny amount. I have only run R on a machine with
512Mb of RAM, and on that using objects of more than 100Mb or so slowed it
down very considerably.
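
To put those numbers in terms of rows, here is a back-of-envelope
sketch. It assumes "Mb" means 2**20 bytes, takes 100 fields of 8 bytes
per row (the upper end of the record sizes in the question), and
ignores any copies made during computation, so real capacity would be
lower.

```python
MB = 2**20  # assumption: "Mb" is read as 2**20 bytes


def max_rows(budget_mb, fields, bytes_per_field=8):
    """Whole rows of `fields` values that fit in `budget_mb` megabytes."""
    return (budget_mb * MB) // (fields * bytes_per_field)


rows_at_limit = max_rows(2048, 100)    # the 2048Mb workspace limit
rows_comfortable = max_rows(100, 100)  # the ~100Mb object size noted above

print(f"rows at the 2048Mb limit: {rows_at_limit:,}")
print(f"rows in a 100Mb object:   {rows_comfortable:,}")
```

On these assumptions a single 100Mb object holds roughly 130,000 such
rows, far short of the millions of rows asked about.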

> parameters for memory requirements?  How great a performance hit does
> running to swap (virtual memory) entail?  What common

A large hit, as R's garbage collector moves objects in memory.

> procedures|functions under R use significantly more memory?  Are there
> guidelines or documentation which point to issues and parameters of
> large file|dataset processing under R?

At its present stage of development, R is not tuned to work with such large
datasets. There are plans to make it work better with them, but the issue
remains as to whether there are many real applications that need such
datasets. Hence my first question.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


