[R] Need advice on using R with large datasets

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Apr 16 11:41:27 CEST 2004


On 13 Apr 2004, Peter Dalgaard wrote:

> "Roger D. Peng" <rpeng at jhsph.edu> writes:
> 
> > As far as I know, R does compile on AMD Opterons and runs as a 64-bit
> > application.  So it can store objects larger than 4GB. However, I
> > don't think R gets tested very often on 64-bit machines with such
> > large objects so there may be yet undiscovered bugs.
> 
> There are a few such machines around among R users, and R seems to
> work OK on them. One slight gotcha is that the Fortran numeric
> libraries (Lapack, ATLAS) tend to use integer indexing, which might
> overflow for large objects. Things like data frames which consist of
> multiple subobjects might be less sensitive to this. 

At present we restrict vectors to 2^31-1 elements, and as of 1.9.0 we have
many overflow checks in place.  It's not just Fortran code, BTW: integer
indexing in C is pervasive in the R code, including in many add-on
packages.
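
To illustrate the kind of overflow check meant here, the following is a minimal sketch in C, not R's actual source. The helper name checked_length is hypothetical, and it assumes a platform where int is 32 bits, so that INT_MAX equals 2^31-1. It tests whether a matrix-style length nrow * ncol fits in an int before doing the multiplication, since signed overflow in C is undefined behaviour.

```c
#include <limits.h>

/* Hypothetical helper (not taken from the R sources): compute
   nrow * ncol as a vector length, refusing any result that does
   not fit in a signed int.  Returns 1 on success and stores the
   length in *len; returns 0 if the inputs are negative or the
   product would exceed INT_MAX. */
int checked_length(int nrow, int ncol, int *len)
{
    if (nrow < 0 || ncol < 0)
        return 0;
    /* Divide instead of multiplying, so the test itself cannot overflow. */
    if (nrow != 0 && ncol > INT_MAX / nrow)
        return 0;
    *len = nrow * ncol;
    return 1;
}
```

With 32-bit int, a 46340 x 46340 matrix passes the check, while 46341 x 46341 is rejected because its length would exceed 2^31-1.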

So you can use large workspaces (and as someone said, this has been done 
under Solaris and Compaq Alpha for at least a couple of years), and large 
lists (including data frames), but the size of atomic vectors is limited 
in a rather fundamental way.
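
As a rough worked figure for what that limit means in memory terms, here is a small C sketch. The function name max_vector_bytes is made up for illustration; the element sizes assumed are 8 bytes for a double (R numeric) and 4 bytes for an int (R integer), and per-object header overhead is ignored.

```c
#include <stdint.h>

/* Hypothetical helper: payload size in bytes of the largest atomic
   vector permitted by a 2^31-1 element limit, for a given element
   size.  64-bit arithmetic is used so the product cannot overflow. */
uint64_t max_vector_bytes(uint64_t element_size)
{
    const uint64_t max_elements = 2147483647u;  /* 2^31 - 1 */
    return max_elements * element_size;
}
```

Under these assumptions max_vector_bytes(8) is 17179869176, just under 16 GiB for a single numeric vector, and max_vector_bytes(4) is 8589934588, just under 8 GiB for an integer vector. A workspace can exceed these figures only by holding several such objects.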

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



