[R] Need advice on using R with large datasets
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Apr 16 11:41:27 CEST 2004
On 13 Apr 2004, Peter Dalgaard wrote:
> "Roger D. Peng" <rpeng at jhsph.edu> writes:
>
> > As far as I know, R does compile on AMD Opterons and runs as a 64-bit
> > application. So it can store objects larger than 4GB. However, I
> > don't think R gets tested very often on 64-bit machines with such
> > large objects so there may be yet undiscovered bugs.
>
> There are a few such machines around among R users, and R seems to
> work OK on them. One slight gotcha is that the Fortran numeric
> libraries (Lapack, ATLAS) tend to use integer indexing, which might
> overflow for large objects. Things like data frames which consist of
> multiple subobjects might be less sensitive to this.
At present we restrict vectors to 2^31-1 elements, and as of R 1.9.0 we
have many overflow checks in place. It's not just Fortran code, BTW:
integer indexing in C is pervasive in the R code, including in many
add-on packages.
So you can use large workspaces (and as someone said, this has been done
under Solaris and Compaq Alpha for at least a couple of years), and large
lists (including data frames), but the size of atomic vectors is limited
in a rather fundamental way.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595