[Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}

Martin Maechler maechler at stat.math.ethz.ch
Fri Aug 1 21:05:16 CEST 2008


>>>>> "VK" == Vadim Kutsyy <vadim at kutsyy.com>
>>>>>     on Fri, 01 Aug 2008 10:22:43 -0700 writes:

    VK> Martin Maechler wrote:
    >> [[Topic diverted from R-help]]
    >> 
    >> Well, fortunately, reasonable compilers have indeed kept
    >> 'long' == 'long int' to mean 32-bit integers ((less
    >> reasonable compiler writers have not, AFAIK: which leads
    >> of course to code that no longer compiles correctly when
    >> originally it did)) But of course you are right that
    >> 64-bit integers (typically == 'long long', and really ==
    >> 'int64') are very natural on 64-bit architectures.  But
    >> see below.

... I wrote complete rubbish, 
and I am embarrassed ...

    >> 
    VK> well in 64-bit Ubuntu, /usr/include/limits.h defines:

    VK> /* Minimum and maximum values a `signed long int' can hold.  */
    VK> #  if __WORDSIZE == 64
    VK> #   define LONG_MAX     9223372036854775807L
    VK> #  else
    VK> #   define LONG_MAX     2147483647L
    VK> #  endif
    VK> #  define LONG_MIN      (-LONG_MAX - 1L)

    VK> and a simple test program
    VK> (http://home.att.net/~jackklein/c/inttypes.html#int) run on my desktop,
    VK> which is a standard Intel computer, does show:

    VK> Signed long min: -9223372036854775808 max: 9223372036854775807

yes.  I am really embarrassed.
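
The width of 'long' on a given platform can be checked directly. The
following is a minimal sketch, not the test program linked above; the
function names long_bits and print_integer_widths are made up for this
illustration. The C standard only guarantees that 'long' has at least
32 bits, so the output differs between platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of 'long' in bits on the platform this is compiled for. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Print the widths and maxima of the standard signed integer types. */
void print_integer_widths(void)
{
    /* The C standard guarantees at least 32 bits for 'long'. */
    assert(long_bits() >= 32);

    printf("int:       %d bits, max %d\n",
           (int)(sizeof(int) * CHAR_BIT), INT_MAX);
    printf("long:      %d bits, max %ld\n", long_bits(), LONG_MAX);
    printf("long long: %d bits, max %lld\n",
           (int)(sizeof(long long) * CHAR_BIT), LLONG_MAX);
}
```

Calling print_integer_widths() from main() on 64-bit Linux should
report a 64-bit 'long', matching the limits.h excerpt quoted above,
whereas 64-bit Windows keeps 'long' at 32 bits.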

What I was trying to say was that
the definition of  int / long /...  should not change when going
from a 32-bit architecture to a 64-bit one,
and that the R internal structures consequently should also be
the same on 32-bit and 64-bit platforms.

    >> If you have too large a numeric matrix, it would be larger than
    >> 2^31 * 8 bytes = 2^34 bytes = 2^34 / 2^20 Megabytes ~= 16'000 Megabytes.
    >> If that is only 10% for you,  you'd have around 160 GB of
    >> RAM.  That's quite impressive.
    >> 
    >> cat /proc/meminfo | grep MemTotal
    VK> MemTotal:     145169248 kB

    VK> We have "smaller" SGI NUMAflex to play with, where the memory can 
    VK> be increased to 512 GB ("larger" version doesn't have this "limitation").  
    VK> But with even commodity hardware you can easily get 128 GB for a reasonable 
    VK> price (i.e. Dell PowerEdge R900)
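
The size estimate quoted above (2^31 elements of 8 bytes each) can be
reproduced with a few lines of C. This is only an arithmetic sketch;
the function names are invented for the illustration, and 8 bytes per
double is assumed, as in the estimate.

```c
#include <stdint.h>

/* Bytes needed for n doubles, assuming 8 bytes per double. */
uint64_t dense_double_bytes(uint64_t n)
{
    return n * UINT64_C(8);
}

/* Convert a byte count to whole megabytes of 2^20 bytes. */
uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}

/* Share of total memory in percent; mem_total_kb is the MemTotal
 * figure from /proc/meminfo, which counts units of 1024 bytes. */
double percent_of_memory(uint64_t bytes, uint64_t mem_total_kb)
{
    return 100.0 * (double)bytes / ((double)mem_total_kb * 1024.0);
}
```

With n = 2^31 this gives 2^34 bytes, i.e. 16384 megabytes, which is
between 11% and 12% of the MemTotal value reported above.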

    >> Note that R objects are (pointers to) C structs that are
    >> "well-defined" platform independently, and I'd say that this
    >> should remain so.

    >> 
    VK> I forgot that R stores a two-dimensional array in a one-dimensional C 
    VK> array. Now I understand why there is a limitation on the total number of 
    VK> elements.  But this is a big limitation.

Yes, maybe
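
To make the storage scheme concrete: element (i, j) of a column-major
matrix with nrow rows lives at offset i + j * nrow of one long vector,
so the largest offset is bounded by the index type. The sketch below
uses 64-bit arithmetic to show where a 32-bit index runs out; the
function names are hypothetical and not part of R.

```c
#include <stdint.h>

/* Zero-based offset of element (i, j) in a column-major matrix with
 * nrow rows, stored as a single vector. Computed in 64 bits so that
 * the product j * nrow cannot overflow for int-sized dimensions. */
int64_t colmajor_offset(int64_t i, int64_t j, int64_t nrow)
{
    return i + j * nrow;
}

/* Returns 1 if the total element count nrow * ncol can be indexed
 * with a 32-bit signed integer, 0 otherwise. */
int fits_in_int32(int64_t nrow, int64_t ncol)
{
    return nrow * ncol <= INT32_MAX;
}
```

For example, a 46340 x 46340 matrix still fits, while a 50000 x 50000
matrix does not, although both dimensions are far below INT_MAX.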

    >> One of the last times this topic came up (within R-core),
    >> we found that for all the matrix/vector operations,
    >> we really would need versions of  BLAS / LAPACK that would also
    >> work with these "big" matrices, ie. such a BLAS/Lapack would
    >> also have to internally use "longer int" for indexing.
    >> At that point in time, we had decided we would at least wait to
    >> hear about the development of such BLAS/LAPACK libraries.

    VK> BLAS supports a two-dimensional matrix definition, so if we would store 
    VK> a matrix as a two-dimensional object, we would be fine.  But then all R code 
    VK> as well as all packages would have to be modified.

exactly.  And that was what I meant when I said "Compatibility".

But rather than changing the  
 "matrix = columnwise stored as long vector" paradigm, we should
rather change from 32-bit indexing to a longer one.

The hope is that we eventually make up a scheme
which would basically allow us to just recompile all packages:

In src/include/Rinternals.h,
we have had the following three lines for several years now:
------------------------------------------------------------------------------------
/* type for length of vectors etc */
typedef int R_len_t; /* will be long later, LONG64 or ssize_t on Win64 */
#define R_LEN_T_MAX INT_MAX
------------------------------------------------------------------------------------

and you are right that it may be time to experiment a bit more
with replacing 'int' with 'long' (and also the corresponding _MAX
setting) there,
and indeed, in the array.c  code you cited, one should replace
INT_MAX  by  R_LEN_T_MAX.
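
A minimal sketch of what such a check could look like when it is
written against the length type rather than against 'int' directly.
The names demo_len_t, DEMO_LEN_T_MAX and demo_matrix_length_ok are
hypothetical stand-ins, not the actual R sources; only the pattern is
meant to be illustrated.

```c
#include <limits.h>

/* Hypothetical stand-ins for R_len_t and R_LEN_T_MAX. Widening the
 * length type would mean changing only these two definitions. */
typedef int demo_len_t;
#define DEMO_LEN_T_MAX INT_MAX

/* Returns 1 if an nrow x ncol matrix could be stored as one vector
 * indexed by demo_len_t, 0 otherwise. */
int demo_matrix_length_ok(demo_len_t nrow, demo_len_t ncol)
{
    if (nrow < 0 || ncol < 0)
        return 0;
    /* Multiply in double so that the product itself cannot overflow
     * the integer type before it is compared with the limit. */
    return (double)nrow * (double)ncol <= (double)DEMO_LEN_T_MAX;
}
```

Because the limit is spelled DEMO_LEN_T_MAX rather than INT_MAX, the
check follows the typedef automatically. One caveat of this sketch:
with a 64-bit length type the comparison in double is only
approximate near the limit, since a double carries 53 bits of
precision.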

This still does not solve the problem that we'd have to get to
a BLAS / Lapack version that correctly works with "long indices"...
which may (or may not) be easier than I had thought.

Martin


