[Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}
Martin Maechler
maechler at stat.math.ethz.ch
Fri Aug 1 21:05:16 CEST 2008
>>>>> "VK" == Vadim Kutsyy <vadim at kutsyy.com>
>>>>> on Fri, 01 Aug 2008 10:22:43 -0700 writes:
VK> Martin Maechler wrote:
>> [[Topic diverted from R-help]]
>>
>> Well, fortunately, reasonable compilers have indeed kept
>> 'long' == 'long int' to mean 32-bit integers ((less
>> reasonable compiler writers have not, AFAIK: which leads
>> of course to code that no longer compiles correctly when
>> originally it did)) But of course you are right that
>> 64-bit integers (typically == 'long long', and really ==
>> 'int64') are very natural on 64-bit architectures. But
>> see below.
... I wrote complete rubbish,
and I am embarrassed ...
>>
VK> Well, in 64-bit Ubuntu, /usr/include/limits.h defines:
VK> /* Minimum and maximum values a `signed long int' can hold. */
VK> # if __WORDSIZE == 64
VK> # define LONG_MAX 9223372036854775807L
VK> # else
VK> # define LONG_MAX 2147483647L
VK> # endif
VK> # define LONG_MIN (-LONG_MAX - 1L)
VK> and a simple test program
VK> (http://home.att.net/~jackklein/c/inttypes.html#int) run on my desktop,
VK> which is a standard Intel computer, shows:
VK> Signed long min: -9223372036854775808 max: 9223372036854775807
yes. I am really embarrassed.
What I was trying to say was that
the definition of int / long /... should not change when going
from 32bit architecture to 64bit
and that the R internal structures consequently should also be
the same on 32-bit and 64-bit platforms
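
To check what a given platform actually does, something like the
following minimal sketch could be used. The function names are made up
for illustration; the only assumption is a C99 compiler that provides
int64_t. On 64-bit Linux (LP64), 'long' should come out as 64 bits; on
32-bit platforms and on 64-bit Windows (LLP64) it is typically 32 bits,
whereas int64_t has 64 bits wherever it exists.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Width in bits of the platform's 'long' (platform dependent). */
static int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Width in bits of int64_t: exactly 64 wherever the type is provided. */
static int int64_bits(void)
{
    return (int)(sizeof(int64_t) * CHAR_BIT);
}

/* Write a one-line summary into buf; returns what snprintf returns. */
static int report_widths(char *buf, size_t n)
{
    return snprintf(buf, n,
                    "int: %d bits, long: %d bits, int64_t: %d bits, LONG_MAX: %ld",
                    (int)(sizeof(int) * CHAR_BIT), long_bits(), int64_bits(),
                    LONG_MAX);
}
```

Calling report_widths() from a small main() and printing the buffer
shows at a glance which data model the compiler uses.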
>> If you have too large a numeric matrix, it would be larger than
>> 2^31 * 8 bytes = 2^34 bytes = 2^34 / 2^20 ~= 16'000 Megabytes.
>> If that is 10% only for you, you'd have around 160 GB of
>> RAM. That's quite impressive.
>>
>> cat /proc/meminfo | grep MemTotal
VK> MemTotal: 145169248 kB
VK> We have "smaller" SGI NUMAflex to play with, where the memory can be
VK> increased to 512Gb ("larger" version doesn't have this "limitation").
VK> But with even commodity hardware you can easily get 128Gb for a reasonable
VK> price (e.g. Dell PowerEdge R900)
>> Note that R objects are (pointers to) C structs that are
>> "well-defined" platform independently, and I'd say that this
>> should remain so.
>>
VK> I forgot that R stores a two-dimensional array in a one-dimensional C
VK> array. Now I understand why there is a limitation on the total number of
VK> elements. But this is a big limitation.
Yes, maybe
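
To make the storage scheme concrete, here is a minimal sketch (not R's
actual code; the function names are hypothetical). Element (i, j) of a
column-major nrow x ncol matrix lives at offset i + j * nrow of one long
vector, so the largest offset is nrow * ncol - 1, and that is the number
that must fit into the index type. The second function redoes the memory
arithmetic from above in 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of element (i, j), both 0-based, in a column-major matrix
   with nrow rows that is stored as a single long vector.
   Computed in 64 bits so the result is not limited to INT_MAX. */
static uint64_t colmajor_offset(uint64_t i, uint64_t j, uint64_t nrow)
{
    assert(i < nrow);   /* row index must lie inside the column */
    return i + j * nrow;
}

/* Bytes needed for the data of an nrow x ncol matrix of doubles
   (ignoring any header; overflow for absurdly large inputs is not
   checked in this sketch). */
static uint64_t double_matrix_bytes(uint64_t nrow, uint64_t ncol)
{
    return nrow * ncol * (uint64_t)sizeof(double);
}
```

For example, a 65536 x 32768 matrix has 2^31 elements, one more than a
32-bit signed int can count, and with 8-byte doubles needs 2^34 bytes.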
>> One of the last times this topic came up (within R-core),
>> we found that for all the matrix/vector operations,
>> we really would need versions of BLAS / LAPACK that would also
>> work with these "big" matrices, ie. such a BLAS/Lapack would
>> also have to internally use "longer int" for indexing.
>> At that point in time, we had decided we would at least wait to
>> hear about the development of such BLAS/LAPACK libraries
VK> BLAS supports two dimensional matrix definition, so if we would store
VK> matrix as two dimensional object, we would be fine. But then all R code
VK> as well as all packages would have to be modified.
exactly. And that was what I meant when I said "Compatibility".
But rather than changing the
"matrix = columnwise stored as long vector" paradigm, we should
rather change from 32-bit indexing to a longer one.
The hope is that we eventually come up with a scheme
that would basically allow us to just recompile all packages:
In src/include/Rinternals.h,
we have had the following three lines for several years now:
------------------------------------------------------------------------------------
/* type for length of vectors etc */
typedef int R_len_t; /* will be long later, LONG64 or ssize_t on Win64 */
#define R_LEN_T_MAX INT_MAX
------------------------------------------------------------------------------------
and you are right, that it may be time to experiment a bit more
with replacing 'int' with long (and also the corresponding _MAX)
setting there,
and indeed, in the array.c code you cited, one should replace
INT_MAX by R_LEN_T_MAX.
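
As a rough sketch of what such a check could look like (this is not the
actual array.c code; matrix_length() is a hypothetical helper, and the
typedef and macro below merely repeat the lines from Rinternals.h so the
fragment is self-contained):

```c
#include <limits.h>

/* Stand-ins for the Rinternals.h lines quoted above. */
typedef int R_len_t;
#define R_LEN_T_MAX INT_MAX

/* Hypothetical helper: store nrow * ncol in *len and return 1 if the
   product fits into R_len_t; return 0 otherwise.  The comparison is
   done in double so that the product itself cannot overflow. */
static int matrix_length(int nrow, int ncol, R_len_t *len)
{
    if (nrow < 0 || ncol < 0)
        return 0;
    if ((double)nrow * (double)ncol > R_LEN_T_MAX)
        return 0;
    *len = (R_len_t)nrow * (R_len_t)ncol;
    return 1;
}
```

Written this way, the check mentions neither 'int' nor INT_MAX directly,
so widening R_len_t and R_LEN_T_MAX later would not require touching it.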
This still does not solve the problem that we'd have to get to
a BLAS / Lapack version that correctly works with "long indices"...
which may (or may not) be easier than I had thought.
Martin