[R] 64-bit R and cache memory
Martin Maechler
maechler at stat.math.ethz.ch
Sat May 17 18:28:13 CEST 2008
>>>>> "GA" == Gad Abraham <gabraham at csse.unimelb.edu.au>
>>>>> on Sat, 17 May 2008 21:12:41 +1000 writes:
GA> Joram Posma wrote:
>> Dear all,
>>
>> I have a few questions regarding the 64 bit version of R and the cache
>> memory R uses.
>>
>> -----------------------------------
>> Computer & software info:
>>
>> OS: Kubuntu Feisty Fawn 7.04 (64-bit)
>> Processor: AMD Opteron 64-bit
>> R: version 2.7.0 (64-bit)
>> Cache memory: currently 16 GB (was 2 GB)
>> Outcome of 'limit' command in shell: cputime unlimited, filesize
>> unlimited, datasize unlimited, stacksize 8192 kbytes, coredumpsize 0
>> kbytes, memoryuse unlimited, vmemoryuse unlimited, descriptors 1024,
>> memorylocked unlimited, maxproc unlimited
>> -----------------------------------
>>
>> a. We have recently upgraded the cache memory from 2 to 16 GB. However,
>> we have noticed that somehow R still swaps memory when datasets
>> exceeding 2 GB in size are used. An indication that R uses approx. 2 GB
>> of cache memory is that sometimes R also kills the session when datasets
>> > 2 GB are loaded. How/where can we see how much cache memory R uses
>> (since memory.size and memory.limit are only for windows, and to us
>> those might be what we need)?
Use object.size(.) to see the actual size of your large
data object.
Otherwise,
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 155605 8.4 350000 18.7 350000 18.7
Vcells 155621 1.2 2006827 15.4 2156058 16.5
is typically useful, but we often also use 'top' (which you
have on Linux as well) to monitor the R process. Gad already
mentioned this.
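A minimal sketch of checking object sizes from within R (the object 'x' below is purely illustrative; the units= formatting may not be available in very old R versions):

```r
## Hypothetical large-ish object, just for illustration
x <- rnorm(1e6)                        # 10^6 doubles, roughly 8 MB
object.size(x)                         # size in bytes
format(object.size(x), units = "Mb")   # human-readable, where supported
gc()                                   # Ncells/Vcells usage of the whole session
```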
>> Could this be caused by the limit of the
>> stack size (we are not exactly sure what the stack size is either)?
>> b. And how can we increase the cache memory used by R to 14 or even 16
>> GB (which might be tricky when running other programs, but still)?
>>
>> So in general: how can we get R to use the full memory capacity of the
>> computer?
>>
GA> The term "cache memory" is something entirely different to what you're
GA> referring to --- you're talking about RAM.
yes indeed.
GA> Anyway, under Linux R will take all the RAM it can get, and if you're
GA> running a 64-bit OS on a 64-bit CPU then it should definitely be able to
GA> use more than 2GB of RAM.
definitely, and it does; I've tried up to 20 or so GB on a
platform very similar to yours.
HOWEVER, single R 'objects' can hit size limits earlier:
Do read help("Memory-limits"),
which has all (?) the pertinent info,
and also contains:
There are also limits on individual objects. On all versions of
R, the maximum length (number of elements) of a vector is 2^31 - 1
~ 2*10^9, as lengths are stored as signed integers. In addition,
the storage space cannot exceed the address limit, and if you try
to exceed that limit, the error message begins 'cannot allocate
vector of length'. The number of characters in a character string
is in theory only limited by the address space.
and if you now compute: a numeric vector needs 8 bytes per
entry (plus ~ 40 bytes of header, it seems, currently on 64-bit
Linux), so the maximal numeric vector would need
> (2^31-1)*8 + 40
[1] 17179869216
bytes, which is 2^14 MBytes (using the 1 MB = 2^20 bytes
definition), i.e. about 16 GB.
The maximal integer/logical vector would be half that size in bytes,
i.e. ~ 8 GB.
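The arithmetic above can be redone in R directly (the 40-byte per-object header is the assumption quoted above, not a documented constant):

```r
max.len <- 2^31 - 1            # maximal vector length (signed 32-bit count)
max.len * 8 + 40               # bytes for a full 'numeric' vector: 17179869216
(max.len * 8 + 40) / 2^20      # ~ 2^14 = 16384 MB, i.e. about 16 GB
(max.len * 4 + 40) / 2^30      # integer/logical: about 8 GB
```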
The *practical* maximal size is often a bit smaller,
and note that you typically should have around 5 to 10 times as
much RAM as your largest object, because of copying.
GA> To see the memory usage, use the utility "top" in the console/terminal.
GA> One thing to note: a dataset of 2GB on disk may take much more than 2GB
GA> of RAM when loaded into R, due to the overhead of the metadata and the
GA> fact that pointers are 64-bit long as well.
exactly!
The 'sfsmisc' package (from CRAN) contains some Unix-only
utilities, of which Sys.ps() can be handy: it gives
information similar to 'top', but from inside R.
Use Sys.ps(fields="ALL") to see how much CPU/memory
process-specific info you can get; the default, Sys.ps() as used
below, just uses a few fields, notably one of memory footprint.
A memory-only version of Sys.ps() is
Sys.sizes() {not used in the example below}, all from 'sfsmisc'.
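A sketch of calling these utilities (assuming 'sfsmisc' is installed; the exact fields reported depend on the platform's 'ps'):

```r
## Unix-only: process info from inside R, via the 'sfsmisc' package
if (require("sfsmisc")) {
    print(Sys.ps())                  # a few default fields, incl. memory footprint
    print(Sys.ps(fields = "ALL"))    # everything the underlying 'ps' can report
    print(Sys.sizes())               # the memory-only variant mentioned above
}
```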
Here is a small excerpt of an R session (using R-devel,
i.e. 2.8.0 unstable) on one of our 32 GB AMD Opteron
64-bit systems (running Linux, Redhat Enterprise, but could be
Debian/Ubuntu as well):
## empty, newly started R session; a couple of MBs .. :
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 155605 8.4 350000 18.7 350000 18.7
Vcells 155621 1.2 2006827 15.4 2156058 16.5
> x <- rep(pi,2^28)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 155606 8.4 350000 18.7 350000 18.7
Vcells 268591076 2049.2 564122996 4304.0 537026523 4097.2
## Aha: we now use 2 GB and have used 4 GB intermediately {also
## visible from 'top'}
> x <- rep(x,2) ## double the object size
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 155607 8.4 350000 18.7 350000 18.7
Vcells 537026532 4097.2 1409694707 10755.2 1342332915 10241.2
## Yes, 4 GB now with a max. of 10.2 GB used during object construction
> system.time(x <- x+1)
user system elapsed
3.335 2.603 5.939
## Now this shows the footprint *before* garbage collection:
> sfsmisc::Sys.ps()
pid pcpu time vsz comm
"6451" "3.4" "00:00:52" "8440924" "R"
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 157099 8.4 350000 18.7 350000 18.7
Vcells 537027103 4097.2 1409694707 10755.2 1342332915 10241.2
## after GC, we are back to 4 GB :
> sfsmisc::Sys.ps()
pid pcpu time vsz comm
"6451" "3.4" "00:00:52" "4246616" "R"
>
-----------------
And 'top' (or other such tools) confirms that the machine never
swapped. {I wouldn't notice easily, as I am sitting at home, and
the computer runs in a vault down there at ETH ;-)}.
So you see, a 4 GB object was not a problem on this machine.
Martin Maechler, ETH Zurich
GA> --
GA> Gad Abraham
GA> Dept. CSSE and NICTA
GA> The University of Melbourne
GA> Parkville 3010, Victoria, Australia
GA> email: gabraham at csse.unimelb.edu.au
GA> web: http://www.csse.unimelb.edu.au/~gabraham