[R] vector memory allocation?

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Mar 5 08:45:51 CET 2005


On Fri, 4 Mar 2005, Sam Yeaman wrote:

> I have a vector size allocation problem with R 2.0.1 (script and output 
> shown):

And your computer's OS and RAM size are?

>> var1 <- sum (input1 * input2, na = TRUE)

What do you think na=TRUE does?  There is an na.rm argument, but

sum(1, na=TRUE)

may surprise you.  (You cannot use partial matching on arguments after 
..., and you would do well never to use it when programming or reporting a 
problem.)
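
To make the point concrete, here is a small sketch of what happens. The vector x is made up for illustration; sum() has the signature sum(..., na.rm = FALSE), so an argument named na is swallowed by ... and added in as a value.

```r
# Illustrative data only; not the poster's matrices.
x <- c(1, NA, 3)

# 'na' is not matched to 'na.rm' because it comes after '...',
# so TRUE is treated as one more number (1) to add.
r1 <- sum(1, na = TRUE)      # 1 + 1 = 2

# NAs are therefore NOT removed here, and the result is NA.
r2 <- sum(x, na = TRUE)

# Spelling the argument out in full removes the NA first.
r3 <- sum(x, na.rm = TRUE)   # 1 + 3 = 4
```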

>> gc()
>          used  (Mb) gc trigger   (Mb)
> Ncells   199327   5.4     785113   21.0
> Vcells 71039552 542.0  206003790 1571.7
>
>> var2 <- sum (input1 * input2 / input2, na = TRUE)
>
> Error: cannot allocate vector of size 524288 Kb
>
> input1 and input2 are matrices input from text files of about 100 MB.

Numeric matrices, presumably?  And what object.size()?
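
A minimal sketch of how one might answer both questions. The matrices here are small hypothetical stand-ins, not the poster's input1 and input2; the point is only that a double matrix costs about 8 bytes per element and an integer one about 4.

```r
# Hypothetical stand-ins for the real inputs.
m_double  <- matrix(0,  nrow = 1000, ncol = 1000)
m_integer <- matrix(0L, nrow = 1000, ncol = 1000)

storage.mode(m_double)    # "double"
storage.mode(m_integer)   # "integer"

# object.size() reports bytes; divide by 2^20 for Mb.
size_double  <- as.numeric(object.size(m_double))
size_integer <- as.numeric(object.size(m_integer))
size_double / 2^20
size_integer / 2^20
```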

> This error happens irrespective of whether I calculate var1 or var2 
> first...it will always calculate the first and always have an error on the 
> second.

It is likely that you have fragmented the address space.  If this is a 
32-bit OS, you are using objects large compared to the user address space 
(probably 2 or 3 Gb).  Finding gaps of 512Mb (that's a suspiciously 
`round' number, 524288Kb = 512Mb) in a 2Gb block is not easy.
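
The arithmetic behind those figures can be checked directly. This is only a sketch; it assumes 8 bytes per double and per Vcell, and takes the numbers from the error message and the gc() output quoted above.

```r
kb        <- 524288            # size in the error message, in Kb
mb        <- kb / 1024         # 512 Mb
bytes     <- kb * 1024         # 2^29 bytes
n_doubles <- bytes / 8         # number of doubles that fit: 2^26

# Vcells 'used' from the gc() output, converted to Mb
# (assuming 8 bytes per Vcell); about 542.
vcells_mb <- 71039552 * 8 / 2^20
```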

> Am I misusing the garbage-collector? I am confused by the fact that 
> the difference between 'trigger' and 'used' seems so much higher than the 
> size of vector that it says it can't allocate.

Like a factor of 2.01 `higher'?  Methinks you do protest too much.

It has to allocate at least 2 vectors en route to var2.  Something like

tmp1 <- input1 * input2
tmp2 <- tmp1/input2
var2 <- sum(tmp2, as.numeric(TRUE))

However, the 'trigger' is exactly that, and can exceed the 
address space of the machine: it is the point at which garbage collection 
gets called, not the size of the available vector heap.
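
One possible way to avoid the full-size temporaries in the three-step expansion above is to work in pieces. The sketch below is not something suggested in this thread: chunked_sum and its chunk argument are hypothetical names, and it assumes na.rm = TRUE was what the poster intended. The result can differ from sum(a * b, na.rm = TRUE) in the last few digits because the additions happen in a different order.

```r
# Hypothetical helper: sum of a * b computed chunk by chunk, so each
# temporary holds at most 'chunk' elements rather than length(a).
chunked_sum <- function(a, b, chunk = 1e6) {
  stopifnot(length(a) == length(b))
  n     <- length(a)
  total <- 0
  start <- 1
  while (start <= n) {
    end   <- min(start + chunk - 1, n)
    idx   <- start:end
    total <- total + sum(a[idx] * b[idx], na.rm = TRUE)
    start <- end + 1
  }
  total
}

# Small made-up example.
a <- c(1, 2, NA, 4)
b <- c(2, 2, 2, 2)
res <- chunked_sum(a, b, chunk = 3)   # 2 + 4 + 8 = 14
```

The index vector idx is itself a temporary, so chunk should be small relative to the free address space.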

We know at least one way to improve R's ability to work close to the 
address space limits (and it is likely to appear in 2.1.0), but really
doing anything useful with 512Mb objects needs a 64-bit machine.
(R has supported 64-bit OSes for several years.)
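
To see whether a given R build is 32-bit or 64-bit, one can look at the pointer size it reports. This is a sketch only; pointer_bits is a name introduced here for illustration.

```r
# .Machine$sizeof.pointer is the size of a pointer in bytes:
# 4 on a 32-bit build of R, 8 on a 64-bit build.
pointer_bits <- 8 * .Machine$sizeof.pointer
pointer_bits
```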

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



