[Rd] gc()$Vcells < 0 (PR#9345)

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Nov 8 08:56:24 CET 2006


On Tue, 7 Nov 2006, Vladimir Dergachev wrote:

> On Tuesday 07 November 2006 6:28 am, Prof Brian Ripley wrote:
>> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
>>> On Monday 06 November 2006 6:12 pm, dmaszle at mendelbio.com wrote:
>>>> version.string Version 2.3.0 (2006-04-24)
>>>>
>>>>> x<-matrix(nrow=44000,ncol=48000)
>>>>> y<-matrix(nrow=44000,ncol=48000)
>>>>> z<-matrix(nrow=44000,ncol=48000)
>>>>> gc()
>>>>
>>>>               used    (Mb) gc trigger    (Mb) max used    (Mb)
>>>> Ncells      177801     9.5     407500    21.8   350000    18.7
>>>> Vcells -1126881981 24170.6         NA 24173.4       NA 24170.6
>>>
>>> Happens to me with versions 2.4.0 and 2.3.1. The culprit is this line
>>> in src/main/memory.c:
>>>
>>>    INTEGER(value)[1] = R_VSize - VHEAP_FREE();
>>>
>>> Since the amount used is greater than 4G and INTEGER() values are 32-bit
>>> (even on 64-bit machines), this returns (harmless) nonsense.
>>
>> That's not quite correct.  The units here are Vcells (8 bytes), and
>> integer() is signed, so this can happen only if more than 16Gb of heap is
>> allocated.
>
> I see - thank you for the explanation !
>
>>
>> We are aware that we begin to hit problems at 16Gb: it is for example the
>> maximum size of an R vector.  Those objects are logical and so about 7.8Gb
>> each: their length as vectors is 98% of the maximum possible.  However,
>> the first time we discussed it we thought it would be about 5 years before
>> those limits would become important -- I think three of those years have
>> since passed.
>>
>>> The megabyte value nearby is correct, and the 'gc trigger' and 'max used'
>>> fields are already marked as NA.
>>
>> and now 'used' is also marked as NA in 2.4.0 patched.
>
> Great, thank you !
>
>>
>> This is only a reporting issue.  When I first used R it reported only
>> numbers, and I added the Mb as a more comprehensible figure (especially
>> for Ncells).  I think it would be sensible now to only report these
>> figures in Mb or Gb (and also the reports for gcinfo(TRUE)).
>
> Why not use KB? This still preserves information about small allocations and
> raises the limit to 16 TB -- surely at least 5 years off! :)

We already use 0.1Mb: why would you have any need for more accuracy?

> Alternatively, doubles should be able to hold the entire number, but this
> would require changes to how information is displayed.

Actually, no, as the matrix returned is double.  What it did require was 
some redesign of the internal code, but that had already been done in 
R-devel.  I don't have access to a machine which can handle more than 16Gb 
of memory allocation, but it should be the case that in R-devel the actual 
values will be returned.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


