[Rd] gc()$Vcells < 0 (PR#9345)

Vladimir Dergachev vdergachev at rcgardis.com
Tue Nov 7 16:45:26 CET 2006


On Tuesday 07 November 2006 6:28 am, Prof Brian Ripley wrote:
> On Mon, 6 Nov 2006, Vladimir Dergachev wrote:
> > On Monday 06 November 2006 6:12 pm, dmaszle at mendelbio.com wrote:
> >> version.string Version 2.3.0 (2006-04-24)
> >>
> >>> x<-matrix(nrow=44000,ncol=48000)
> >>> y<-matrix(nrow=44000,ncol=48000)
> >>> z<-matrix(nrow=44000,ncol=48000)
> >>> gc()
> >>
> >>               used    (Mb) gc trigger    (Mb) max used    (Mb)
> >> Ncells      177801     9.5     407500    21.8   350000    18.7
> >> Vcells -1126881981 24170.6         NA 24173.4       NA 24170.6
> >
> > Happens to me with versions 2.4.0 and 2.3.1. The culprit is this line
> > in src/main/memory.c:
> >
> >    INTEGER(value)[1] = R_VSize - VHEAP_FREE();
> >
> > Since the amount used is greater than 4G and INTEGER is 32 bits wide
> > (even on 64-bit machines) this returns (harmless) nonsense.
>
> That's not quite correct.  The units here are Vcells (8 bytes), and
> integer() is signed, so this can happen only if more than 16Gb of heap is
> allocated.

I see - thank you for the explanation !

>
> We are aware that we begin to hit problems at 16Gb: it is for example the
> maximum size of an R vector.  Those objects are logical and so about 7.8Gb
> each: their length as vectors is 98% of the maximum possible.  However,
> the first time we discussed it we thought it would be about 5 years before
> those limits would become important -- I think three of those years have
> since passed.
>
> > The megabyte value nearby is correct and gc trigger and max used fields
> > are marked as NA already.
>
> and now 'used' is also marked as NA in 2.4.0 patched.

Great, thank you !

>
> This is only a reporting issue.  When I first used R it reported only
> numbers, and I added the Mb as a more comprehensible figure (especially
> for Ncells).  I think it would be sensible now to only report these
> figures in Mb or Gb (and also the reports for gcinfo(TRUE)).

Why not use KB ? This still preserves information about small allocations and
raises the limit to 2 TB (2^31 KB in a signed 32-bit integer) - surely at least
5 years off ! :)

Alternatively, doubles should be able to hold the entire number, but this 
would require changes to how information is displayed.

>
> The model behind the report actually pre-dates the GC change in 1.2.0.
> The 'Vcells' are nowadays the sum of all the allocations from VECSXPs
> (which include their headers), rather than the 'vector heap' (although
> some of the earlier terminology persists).

I see.

           thank you !

                  Vladimir Dergachev


