[Rd] Is it possible to shrink an R object in place?
Simon Urbanek
simon.urbanek at r-project.org
Fri Apr 11 22:34:20 CEST 2014
On Apr 11, 2014, at 3:47 PM, Romain Francois <romain at r-enthusiasts.com> wrote:
> Hello,
>
> I’ve been using shrinking in https://github.com/hadley/dplyr/blob/master/inst/include/tools/ShrinkableVector.h
>
> This defines a ShrinkableVector of some R type (INTSXP, ...) given the maximum number of elements it will hold. Then, I reset the length with SETLENGTH when needed. The constructor protects the SEXP, and the destructor restores the original length before removing the protection. With this I only have to allocate the data once, and I can make R believe a vector is of a different size, as long as I restore the correct size eventually.
>
I like the destructor touch of restoring the size :) - that is neat.
But as I said, this is only useful in cases where you strip off a few elements; otherwise you're better off creating a copy because of the memory implications.
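
For illustration, the restore-in-destructor pattern can be sketched roughly as follows. This is only a sketch and not the code from ShrinkableVector.h: FakeVector and LengthGuard are hypothetical names, and a plain struct stands in for the SEXP so that the example compiles without R. The reported length changes, while the allocation stays at its full size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an R vector: the storage is allocated once,
// and `length` is only the size reported to callers.
struct FakeVector {
    std::vector<int> storage;  // allocated once, never resized
    std::size_t length;        // reported length, may be smaller
    explicit FakeVector(std::size_t n) : storage(n), length(n) {}
};

// Guard that lets the reported length be lowered temporarily and
// restores the original length when it goes out of scope.
class LengthGuard {
public:
    explicit LengthGuard(FakeVector& v) : v_(v), full_(v.length) {}
    ~LengthGuard() { v_.length = full_; }

    LengthGuard(const LengthGuard&) = delete;
    LengthGuard& operator=(const LengthGuard&) = delete;

    // Change the reported length; the allocation itself is untouched.
    void set_length(std::size_t n) {
        assert(n <= full_);
        v_.length = n;
    }

private:
    FakeVector& v_;
    std::size_t full_;
};
```

While a guard is alive the vector reports the shorter length but still occupies the full allocation, which is why this only pays off when few elements are dropped.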
Cheers,
Simon
> Kevin, when you start using parallelism, you have to change the way you approach the sequence of things that go on. In particular, a double pass becomes less of a problem, i.e. one pass to figure out the appropriate size and one to handle part of the data. And guess what, there is lots of that to come in the next versions of Rcpp11.
>
> Romain
>
> On 11 Apr 2014, at 17:08, Simon Urbanek <simon.urbanek at r-project.org> wrote:
>
>> Kevin,
>>
>> On Apr 10, 2014, at 4:57 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
>>
>>> Suppose I generate an integer vector with e.g.
>>>
>>> SEXP iv = PROTECT(allocVector(INTSXP, 100));
>>>
>>> and later want to shrink the object, e.g.
>>>
>>> shrink(iv, 50);
>>>
>>> would simply re-set the length to 50, and allow R to reclaim the
>>> memory that was previously used.
>>>
>>> Is it possible to do this while respecting how R manages memory?
>>>
>>
>> The short answer is, no.
>>
>> There are several problems with this, one of the main ones being that there is simply no way to release the "excess" memory, so the vector still has the full length in memory. There is the SETLENGTH() function, but it's not part of the API and it has been proposed for elimination because of the inherent issues it causes (discrepancy in allocated and reported length).
>>
>>
>>> The motivation: there are many operations where the length of the
>>> output is not known ahead of time, and in such cases one typically
>>> uses a data structure that can grow efficiently. Unfortunately, IIUC
>>> SEXPRECs cannot do this; however, an alternative possibility would
>>> involve reserving extra memory, and then shrinking to fit after the
>>> operation is complete.
>>>
>>> There have been some discussions previously that defaulted to answers
>>> of the form "you should probably just copy", e.g.
>>> https://stat.ethz.ch/pipermail/r-devel/2008-March/048593.html, but I
>>> wanted to ping and see if others had ideas, or if perhaps there was
>>> code in the R sources that might be relevant.
>>>
>>> Another reason why this is interesting is due to C++11 and
>>> multi-threading: if I can pre-allocate SEXPs that will contain results
>>> in the main thread, and then fill these SEXPs asynchronously (without
>>> touching R, and hence not getting in the way of the GC or otherwise),
>>> I can then fill these SEXPs in place and shrink-to-fit after the
>>> computations have been completed. With C++11 support coming with R
>>> 3.1.0, functionality like this is very attractive.
>>>
>>
>> I don't see how this is related to the question - it was always possible to fill SEXPs from parallel threads and has been routinely used even in R itself (most commonly via OpenMP).
>>
>>
>>> The obvious alternatives are to 1) determine the length of the output
>>> first and hence generate SEXPs of appropriate size right off the bat
>>> (potentially expensive), and 2) fill thread-safe containers and copy
>>> to an R object (definitely expensive).
>>>
>>
>> In most current OSes, it is impossible to shrink allocated memory in-place, so if you really don't know the size of the object, it will be copied anyway. As mentioned above, the only case where shrinking may work is if you only need to strip a few elements of a large vector so that keeping the same allocation has no significant effect.
>>
>> Cheers,
>> Simon
>>
>>
>>
>>
>>> I am probably missing something subtle (or obvious) as to why this may
>>> not work, or be recommended, so I appreciate any comments.
>>>
>>> Thanks,
>>> Kevin
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>
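
The two alternatives discussed in the thread (a first pass to determine the output size, or filling an over-sized buffer and copying the used part) can be sketched as below. This is a minimal illustration under assumptions: the function names are hypothetical, std::vector stands in for an R vector so the example compiles without R, and keeping the positive values is an arbitrary example of an operation whose output length is not known in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass variant: pass 1 counts the elements to keep,
// pass 2 fills a result allocated at exactly that size.
std::vector<int> keep_positive_two_pass(const std::vector<int>& in) {
    std::size_t n = 0;
    for (int x : in) {
        if (x > 0) ++n;
    }

    std::vector<int> out(n);  // stands in for allocating an R vector of length n
    std::size_t j = 0;
    for (int x : in) {
        if (x > 0) out[j++] = x;
    }
    assert(j == n);
    return out;
}

// Over-allocate variant: fill a buffer of the maximum possible size,
// then copy the used prefix into a right-sized result.
std::vector<int> keep_positive_copy(const std::vector<int>& in) {
    std::vector<int> scratch(in.size());
    std::size_t n = 0;
    for (int x : in) {
        if (x > 0) scratch[n++] = x;
    }
    return std::vector<int>(scratch.begin(),
                            scratch.begin() + static_cast<std::ptrdiff_t>(n));
}
```

Both variants return the same result. The first reads the input twice, while the second needs a temporary buffer and one copy of the kept elements.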