[Rd] Holding a large number of SEXPs in C++

Simon Knapp sleepingwell at gmail.com
Mon Nov 3 23:34:41 CET 2014


Thanks again Simon. I had realised that R_NilValue didn't need
protection... I just thought it a clean way to make my initial call to
PROTECT_WITH_INDEX (which I can see now was not required since I didn't
need the calls to REPROTECT)... and I had not thought of appending to the
tail.

One final question (and hopefully I don't get too badly burnt): I cannot find
R_PreserveObject/R_ReleaseObject or SETCDR mentioned in "Writing R
Extensions". Is there anywhere for a novice like myself to find a
'complete' reference to R's useful macros and functions, or do I just have
to read more source?

Thanks again for being so awesome,
Simon

On Tue, Nov 4, 2014 at 12:47 AM, Simon Urbanek <simon.urbanek at r-project.org>
wrote:

>
> On Nov 2, 2014, at 10:55 PM, Simon Knapp <sleepingwell at gmail.com> wrote:
>
> > Thanks Simon and sorry for taking so long to give this a go. I had
> > thought of pair lists but got confused about how to protect the top level
> > object only, as it seems that appending requires creating a new "top-level
> > object". The following example seems to work (full example at
> > https://gist.github.com/Sleepingwell/8588c5ee844ce0242d05). Is this the
> > way you would do it (or at least 'a correct' way)?
> >
>
> You can simply append to a pairlist, so you only need to protect the head.
> Also note that R_NilValue is a constant (in the R sense, not the C sense)
> so it doesn't need protection. I would write a generic pairlist builder
> something like this:
>
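> SEXP head = R_NilValue, tail = R_NilValue;
>
> /* Append x. Only the head cell is preserved; every later cell is
>    reachable from it and so is kept alive as well. */
> void append(SEXP x) {
>   if (head == R_NilValue)
>         R_PreserveObject(head = tail = CONS(x, R_NilValue));
>   else
>         tail = SETCDR(tail, CONS(x, R_NilValue)); /* SETCDR returns the new cell */
> }
>
> /* Release the list and reset the pointers so the builder can be reused. */
> void destroy() {
>    if (head != R_NilValue)
>         R_ReleaseObject(head);
>    head = tail = R_NilValue;
> }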
>
> Cheers,
> Simon
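
The head/tail idea in the builder above can be tried out without R at all. What follows is only a sketch under stated assumptions: `Cell`, `ListBuilder` and `headIsStable` are hypothetical names invented for illustration and are not part of the R API. In real code the cells would come from CONS and only the head would be passed to R_PreserveObject. The point it shows is that appending at the tail never changes the head, so the object that was preserved once stays the right one to hold on to.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an R cons cell (NOT the R API).
struct Cell {
    int   car;  // payload; in R this would be a SEXP
    Cell* cdr;  // next cell; nullptr plays the role of R_NilValue
};

// Keeps the head (the only cell that would need preserving) and the tail
// (so that appending never has to walk the list).
struct ListBuilder {
    Cell* head = nullptr;
    Cell* tail = nullptr;

    ListBuilder() = default;
    ListBuilder(const ListBuilder&) = delete;
    ListBuilder& operator=(const ListBuilder&) = delete;

    // Constant-time append: link the new cell after the current tail.
    void append(int x) {
        Cell* cell = new Cell{x, nullptr};
        if (head == nullptr) {
            head = tail = cell;
        } else {
            tail->cdr = cell;
            tail = cell;
        }
    }

    std::size_t length() const {
        std::size_t n = 0;
        for (const Cell* c = head; c != nullptr; c = c->cdr) ++n;
        return n;
    }

    ~ListBuilder() {
        while (head != nullptr) {
            Cell* next = head->cdr;
            delete head;
            head = next;
        }
    }
};

// Appends 1..n (n >= 1) and checks that the head cell never changed and
// that the elements come back in insertion order.
bool headIsStable(int n) {
    ListBuilder b;
    b.append(1);
    const Cell* first = b.head;
    for (int i = 2; i <= n; ++i) b.append(i);
    if (b.head != first) return false;
    int expected = 1;
    for (const Cell* c = b.head; c != nullptr; c = c->cdr)
        if (c->car != expected++) return false;
    return expected == n + 1;
}
```

Prepending, as in the PolyHolder code quoted below, creates a new head on every insertion, which is why that version needed REPROTECT each time.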
>
>
> >
> >
> > struct PolyHolder {
> >     PolyHolder(void) {
> >         PROTECT_WITH_INDEX(currentRegion = R_NilValue, &icr);
> >         PROTECT_WITH_INDEX(regions = R_NilValue, &ir);
> >     }
> >
> >     ~PolyHolder(void) {
> >         UNPROTECT(2);
> >     }
> >
> >     void notifyEndRegion(void) {
> >         REPROTECT(regions =
> >             CONS(makePolygonsFromPairList(currentRegion), regions), ir);
> >         REPROTECT(currentRegion = R_NilValue, icr);
> >     }
> >
> >     template<typename Iter>
> >     void addSubPolygon(Iter b, Iter e) {
> >         REPROTECT(currentRegion =
> >             CONS(makePolygon(b, e), currentRegion), icr);
> >     }
> >
> >     SEXP getPolygons(void) {
> >         return regions;
> >     }
> >
> > private:
> >     PROTECT_INDEX
> >         ir,
> >         icr;
> >
> >     SEXP
> >         currentRegion,
> >         regions;
> > };
> >
> >
> >
> > Thanks again,
> > Simon Knapp
> >
> >
> >
> > CONS(newPoly, creates a new object
> > On Sat, Oct 18, 2014 at 2:10 AM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
> >
> > On Oct 17, 2014, at 7:31 AM, Simon Knapp <sleepingwell at gmail.com> wrote:
> >
> > > Background:
> > > I have an algorithm which produces a large number of small polygons
> > > (of the spatial kind) which I would like to use within R using objects
> > > from sp. I can't predict the exact number of polygons a priori, the
> > > polygons will be grouped into regions, and each region will be filled
> > > sequentially, so an appropriate C++ 'framework' (for the point of
> > > illustration) might be:
> > >
> > > typedef std::pair<double, double> Point;
> > > typedef std::vector<Point> Polygon;
> > > typedef std::vector<Polygon> Polygons;
> > > typedef std::vector<Polygons> Regions;
> > >
> > > struct Holder {
> > >    void notifyNewRegion(void) {
> > >        regions.push_back(Polygons());
> > >    }
> > >
> > >    template<typename Iter>
> > >    void addSubPoly(Iter b, Iter e) {
> > >        regions.back().push_back(Polygon(b, e));
> > >    }
> > >
> > > private:
> > >    Regions regions;
> > > };
> > >
> > > where the reference_type of Iter is convertible to Point. In practice
> > > I use pointers in a couple of places to avoid resizing in push_back
> > > becoming too expensive.
> > >
> > > To construct the corresponding sp::Polygon, sp::Polygons and
> > > sp::SpatialPolygons at the end of the algorithm, I iterate over the
> > > result turning each Polygon into a two column matrix and calling the C
> > > functions corresponding to the 'constructors' for these objects.
> > >
> > > This is all working fine, but I could cut my memory consumption in
> > > half if I could construct the sp::Polygon objects in addSubPoly, and
> > > the sp::Polygons objects in notifyNewRegion. My vector typedefs would
> > > then all be:
> > >
> > > typedef std::vector<SEXP>
> > >
> > >
> > >
> > >
> > > Question:
> > > What I'm not sure about (and finally my question) is: I will have
> > > datasets where I have more than 10,000 SEXPs in the Polygon and
> > > Polygons objects for a single region, and possibly more than 10,000
> > > regions, so how do I PROTECT all those SEXPs (noting that the
> > > protection stack is limited to 10,000 and bearing in mind that I don't
> > > know how many there will be before I start)?
> > >
> > > I am also interested in this just out of general curiosity.
> > >
> > >
> > >
> > >
> > > Thoughts:
> > >
> > > 1) I could create an environment and store the objects themselves in
> > > there while keeping pointers in the vectors, but am not sure if this
> > > would be that efficient (guidance would be appreciated), or
> > >
> > > 2) Just keep them in R vectors and grow these myself (as push_back is
> > > doing for me in the above), but that sounds like a pain and I'm not
> > > sure if the objects or just the pointers would be copied when I
> > > reassigned things (guidance would be appreciated again). Bear in mind
> > > that I keep pointers in the vectors, but omitted that for the sake of
> > > clarity.
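
On the "objects or just the pointers" question in thought 2: when a container holds pointers, growing it moves only the pointers, and the pointed-to objects stay where they are. An R generic vector (VECSXP) also stores SEXP pointers to its elements, so copying its contents into a larger vector copies pointers rather than the polygons themselves. The sketch below shows the C++ side of that only; `firstPolygonStaysPut` is a hypothetical helper name used for illustration, and nothing in it touches the R API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

typedef std::pair<double, double> Point;
typedef std::vector<Point> Polygon;

// Appends `extra` more polygons and reports whether the first polygon's
// address stayed the same while the outer vector reallocated and grew.
bool firstPolygonStaysPut(std::size_t extra) {
    std::vector<std::unique_ptr<Polygon> > polys;
    polys.push_back(std::unique_ptr<Polygon>(new Polygon(3, Point(0.0, 0.0))));
    const Polygon* before = polys.front().get();

    for (std::size_t i = 0; i < extra; ++i)
        polys.push_back(std::unique_ptr<Polygon>(new Polygon(3, Point(1.0, 1.0))));

    // Only the unique_ptr handles were moved during growth.
    return polys.front().get() == before && polys.size() == extra + 1;
}
```

The cost of growing is therefore proportional to the number of elements held, not to the size of each polygon.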
> > >
> > >
> > >
> > >
> > > Is there some other R type that would be suited to this, or a general
> > > approach?
> > >
> >
> > Lists in R (LISTSXP aka pairlists) are suited to appending (since that
> > is fast and trivial) and sequential processing. The only issue is that
> > pairlists are slow for random access. If you only want to load the
> > polygons and finalize, then you can hold them in a pairlist and at the
> > end copy to a generic vector (if random access is expected). DB
> > applications typically use a hybrid approach: allocate vector blocks and
> > keep them in pairlists, but that's probably overkill for your use (if
> > you really cared about performance you wouldn't use sp objects for
> > this ;))
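
The hybrid approach mentioned in the paragraph above (fixed-size vector blocks chained in a list, flattened once at the end) can be sketched in plain C++. This is an illustration under assumptions, not code from any DB package: `ChunkedBuffer` and its member names are hypothetical, `std::list` stands in for the pairlist and `std::vector` for the R vector blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chunked buffer: values go into fixed-size blocks, and the
// blocks are chained in a list so that existing blocks are never resized.
template <typename T>
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t blockSize)
        : blockSize_(blockSize), size_(0) {}

    void push_back(const T& value) {
        // Start a new block when there is none yet or the last one is full.
        if (blocks_.empty() || blocks_.back().size() == blockSize_) {
            blocks_.push_back(std::vector<T>());
            blocks_.back().reserve(blockSize_);
        }
        blocks_.back().push_back(value);
        ++size_;
    }

    std::size_t size() const { return size_; }
    std::size_t blockCount() const { return blocks_.size(); }

    // Final step: copy everything into one contiguous vector, which is
    // the form that supports fast random access.
    std::vector<T> flatten() const {
        std::vector<T> out;
        out.reserve(size_);
        for (const std::vector<T>& block : blocks_)
            out.insert(out.end(), block.begin(), block.end());
        return out;
    }

private:
    std::size_t blockSize_;
    std::size_t size_;
    std::list<std::vector<T> > blocks_;
};
```

Appending touches at most the last block, and the single copy in `flatten()` corresponds to the "copy to a generic vector at the end" step.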
> >
> > Note that you only have to protect the top-level object, so you don't
> > need to protect the individual elements.
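
Why protecting the top-level object is enough can be seen from a toy mark phase: a collector keeps whatever it can reach from its roots, so every cell linked from a protected head is kept too. The sketch below is a deliberately simplified model and not R's collector, which is more elaborate; `Node` and `markFrom` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical heap object (NOT an R SEXP): a payload, a link and a mark bit.
struct Node {
    int   value;
    Node* next;
    bool  marked;
};

// Toy mark phase: follow links from one root and flag every node reached.
// Returns how many nodes were newly marked.
std::size_t markFrom(Node* root) {
    std::size_t reached = 0;
    for (Node* n = root; n != nullptr && !n->marked; n = n->next) {
        n->marked = true;
        ++reached;
    }
    return reached;
}
```

With a chain a -> b -> c and an unlinked fourth node, calling `markFrom(&a)` marks a, b and c and leaves the fourth node unmarked; in collector terms only that last node would be reclaimed.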
> >
> > Cheers,
> > Simon
> >
> >
> > > Cheers and thanks in advance,
> > > Simon Knapp
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> >
>
>
