[Rd] Holding a large number of SEXPs in C++

Simon Knapp sleepingwell at gmail.com
Fri Oct 17 13:31:55 CEST 2014


Background:
I have an algorithm which produces a large number of small polygons (of the
spatial kind) which I would like to use within R using objects from sp. I
can't predict the exact number of polygons a priori. The polygons will be
grouped into regions, and each region will be filled sequentially, so an
appropriate C++ 'framework' (for the purpose of illustration) might be:

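#include <utility>
#include <vector>

typedef std::pair<double, double> Point;
typedef std::vector<Point> Polygon;
typedef std::vector<Polygon> Polygons;
typedef std::vector<Polygons> Regions;

struct Holder {
    // Start a new, empty region; later sub-polygons are added to it.
    // Not const, because it modifies regions.
    void notifyNewRegion() {
        regions.push_back(Polygons());
    }

    // Append the polygon described by [b, e) to the current region.
    // Assumes notifyNewRegion has been called at least once.
    template<typename Iter>
    void addSubPoly(Iter b, Iter e) {
        regions.back().push_back(Polygon(b, e));
    }

private:
    Regions regions;
};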

where the value_type of Iter is convertible to Point. In practice I use
pointers in a couple of places to keep resizing in push_back from becoming
too expensive.

To construct the corresponding sp::Polygon, sp::Polygons and
sp::SpatialPolygons at the end of the algorithm, I iterate over the result
turning each Polygon into a two column matrix and calling the C functions
corresponding to the 'constructors' for these objects.
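
The conversion step described above can be sketched as follows. R stores
matrices in column-major order, so a two-column coordinate matrix holds all
x values followed by all y values. The helper names toColumnMajor and
elementAt are hypothetical, and the sketch stops at a plain
std::vector<double>; copying the buffer into an R matrix and calling the sp
constructors is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<double, double> Point;
typedef std::vector<Point> Polygon;

// Flatten a Polygon into a column-major n x 2 buffer:
// all x values first, then all y values.
// (Hypothetical helper, not part of sp or the R API.)
std::vector<double> toColumnMajor(const Polygon& poly) {
    const std::size_t n = poly.size();
    std::vector<double> out(2 * n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = poly[i].first;       // column 0 (x), row i
        out[n + i] = poly[i].second;  // column 1 (y), row i
    }
    return out;
}

// Read element (row, col) back out of a column-major nrow x 2 buffer.
double elementAt(const std::vector<double>& m, std::size_t nrow,
                 std::size_t row, std::size_t col) {
    assert(row < nrow && col < 2);
    return m[col * nrow + row];
}
```

The same index arithmetic (col * nrow + row) applies when writing directly
into the storage of a numeric matrix allocated on the R side.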

This is all working fine, but I could cut my memory consumption in half if
I could construct the sp::Polygon objects in addSubPoly, and the
sp::Polygons objects in notifyNewRegion. My vector typedefs would then all
be:

typedef std::vector<SEXP>




Question:
What I'm not sure about (and finally my question) is: I will have datasets
where I have more than 10,000 SEXPs in the Polygon and Polygons objects for
a single region, and possibly more than 10,000 regions, so how do I PROTECT
all those SEXPs (noting that the protection stack is limited to 10,000 and
bearing in mind that I don't know how many there will be before I start)?

I am also interested in this just out of general curiosity.




Thoughts:

1) I could create an environment and store the objects themselves in there
while keeping pointers in the vectors, but am not sure if this would be
that efficient (guidance would be appreciated), or

2) Just keep them in R vectors and grow these myself (as push_back is doing
for me in the above), but that sounds like a pain and I'm not sure if the
objects or just the pointers would be copied when I reassigned things
(guidance would be appreciated again). Bear in mind that I keep pointers in
the vectors, but omitted that for the sake of clarity.
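
On the copying concern in option 2: an R list (VECSXP) stores pointers to
its elements, so moving the elements into a longer list copies pointers
rather than the objects, and a list that is itself protected keeps
everything it references alive. The sketch below is only a plain C++
analogy of that growth pattern and uses no R API. PointerStore and Object
are hypothetical names, and Object* stands in for SEXP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Object;  // stand-in for a large R object

// A hand-grown array of pointers that doubles its capacity when full.
// Growing copies pointers only; the pointed-to objects never move.
class PointerStore {
public:
    PointerStore()
        : slots_(new Object*[1]), size_(0), capacity_(1), pointerCopies_(0) {}

    // Frees the slot array only; the objects are owned elsewhere.
    ~PointerStore() { delete[] slots_; }

    void push_back(Object* obj) {
        if (size_ == capacity_) grow();
        slots_[size_++] = obj;
    }

    Object* operator[](std::size_t i) const {
        assert(i < size_);
        return slots_[i];
    }

    std::size_t size() const { return size_; }

    // Total number of pointer copies performed while growing.
    std::size_t pointerCopies() const { return pointerCopies_; }

private:
    PointerStore(const PointerStore&);             // non-copyable
    PointerStore& operator=(const PointerStore&);  // non-assignable

    void grow() {
        Object** bigger = new Object*[2 * capacity_];
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = slots_[i];  // copies a pointer, not an Object
            ++pointerCopies_;
        }
        delete[] slots_;
        slots_ = bigger;
        capacity_ *= 2;
    }

    Object** slots_;
    std::size_t size_;
    std::size_t capacity_;
    std::size_t pointerCopies_;
};
```

With doubling, the total number of pointer copies stays below twice the
number of stored elements, so the amortised cost per push_back is constant.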




Is there some other R type that would be suited to this, or a general
approach?

Cheers and thanks in advance,
Simon Knapp



