[Rd] Holding a large number of SEXPs in C++
Simon Knapp
sleepingwell at gmail.com
Mon Nov 3 04:55:36 CET 2014
Thanks Simon and sorry for taking so long to give this a go. I had thought
of pair lists but got confused about how to protect the top level object
only, as it seems that appending requires creating a new "top-level
object". The following example seems to work (full example at
https://gist.github.com/Sleepingwell/8588c5ee844ce0242d05). Is this the way
you would do it (or at least 'a correct' way)?
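struct PolyHolder {
    PolyHolder(void) {
        // Reserve two protection-stack slots; REPROTECT reuses them below.
        PROTECT_WITH_INDEX(currentRegion = R_NilValue, &icr);
        PROTECT_WITH_INDEX(regions = R_NilValue, &ir);
    }

    ~PolyHolder(void) {
        // Assumes these two slots are on top of the protection stack when
        // the holder is destroyed.
        UNPROTECT(2);
    }

    void notifyEndRegion(void) {
        // CONS prepends, so the most recently finished region comes first.
        REPROTECT(regions = CONS(makePolygonsFromPairList(currentRegion),
                                 regions), ir);
        REPROTECT(currentRegion = R_NilValue, icr);
    }

    template<typename Iter>
    void addSubPolygon(Iter b, Iter e) {
        REPROTECT(currentRegion = CONS(makePolygon(b, e), currentRegion),
                  icr);
    }

    SEXP getPolygons(void) {
        // The returned list is only protected while this holder is alive.
        return regions;
    }

private:
    PROTECT_INDEX
        ir,
        icr;
    SEXP
        currentRegion,
        regions;
};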
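
One thing to watch with this pattern is ordering: CONS puts the new element at the head, so the polygons and regions come out newest first. The following is only a plain C++ analogue of that behaviour, not R API code. It assumes std::forward_list as a stand-in for the pairlist, and the name PrependAccumulator is made up for illustration. It prepends while accumulating and restores insertion order when copying out to a random-access vector at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical stand-in for a pairlist: push_front plays the role of
// CONS(x, list), so the newest element is always at the head.
template <typename T>
struct PrependAccumulator {
    void add(const T& x) {
        items.push_front(x);
        ++count;
    }

    // Copy into a random-access vector, filling from the back so that the
    // result is in insertion order (oldest element first).
    std::vector<T> finalize() const {
        std::vector<T> out(count);
        std::size_t i = count;
        for (const T& x : items) {
            out[--i] = x;
        }
        assert(i == 0);
        return out;
    }

    std::forward_list<T> items;
    std::size_t count = 0;
};
```

In the R version the same idea would presumably amount to walking the pairlist once and filling a generic vector from the last index down.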
Thanks again,
Simon Knapp
On Sat, Oct 18, 2014 at 2:10 AM, Simon Urbanek <simon.urbanek at r-project.org>
wrote:
>
> On Oct 17, 2014, at 7:31 AM, Simon Knapp <sleepingwell at gmail.com> wrote:
>
> > Background:
> > I have an algorithm which produces a large number of small polygons (of the
> > spatial kind) which I would like to use within R using objects from sp. I
> > can't predict the exact number of polygons a priori, the polygons will be
> > grouped into regions, and each region will be filled sequentially, so an
> > appropriate C++ 'framework' (for the point of illustration) might be:
> >
> > #include <utility>
> > #include <vector>
> >
> > typedef std::pair<double, double> Point;
> > typedef std::vector<Point> Polygon;
> > typedef std::vector<Polygon> Polygons;
> > typedef std::vector<Polygons> Regions;
> >
> > struct Holder {
> >     // Not const: this modifies the regions member.
> >     void notifyNewRegion(void) {
> >         regions.push_back(Polygons());
> >     }
> >
> >     template<typename Iter>
> >     void addSubPoly(Iter b, Iter e) {
> >         regions.back().push_back(Polygon(b, e));
> >     }
> >
> > private:
> >     Regions regions;
> > };
> >
> > where the reference_type of Iter is convertible to Point. In practice I use
> > pointers in a couple of places to avoid resizing in push_back becoming too
> > expensive.
> >
> > To construct the corresponding sp::Polygon, sp::Polygons and
> > sp::SpatialPolygons at the end of the algorithm, I iterate over the result
> > turning each Polygon into a two column matrix and calling the C functions
> > corresponding to the 'constructors' for these objects.
> >
> > This is all working fine, but I could cut my memory consumption in half if
> > I could construct the sp::Polygon objects in addSubPoly, and the
> > sp::Polygons objects in notifyNewRegion. My vector typedefs would then all
> > be:
> >
> > typedef std::vector<SEXP>
> >
> >
> >
> >
> > Question:
> > What I'm not sure about (and finally my question) is: I will have datasets
> > where I have more than 10,000 SEXPs in the Polygon and Polygons objects for
> > a single region, and possibly more than 10,000 regions, so how do I PROTECT
> > all those SEXPs (noting that the protection stack is limited to 10,000 and
> > bearing in mind that I don't know how many there will be before I start)?
> >
> > I am also interested in this just out of general curiosity.
> >
> >
> >
> >
> > Thoughts:
> >
> > 1) I could create an environment and store the objects themselves in there
> > while keeping pointers in the vectors, but am not sure if this would be
> > that efficient (guidance would be appreciated), or
> >
> > 2) Just keep them in R vectors and grow these myself (as push_back is doing
> > for me in the above), but that sounds like a pain and I'm not sure if the
> > objects or just the pointers would be copied when I reassigned things
> > (guidance would be appreciated again). Bear in mind that I keep pointers in
> > the vectors, but omitted that for the sake of clarity.
> >
> >
> >
> >
> > Is there some other R type that would be suited to this, or a general
> > approach?
> >
>
> Lists in R (LISTSXP aka pairlists) are suited to appending (since that is
> fast and trivial) and sequential processing. The only issue is that
> pairlists are slow for random access. If you only want to load the polygons
> and finalize, then you can hold them in a pairlist and at the end copy to a
> generic vector (if random access is expected). DB applications typically
> use a hybrid approach - allocate vector blocks and keep them in pairlists,
> but that's probably an overkill for your use (if you really cared about
> performance you wouldn't use sp objects for this ;))
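
For anyone reading this later, the hybrid approach mentioned above (fixed-size vector blocks chained in a list) might look roughly like the sketch below. This is a plain C++ analogue only, not R API code: std::list stands in for the pairlist, std::vector for the allocated blocks, and the BlockStore name and its blockSize parameter are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical block store: appends go into the last block, and a new
// block is chained on when the last one is full, so existing elements
// are never moved while accumulating.
template <typename T>
class BlockStore {
public:
    explicit BlockStore(std::size_t blockSize) : blockSize_(blockSize) {
        assert(blockSize > 0);
    }

    void push_back(const T& x) {
        if (blocks_.empty() || blocks_.back().size() == blockSize_) {
            blocks_.emplace_back();
            blocks_.back().reserve(blockSize_);
        }
        blocks_.back().push_back(x);
        ++size_;
    }

    std::size_t size() const { return size_; }
    std::size_t blockCount() const { return blocks_.size(); }

    // One final copy into contiguous storage for random access.
    std::vector<T> flatten() const {
        std::vector<T> out;
        out.reserve(size_);
        for (const std::vector<T>& block : blocks_) {
            out.insert(out.end(), block.begin(), block.end());
        }
        return out;
    }

private:
    std::size_t blockSize_;
    std::size_t size_ = 0;
    std::list<std::vector<T> > blocks_;
};
```

With a block size of 4, ten appends end up in three blocks, and flatten() returns them in insertion order. Compared with a single growing vector, nothing is reallocated during accumulation, at the cost of one copy at the end.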
>
> Note that you only have to protect the top-level object, so you don't need
> to protect the individual elements.
>
> Cheers,
> Simon
>
>
> > Cheers and thanks in advance,
> > Simon Knapp
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>