[R-sig-Geo] poly %over% poly in rgeos; memory tips

Mike aspiringbodhisattva at gmail.com
Fri May 22 16:41:15 CEST 2015


Hi all, big fan of the listserv, learning a lot.

I've a few hopefully questions about the sp and rgeos packages.

#1 I've had success using %over% in the sp packages, but can't seem to find
the help for poly-poly (spatial poly data frame or spatial poly over the
same), implemented in rgeos.  Where is that explained?  I can't seem to get
it to return an aggregation in the way poly over points does, where I can
use fun=sum to aggregate a number of data frame columns at once.  I've read
a number of 3rd party explanations and poked in a few books, but haven't
found much on poly poly data aggregation with over.

#2 for large geometries, any suggestions of way to be more efficient in
general in spatial work?  I love the gRelate function (see below), but for
attempting to aggregate ~2000 3mile buffers over ~200,000 NC census blocks,
gRelate seems to fail more often than not.  (I really only need OVER in
this case, but have tried work arounds since, in #1, I can't get over to
work poly-poly the way I need).

Thanks!  More detail and specific case below.

Best,
Mike
UNC Epidemiology

====

Context:

I'm working on porting an environmental epidemiology project done in Arc
and STATA to all R.  Relevant context - we have point exposures (in this
case, industrial hog operations) with 3 mile buffers to represent their
exposure radius (red dots and pink buffers in the map, from gBuffer after
spTransform).  Exclusion criteria (blue shapes) are (1) counties that only
border IHOs-less counties (grey) and the five most populous cities (where
IHOs are never cited) - that's done through the fantastic gRelate function
(see below for code).

IHOblob = gUnionCascaded(hasIHOcounties)
#plot(IHOblob) #can plot this to check it out.
touch.v = gRelate(noIHOcounties, IHOblob, byid = T)
str(touch.v)
counties.not.touching<-(touch.v %in% c("FF2FF1212")) #DE-91M is super cool.
notouchIHOcounties = noIHOcounties[counties.not.touching,]
plot(notouchIHOcounties, co="light blue", lwd=0.6, add=T)

See image for a quick visual of what's going on:
http://i.imgur.com/4m6CgW8.png

The analysis, however, is happening at the census block level.  Here's
where I have two challenges I'm hoping to get some help with.

#1 poly over poly dataframe aggregation:  I understand that the %over%
function for poly-poly is implemented in rgeos instead of sp (see table
below).  Sp's over help clarifies how arguments need to be aligned or
stripped of data frames to get what you want.  So, for instance, as a test
case, here's successful code to aggregate the *point* data on IHOs to their
respective counties (if there's a better way to do this, Please let me
know):
NC.counties.shp at data = data.frame(c(NC.counties.shp at data,
over(NC.counties.shp, IHOs.spdf, fun=sum)))

Over implementations: http://i.imgur.com/X8Im2gI.png

However, the goal for the census blocks is to aggregate various
quantitative variables from IHOs within 3 miles (hence the buffer) of any
point on the edge.  I can imagine some brute force looping methods using
the gRelate function, or distance, but can't seem to nail it in one pass
with over the way I might with QGIS or Arc.  In the below case,
IHOs.3mibuff.spdf is now a spatial poly data frame, as are the study
blocks.  When I try passing the first argument (counties or blocks) as just
a geometry, I still get a matrix of named rows and some single aggregated
number - but not all the aggregated columns in the data frame.
test <-over(NC.studyblocks.shp at polygons, IHOs.3mibuff.spdf, fun=sum) ??
test <-over(NC.studyblocks.shp, IHOs.3mibuff.spdf, fun=sum) ??

#2 Issues with memory.  I've read a few tip sets on generic size best
practices that aren't spatial specific, but if anyone has any tips, I'd
love to hear them.  I seem to have had some success with gc() after large
subsetting.  In (probably miscoding) the analysis of gRelate of 200,000
blocks and 2,000 buffer circles, I eat up 6gigs with ease.

Best wishes, again,
mike

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list