[R-sig-Geo] poly %over% poly in rgeos; memory tips

Edzer Pebesma edzer.pebesma at uni-muenster.de
Fri May 22 17:50:22 CEST 2015


Mike, you may want to read two recent threads:

https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022624.html
https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022663.html

When you want to aggregate, instead of using over(), have you tried
sp::aggregate directly, for instance with the option areaWeighted = TRUE?
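
Something like the following (untested, with your object names taken from
your mail below) should sum the buffer attributes over the block geometries:

library(sp)
library(rgeos)  # does the polygon-polygon intersections
# sum the buffer attributes over each census block geometry;
# see ?aggregate in sp for the areaWeighted = TRUE variant
blk.agg = aggregate(IHOs.3mibuff.spdf, by = NC.studyblocks.shp, FUN = sum)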

If memory is a bottleneck, I'd suggest either buying more of it or
choosing a workflow that needs less of it, such as PostGIS with SQL.
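
As a rough sketch (table, column and connection names invented here), once
both layers are loaded into PostGIS you can let the database do the spatial
join and aggregation, and only pull the small result back into R:

library(RPostgreSQL)
con = dbConnect(PostgreSQL(), dbname = "nc_hogs")  # hypothetical database
blk.sums = dbGetQuery(con, "
    SELECT b.blockid, sum(i.hogs) AS hogs
    FROM blocks b
    JOIN iho_buffers i ON ST_Intersects(b.geom, i.geom)
    GROUP BY b.blockid;")
dbDisconnect(con)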

On 05/22/2015 04:41 PM, Mike wrote:
> Hi all, big fan of the listserv, learning a lot.
> 
> I have a few hopefully quick questions about the sp and rgeos packages.
> 
> #1 I've had success using %over% in the sp package, but I can't seem to find
> the help for poly-poly overlays (a spatial polygons data frame, or spatial
> polygons, over the same), which are implemented in rgeos.  Where is that
> explained?  I can't get it to return an aggregation the way poly-over-points
> does, where I can use fun=sum to aggregate a number of data frame columns at
> once.  I've read a number of third-party explanations and poked around in a
> few books, but haven't found much on poly-poly data aggregation with over.
> 
> #2 For large geometries, any suggestions for working more efficiently in
> spatial analyses in general?  I love the gRelate function (see below), but
> when attempting to aggregate ~2,000 3-mile buffers over ~200,000 NC census
> blocks, gRelate seems to fail more often than not.  (I really only need
> over() in this case, but have tried workarounds since, as noted in #1, I
> can't get over() to work poly-poly the way I need.)
> 
> Thanks!  More detail and specific case below.
> 
> Best,
> Mike
> UNC Epidemiology
> 
> ====
> 
> Context:
> 
> I'm working on porting an environmental epidemiology project done in ArcGIS
> and Stata entirely to R.  Relevant context: we have point exposures (in this
> case, industrial hog operations, IHOs) with 3-mile buffers to represent their
> exposure radius (red dots and pink buffers in the map, from gBuffer after
> spTransform).  Exclusion criteria (blue shapes) are (1) counties that only
> border IHO-less counties (grey) and (2) the five most populous cities (where
> IHOs are never sited).  Criterion (1) is handled by the fantastic gRelate
> function (see the code below).
> 
> # dissolve all counties containing IHOs into a single geometry
> IHOblob = gUnionCascaded(hasIHOcounties)
> # plot(IHOblob)  # can plot this to check it out
> # DE-9IM relation of each IHO-less county to the dissolved blob
> touch.v = gRelate(noIHOcounties, IHOblob, byid = TRUE)
> str(touch.v)
> # "FF2FF1212" is the DE-9IM pattern for disjoint polygons (DE-9IM is super cool)
> counties.not.touching <- touch.v %in% "FF2FF1212"
> notouchIHOcounties = noIHOcounties[counties.not.touching, ]
> plot(notouchIHOcounties, col = "light blue", lwd = 0.6, add = TRUE)
> 
> See image for a quick visual of what's going on:
> http://i.imgur.com/4m6CgW8.png
> 
> The analysis, however, is happening at the census block level.  Here's
> where I have two challenges I'm hoping to get some help with.
> 
> #1 Poly-over-poly data frame aggregation:  I understand that the %over%
> method for poly-poly is implemented in rgeos rather than sp (see the table
> linked below).  sp's over help clarifies how the arguments need to be aligned
> or stripped of their data frames to get what you want.  So, for instance, as
> a test case, here's successful code to aggregate the *point* data on IHOs to
> their respective counties (if there's a better way to do this, please let me
> know):
> NC.counties.shp@data = data.frame(c(NC.counties.shp@data,
>     over(NC.counties.shp, IHOs.spdf, fun = sum)))
> 
> Over implementations: http://i.imgur.com/X8Im2gI.png
> 
> However, the goal for the census blocks is to aggregate various
> quantitative variables from the IHOs within 3 miles (hence the buffers) of
> any point on a block's edge.  I can imagine some brute-force looping methods
> using the gRelate function or distances, but can't seem to nail it in one
> pass with over the way I might in QGIS or Arc (a brute-force sketch of what I
> mean follows the two failing attempts below).  In the case below,
> IHOs.3mibuff.spdf is now a spatial polygons data frame, as are the study
> blocks.  When I try passing the first argument (counties or blocks) as just
> a geometry, I still get a matrix of named rows and a single aggregated
> number, but not all the aggregated columns in the data frame.
> test <- over(NC.studyblocks.shp@polygons, IHOs.3mibuff.spdf, fun = sum)  # ??
> test <- over(NC.studyblocks.shp, IHOs.3mibuff.spdf, fun = sum)  # ??
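> 
> For illustration, a brute-force version of the result I'm after might look
> like this (untested sketch; "hogs" and "lagoons" stand in for the real
> numeric buffer columns):
> 
> library(sp); library(rgeos)
> hits = over(NC.studyblocks.shp, IHOs.3mibuff.spdf, returnList = TRUE)
> vars = c("hogs", "lagoons")
> # sum each buffer column over the buffers intersecting each block
> sums = t(sapply(hits, function(d) colSums(d[, vars, drop = FALSE])))
> NC.studyblocks.shp@data = cbind(NC.studyblocks.shp@data, sums)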
> 
> #2 Memory issues.  I've read a few sets of general tips on memory best
> practices that aren't spatial-specific, but if anyone has tips, I'd
> love to hear them.  I seem to have had some success with gc() after large
> subsetting operations.  In the (probably miscoded) gRelate analysis of
> 200,000 blocks against 2,000 buffer circles, I eat up 6 GB with ease.
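> 
> The kind of one-buffer-at-a-time loop I've been imagining (again, "hogs" and
> "lagoons" are placeholders for the real columns) would keep only a single
> buffer-by-blocks comparison in memory at any time:
> 
> library(rgeos)
> vars = c("hogs", "lagoons")
> acc = matrix(0, nrow = nrow(NC.studyblocks.shp@data), ncol = length(vars),
>     dimnames = list(NULL, vars))
> for (i in seq_len(nrow(IHOs.3mibuff.spdf@data))) {
>     # which blocks does buffer i touch?
>     hit = as.vector(gIntersects(IHOs.3mibuff.spdf[i, ], NC.studyblocks.shp,
>         byid = TRUE))
>     if (any(hit))
>         acc[hit, ] = sweep(acc[hit, , drop = FALSE], 2,
>             unlist(IHOs.3mibuff.spdf@data[i, vars]), "+")
>     if (i %% 100 == 0) gc()   # the gc() trick mentioned above
> }
> NC.studyblocks.shp@data = cbind(NC.studyblocks.shp@data, acc)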
> 
> Best wishes, again,
> mike

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi),  University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info
