[R-sig-Geo] poly %over% poly in rgeos; memory tips
Edzer Pebesma
edzer.pebesma at uni-muenster.de
Fri May 22 17:50:22 CEST 2015
Mike, you may want to read two recent threads:
https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022624.html
https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022663.html
When you try to aggregate, instead of using over(), have you used
sp::aggregate directly, for instance with option areaWeighted = TRUE?
If memory is a bottle neck, I'd suggest to either buy more of it, or
choose a workflow that needs less of it, like PostGIS with SQL.
On 05/22/2015 04:41 PM, Mike wrote:
> Hi all, big fan of the listserv, learning a lot.
>
> I've a few hopefully questions about the sp and rgeos packages.
>
> #1 I've had success using %over% in the sp packages, but can't seem to find
> the help for poly-poly (spatial poly data frame or spatial poly over the
> same), implemented in rgeos. Where is that explained? I can't seem to get
> it to return an aggregation in the way poly over points does, where I can
> use fun=sum to aggregate a number of data frame columns at once. I've read
> a number of 3rd party explanations and poked in a few books, but haven't
> found much on poly poly data aggregation with over.
>
> #2 for large geometries, any suggestions of way to be more efficient in
> general in spatial work? I love the gRelate function (see below), but for
> attempting to aggregate ~2000 3mile buffers over ~200,000 NC census blocks,
> gRelate seems to fail more often than not. (I really only need OVER in
> this case, but have tried work arounds since, in #1, I can't get over to
> work poly-poly the way I need).
>
> Thanks! More detail and specific case below.
>
> Best,
> Mike
> UNC Epidemiology
>
> ====
>
> Context:
>
> I'm working on porting an environmental epidemiology project done in Arc
> and STATA to all R. Relevant context - we have point exposures (in this
> case, industrial hog operations) with 3 mile buffers to represent their
> exposure radius (red dots and pink buffers in the map, from gBuffer after
> spTransform). Exclusion criteria (blue shapes) are (1) counties that only
> border IHOs-less counties (grey) and the five most populous cities (where
> IHOs are never cited) - that's done through the fantastic gRelate function
> (see below for code).
>
> IHOblob = gUnionCascaded(hasIHOcounties)
> #plot(IHOblob) #can plot this to check it out.
> touch.v = gRelate(noIHOcounties, IHOblob, byid = T)
> str(touch.v)
> counties.not.touching<-(touch.v %in% c("FF2FF1212")) #DE-91M is super cool.
> notouchIHOcounties = noIHOcounties[counties.not.touching,]
> plot(notouchIHOcounties, co="light blue", lwd=0.6, add=T)
>
> See image for a quick visual of what's going on:
> http://i.imgur.com/4m6CgW8.png
>
> The analysis, however, is happening at the census block level. Here's
> where I have two challenges I'm hoping to get some help with.
>
> #1 poly over poly dataframe aggregation: I understand that the %over%
> function for poly-poly is implemented in rgeos instead of sp (see table
> below). Sp's over help clarifies how arguments need to be aligned or
> stripped of data frames to get what you want. So, for instance, as a test
> case, here's successful code to aggregate the *point* data on IHOs to their
> respective counties (if there's a better way to do this, Please let me
> know):
> NC.counties.shp at data = data.frame(c(NC.counties.shp at data,
> over(NC.counties.shp, IHOs.spdf, fun=sum)))
>
> Over implementations: http://i.imgur.com/X8Im2gI.png
>
> However, the goal for the census blocks is to aggregate various
> quantitative variables from IHOs within 3 miles (hence the buffer) of any
> point on the edge. I can imagine some brute force looping methods using
> the gRelate function, or distance, but can't seem to nail it in one pass
> with over the way I might with QGIS or Arc. In the below case,
> IHOs.3mibuff.spdf is now a spatial poly data frame, as are the study
> blocks. When I try passing the first argument (counties or blocks) as just
> a geometry, I still get a matrix of named rows and some single aggregated
> number - but not all the aggregated columns in the data frame.
> test <-over(NC.studyblocks.shp at polygons, IHOs.3mibuff.spdf, fun=sum) ??
> test <-over(NC.studyblocks.shp, IHOs.3mibuff.spdf, fun=sum) ??
>
> #2 Issues with memory. I've read a few tip sets on generic size best
> practices that aren't spatial specific, but if anyone has any tips, I'd
> love to hear them. I seem to have had some success with gc() after large
> subsetting. In (probably miscoding) the analysis of gRelate of 200,000
> blocks and 2,000 buffer circles, I eat up 6gigs with ease.
>
> Best wishes, again,
> mike
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software: http://www.jstatsoft.org/
Computers & Geosciences: http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20150522/84439dbb/attachment.bin>
More information about the R-sig-Geo
mailing list