[R-sig-Geo] poly %over% poly in rgeos; memory tips
Edzer Pebesma
edzer.pebesma at uni-muenster.de
Sun May 24 14:32:16 CEST 2015
In
http://stackoverflow.com/questions/30409001/how-does-spatial-polygon-over-polygon-work-to-when-aggregating-values-in-r
OP follows this up with a reproducible example.
It seems to be a bug in rgeos::over that was fixed a month ago, but is
still in the CRAN version.
On 05/22/2015 05:50 PM, Edzer Pebesma wrote:
> Mike, you may want to read two recent threads:
>
> https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022624.html
> https://stat.ethz.ch/pipermail/r-sig-geo/2015-April/022663.html
>
> When you try to aggregate, instead of using over(), have you used
> sp::aggregate directly, for instance with option areaWeighted = TRUE?
>
> If memory is a bottle neck, I'd suggest to either buy more of it, or
> choose a workflow that needs less of it, like PostGIS with SQL.
>
> On 05/22/2015 04:41 PM, Mike wrote:
>> Hi all, big fan of the listserv, learning a lot.
>>
>> I've a few hopefully questions about the sp and rgeos packages.
>>
>> #1 I've had success using %over% in the sp packages, but can't seem to find
>> the help for poly-poly (spatial poly data frame or spatial poly over the
>> same), implemented in rgeos. Where is that explained? I can't seem to get
>> it to return an aggregation in the way poly over points does, where I can
>> use fun=sum to aggregate a number of data frame columns at once. I've read
>> a number of 3rd party explanations and poked in a few books, but haven't
>> found much on poly poly data aggregation with over.
>>
>> #2 for large geometries, any suggestions of way to be more efficient in
>> general in spatial work? I love the gRelate function (see below), but for
>> attempting to aggregate ~2000 3mile buffers over ~200,000 NC census blocks,
>> gRelate seems to fail more often than not. (I really only need OVER in
>> this case, but have tried work arounds since, in #1, I can't get over to
>> work poly-poly the way I need).
>>
>> Thanks! More detail and specific case below.
>>
>> Best,
>> Mike
>> UNC Epidemiology
>>
>> ====
>>
>> Context:
>>
>> I'm working on porting an environmental epidemiology project done in Arc
>> and STATA to all R. Relevant context - we have point exposures (in this
>> case, industrial hog operations) with 3 mile buffers to represent their
>> exposure radius (red dots and pink buffers in the map, from gBuffer after
>> spTransform). Exclusion criteria (blue shapes) are (1) counties that only
>> border IHOs-less counties (grey) and the five most populous cities (where
>> IHOs are never cited) - that's done through the fantastic gRelate function
>> (see below for code).
>>
>> IHOblob = gUnionCascaded(hasIHOcounties)
>> #plot(IHOblob) #can plot this to check it out.
>> touch.v = gRelate(noIHOcounties, IHOblob, byid = T)
>> str(touch.v)
>> counties.not.touching<-(touch.v %in% c("FF2FF1212")) #DE-91M is super cool.
>> notouchIHOcounties = noIHOcounties[counties.not.touching,]
>> plot(notouchIHOcounties, co="light blue", lwd=0.6, add=T)
>>
>> See image for a quick visual of what's going on:
>> http://i.imgur.com/4m6CgW8.png
>>
>> The analysis, however, is happening at the census block level. Here's
>> where I have two challenges I'm hoping to get some help with.
>>
>> #1 poly over poly dataframe aggregation: I understand that the %over%
>> function for poly-poly is implemented in rgeos instead of sp (see table
>> below). Sp's over help clarifies how arguments need to be aligned or
>> stripped of data frames to get what you want. So, for instance, as a test
>> case, here's successful code to aggregate the *point* data on IHOs to their
>> respective counties (if there's a better way to do this, Please let me
>> know):
>> NC.counties.shp at data = data.frame(c(NC.counties.shp at data,
>> over(NC.counties.shp, IHOs.spdf, fun=sum)))
>>
>> Over implementations: http://i.imgur.com/X8Im2gI.png
>>
>> However, the goal for the census blocks is to aggregate various
>> quantitative variables from IHOs within 3 miles (hence the buffer) of any
>> point on the edge. I can imagine some brute force looping methods using
>> the gRelate function, or distance, but can't seem to nail it in one pass
>> with over the way I might with QGIS or Arc. In the below case,
>> IHOs.3mibuff.spdf is now a spatial poly data frame, as are the study
>> blocks. When I try passing the first argument (counties or blocks) as just
>> a geometry, I still get a matrix of named rows and some single aggregated
>> number - but not all the aggregated columns in the data frame.
>> test <-over(NC.studyblocks.shp at polygons, IHOs.3mibuff.spdf, fun=sum) ??
>> test <-over(NC.studyblocks.shp, IHOs.3mibuff.spdf, fun=sum) ??
>>
>> #2 Issues with memory. I've read a few tip sets on generic size best
>> practices that aren't spatial specific, but if anyone has any tips, I'd
>> love to hear them. I seem to have had some success with gc() after large
>> subsetting. In (probably miscoding) the analysis of gRelate of 200,000
>> blocks and 2,000 buffer circles, I eat up 6gigs with ease.
>>
>> Best wishes, again,
>> mike
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster,
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software: http://www.jstatsoft.org/
Computers & Geosciences: http://elsevier.com/locate/cageo/
Spatial Statistics Society http://www.spatialstatistics.info
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: OpenPGP digital signature
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20150524/159a2da9/attachment.bin>
More information about the R-sig-Geo
mailing list