[R-sig-Geo] best practice for reading large shapefiles?

Chris Reudenbach reudenbach at uni-marburg.de
Wed Apr 27 00:24:38 CEST 2016


Even if it might be off-topic (OT) for this list: IMHO R is not the best
tool for dealing with this amount of vector data. I agree completely
with Roger's remarks, and in line with his "competent platform" point
you might also consider using software built for big data...

As Roger has already made clear, what is best depends heavily on your
questions, or on the type of analysis you need to run, and cannot be
answered straightforwardly.

I think Edzer can clarify up to what size sp objects are still "usable".
From my experience I would guess something like 500K polygons, 1M
lines and up to 5M points, but it depends heavily on the number of
attributes. So you are far beyond this.

If you want to deal with this amount of spatial vector data from R, it
is very reasonable to have a look at one of the mature GIS packages
such as GRASS or QGIS. You can use them via their APIs.
Alternatively, if you are an experienced PostGIS user, you can easily
put the data into PostgreSQL/PostGIS and perform all operations and
analyses using the spatial capabilities and built-in functions of PostGIS.
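To make the database route a bit more concrete, here is a minimal sketch of the core idea: let the database do the subsetting so that only the rows you actually need are ever pulled into client memory. It is an illustration under assumptions, not a recipe: Python's standard-library sqlite3 stands in for PostgreSQL/PostGIS (so no spatial functions are involved), and the table name `plss_attributes` and the columns `state`, `county` and `area_ha` are hypothetical.

```python
import sqlite3


def load_attributes(conn, rows):
    """Create a hypothetical attribute table and bulk-insert rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plss_attributes ("
        " id INTEGER PRIMARY KEY, state TEXT, county TEXT, area_ha REAL)"
    )
    # An index on the filter column lets the database avoid a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_state ON plss_attributes(state)"
    )
    conn.executemany(
        "INSERT INTO plss_attributes (id, state, county, area_ha)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def subset_by_state(conn, state):
    """Return only the rows for one state instead of reading the whole table."""
    cur = conn.execute(
        "SELECT id, county, area_ha FROM plss_attributes"
        " WHERE state = ? ORDER BY id",
        (state,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a file path gives a persistent DB.
    conn = sqlite3.connect(":memory:")
    load_attributes(conn, [
        (1, "CA", "Yolo", 65.2),
        (2, "CA", "Placer", 64.8),
        (3, "NV", "Washoe", 65.0),
    ])
    print(subset_by_state(conn, "CA"))
    conn.close()
```

The same pattern carries over to PostGIS, where the WHERE clause could additionally use its spatial functions, so that R only ever receives the subset.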


On 26.04.2016 at 22:33, Vinh Nguyen wrote:
> On Tue, Apr 26, 2016 at 1:12 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> On Tue, 26 Apr 2016, Vinh Nguyen wrote:
>>> Would loading the shapefile into PostgreSQL first and then using readOGR
>>> to read from Postgres be a recommended approach?  That is, would the
>>> bottleneck still occur?  Thank you.
>> Most likely, as both use the respective OGR drivers. With data this size,
>> you'll need a competent platform (probably Linux, say 128GB RAM) as
>> everything is in memory. I find it hard to grasp what the point of doing
>> this might be: visualization won't work, as none of the considerable detail
>> that is certainly in these files will be visible. Can you put the lot into
>> an SQLite file and access the attributes with SQL queries? I don't see the
>> analysis or statistics here.
> - I can't tell from your response whether you are recommending PostGIS
> or not.  Could you clarify?
> - I am working on a Windows server with 64 GB RAM, so not too weak,
> especially for files that are a few GB in size.  Again, I'm not sure
> whether the job had stalled or was still running, just slowly.
> I've killed it for now, as the memory usage still had not grown after
> a few hours.
> - Yes, the shapes are quite granular and numerous.  The use case was
> not to visualize them all at once.  I wanted a master file so that
> when I get a data set of interest, I could intersect the two and then
> subset the areas of interest (e.g., within a state or county), and
> then visualize/analyze from there.  The master shapefile was meant to
> make things easy (reading in one file) as opposed to deciding which
> shapefile to read in depending on the project.
> - I just looked back at the 30 PLSS zip files, and they provide shapes
> at 3 levels of granularity.  I went with the smallest.  I just
> realized that the mid-size one would be sufficient for now; it
> results in dbf=138 MB and shp=501 MB.  I'm attempting to read this in
> now (~30 minutes), and I assume it will read in fine after some time.
> I will respond to this thread if this is not the case.
> Thanks for responding Roger.
> -- Vinh
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
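Whichever tool ends up doing the work, the workflow described above (intersect a data set with the master file, then subset by state or county) usually gets much cheaper if candidate features are first filtered by bounding box, which is the idea behind spatial indexes such as the bounding-box index PostGIS uses. The following is only a minimal pure-Python sketch of that prefilter: the feature ids and boxes are hypothetical, it does no exact geometry, and it is not how sp, GDAL/OGR or PostGIS implement it internally.

```python
from typing import Dict, List, Tuple

# Axis-aligned bounding box: (xmin, ymin, xmax, ymax)
BBox = Tuple[float, float, float, float]


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes share any area or touch along an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_ids(master: Dict[int, BBox], area_of_interest: BBox) -> List[int]:
    """Cheap first pass: keep only master features whose box touches the AOI.

    Only these candidates would then be handed to an exact (and much more
    expensive) geometric intersection.
    """
    return sorted(
        fid for fid, box in master.items()
        if bboxes_overlap(box, area_of_interest)
    )


if __name__ == "__main__":
    # Hypothetical master features keyed by id.
    master = {
        1: (0.0, 0.0, 1.0, 1.0),
        2: (2.0, 2.0, 3.0, 3.0),
        3: (0.5, 0.5, 2.5, 2.5),
    }
    print(candidate_ids(master, (0.8, 0.8, 1.5, 1.5)))
```

Here feature 2 is discarded without ever touching its full geometry, and only features 1 and 3 would go on to the exact intersection.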

Dr Christoph Reudenbach, Philipps-University of Marburg, Faculty of 
Geography, GIS and Environmental Modeling, Deutschhausstr. 10, D-35032 
Marburg, phone: ++49.(0)6421.2824296, fax: ++49.(0)6421.2828950, web: 
gis-ma.org, giswerk.org, moc.environmentalinformatics-marburg.de
