[R-sig-Geo] R crash with readOGR

Roger Bivand Roger.Bivand at nhh.no
Sun Feb 14 20:57:43 CET 2010


On Fri, 12 Feb 2010, caspar hallmann wrote:

> Hi Roger,
>
> Thank you for your very quick response.
> This surely helps...

The analysis for the data set that Caspar kindly made available is as 
follows:

1. With a very large number of polygons for some (which unknown) versions 
of R and sp, the instantiation of the SpatialPolygonsDataFrame can begin 
to use increasing amounts of memory, which is not freed and may render a 
platform with too little RAM unusable. On Windows with default memory 
settings, this may terminate R as reported. The next releases of sp and 
rgdal should address this problem. I think that the cause lies in the base 
methods package, but the changes should avoid problems. Please contact me 
if anyone observes similar behaviour from sp 0.9-60 or subsequent and 
rgdal 0.6-25 or subsequent.

2. readOGR() now has a default useC=TRUE argument; however, time savings 
are small since Polygon, Polygons, and SpatialPolygons object creation is 
already handled in C from the sp package, so the extra layer in rgdal 
does not gain much. Attempts to rewrite readOGR in C will follow later.

3. Caspar's data set is a projected grid of raster cells with 327615 
"polygons", each with only 5 coordinates, that is 80 bytes (each feature 
in the shapefile takes just 150 bytes). Putting this into a Polygons 
object adds over 2800 bytes, which is effectively a worst case scenario. A 
grid cell then looks like:

> str(slot(fc, "polygons")[[1]])
Formal class 'Polygons' [package "sp"] with 5 slots
   ..@ Polygons :List of 1
   .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
   .. .. .. ..@ labpt  : num [1:2] -14383 569871
   .. .. .. ..@ area   : num 1003399
   .. .. .. ..@ hole   : logi FALSE
   .. .. .. ..@ ringDir: int 1
   .. .. .. ..@ coords : num [1:5, 1:2] -13841 -13927 -14925 -14839 -13841 
...
   ..@ plotOrder: int 1
   ..@ labpt    : num [1:2] -14383 569871
   ..@ ID       : chr "0"
   ..@ area     : num 1003399

where

> 2896 * 327615
[1] 948773040

so the SpatialPolygons object "weighs in" at 1GB for a <50MB shapefile.

With this size of object, one needs a lot of RAM, and probably a 64-bit 
platform. Reading it for me with 4GB pushed memory usage to just over 2GB 
(including all running processes). With useC=TRUE, reading completed in 
135 seconds, with useC=FALSE 153 seconds.

4. My feeling is that the main issue is the workflow, and choice of 
representation. The data are raster data with 1000m resolution, and 
instead of warping to the OSGB National Grid, they have been vectorised, 
and the vector squares quasi-projected (quasi- because the curved southern 
and northern borders are represented by straight lines).

If the data had been warped instead of vectorised, or if the subsequent 
analysis could be done in the input projection, the worst-case behaviour 
of the SpatialPolygons, Polygons, and Polygon class could be avoided. 
Indeed, using the R-Forge raster package, much finer resolution raster 
data could have been used.

I'll be submitting new sp and rgdal releases to CRAN. The rgdal release 
should also trigger the building of the Windows binary with the new GDAL 
1.7.1 release, with R and SAGA raster drivers.

Thanks to Caspar for helping analyse this problem.

Roger

>
> Perhaps however I should have emphasized: "without" any warning, the
> thing being that I 'suspected' this is a memory issue.
> What happened is that after submitting the code to R, and after
> waiting a few minutes, R just shuts down. No warnings, no errors,
> nothing.
> And this keeps coming up after clearing memory (other data) or
> restarting R altogether.
>
> What I want to do is retrieve a range of shapefiles, possibly
> differing in their CRS's, homogenizing the reference system,
> overlaying and finally creating a 'covariate' dataset for further
> geostatistical analysis. Since this will eventually be called in a
> batch mode for many spatial response variables, with each response
> having differing set of covariates (and thus layers and projections),
> I need to get this streamlined.
>
> Anyway, the function 'OGRSpatialRef' worked fine, and in conjunction
> with read.dbf from library 'foreign'  (to get the data) I'm pretty
> much sorted. So thanks again!
>
> all the best,
>
> Caspar
>
>
>
> On Fri, Feb 12, 2010 at 3:47 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> On Fri, 12 Feb 2010, caspar hallmann wrote:
>>
>>> Dear All,
>>>
>>> Perhaps one can help me with the obvious, however I have been trying
>>> to import a shapefile using function readOGR as in:
>>> myshp<-readOGR("mydirectorypath", "shapefilename"), and without any
>>> warning (e.g. can't allocate memory size or something trivial) R
>>> crashes en shuts down. The shapefile itself does not seem too large
>>> though ( 42mb the .shp alone).
>>
>> OK. You are on Windows, where memory management is not as capable as on
>> Unix-based systems. If R say that it cannot allocate memory under Windows,
>> it has not "crashed", only responded to the inadequacy of the system. If you
>> read the R for Windows FAQ, you'll see some tricks for improving memory use.
>> A 42MB shapefile is very large, you have not said what you need to use it
>> for.
>>
>> The R lists posting guide does say when you can term anything a crash, this
>> is not one of those cases.
>>
>> There is no separate function for reading a *.prj file. The so-called WKT
>> spatial reference in the *.prj is read by the OGR driver, and converted to
>> PROJ.4.
>>
>> "OGRSpatialRef" <- function(dsn, layer) {
>>    .Call("ogrP4S", as.character(dsn), as.character(layer),
>>        PACKAGE="rgdal")
>> }
>>
>> is a function that will return that representation without trying to read
>> the features, I'll add it and one for GDAL rasters to the next release - the
>> GDAL information is already in the object returned by GDALinfo(), but noy
>> yet in that returned by ogrInfo().
>>
>> Hope this helps,
>>
>> Roger
>>
>>> I wonder if anyone has encountered the same problem, or beter, the
>>> solution.
>>> For now, I have resorted to importing the .dbf of the shapefile (works
>>> fine), then converting it to some preferred spatial format. However,
>>> since now I am required to manually set the CRS this pretty much
>>> prevents me from applying batch analysis on a range of shapesfile
>>> differing in their CRS's. Note that the .prj are present, and would
>>> like to take advantage of readOGR's capability of recognizing the CRS
>>> of each shapefile.
>>>
>>> Thanks in advance
>>>
>>> Caspar
>>>
>>> PS: BTW: Is there a function like ''readprj()'' around ?
>>> PS2: sessionInfo:
>>> R version 2.10.1 (2009-12-14)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252
>>>  LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
>>> [5] LC_TIME=Dutch_Netherlands.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] RSAGA_0.9-6     rgdal_0.6-24    maptools_0.7-29 lattice_0.18-3
>>> sp_0.9-57       foreign_0.8-39
>>>
>>> loaded via a namespace (and not attached):
>>> [1] grid_2.10.1  proj4_1.0-4  tools_2.10.1
>>>>
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Economic Geography Section, Department of Economics, Norwegian School of
>> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
>> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
>> e-mail: Roger.Bivand at nhh.no
>>
>>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


More information about the R-sig-Geo mailing list