[R-sig-Geo] Memory management with rasterToPolygons (raster and rgeos)

Roger Bivand Roger.Bivand at nhh.no
Thu Jan 5 21:22:57 CET 2012


On Thu, 5 Jan 2012, Lyndon Estes wrote:

> In case it's of interest (although it won't solve the memory problems
> mentioned) I wrote a function to call gdal from R to polygonize rasters. It
> was quite fast in the instances I used it.

Very interesting! Anyone want a nice weekend project of writing the R 
interface to GDALPolygonize in rgdal?

Roger

>
> Mine is tailored to a Mac (10.5.8), so if used elsewhere this line might
> cause problems:
>
> py.c <- "python2.5
> /Library/Frameworks/GDAL.framework/Programs/gdal_polygonize.py"
>
> runGdalPolygonize <- function(inraster, outshape, attname, gdalformat =
> "ESRI Shapefile") {
> # Runs gdal_polygonize.py to create polygons from raster (with limited
> options at present)
> # Args:
> #   inraster: The name of a raster object referring to a permanent,
> GDAL-readable raster
> #   outshape: The name of the desired output polygon file
> #   attname: The name for the column in the output attribute table
> #   gdalformat: Defaults to ESRI shapefile at present
> # Returns:
> #   Polygon file, read back into an R object
> # Notes:
> #   This needs to be run within the directory containing the raster file
>
>  py.c <- "python2.5
> /Library/Frameworks/GDAL.framework/Programs/gdal_polygonize.py"
>  rast.nm <- unlist(strsplit(inraster at file@name,
> "/"))[length(unlist(strsplit(inraster at file@name, "/")))]
>  full.c <- paste(py.c, rast.nm, "-f", paste("'", gdalformat, "'", sep =
> ""),
>                  paste(outshape, ".shp", sep = ""), outshape, attname)
>  system(full.c)
>  shp <- readOGR(dsn = paste(outshape, ".shp", sep = ""), layer = outshape)
>  return(shp)
> }
>
> Cheers, Lyndon
>
> On Thu, Jan 5, 2012 at 1:49 PM, pgalpern <pgalpern at gmail.com> wrote:
>
>> Agreed these are very large objects.  I'll look at tiling as a general
>> solution to this problem.
>>
>> For others facing the same challenge it is worth noting that I have been
>> successful in running rasterToPolygons(x, dissolve=TRUE) on rasters up to
>> 800000 cells producing an object containing approx  1500 SpatialPolygons
>> under 64bit Windows, by ensuring there is at least 7GB of overhead memory.
>>  Run time was reasonable.  R instance must be terminated following function
>> call to free up the memory.
>>
>>
>>
>> On 05/01/2012 12:19 PM, Roger Bivand wrote:
>>
>>> On Thu, 5 Jan 2012, pgalpern wrote:
>>>
>>>  I did some further research into my own question when I twigged to the
>>>> idea that this might be a memory leak with the GEOS library.
>>>>
>>>> It seems likely that is, and has been documented in this forum this
>>>> October past: https://mailman.stat.ethz.ch/**pipermail/r-sig-geo/2011-**
>>>> October/013289.html<https://mailman.stat.ethz.ch/pipermail/r-sig-geo/2011-October/013289.html>
>>>>
>>>> As of October there didn't appear to be any real resolution to the
>>>> problem, except - perhaps - to run rgeos under Linux.
>>>>
>>>> Is this the status quo?
>>>>
>>>
>>> The issue with rgeos/GEOS is unresolved, and has led to at least two
>>> releases in the mean time. Using Linux does not help. It may be possible to
>>> run with dissolve=FALSE, and step through chunks of feature classes, in
>>> separate R scripts. However, it isn't just an rgeos issue, as:
>>>
>>> library(raster)
>>> r <- raster(nrow=500, ncol=1000)
>>> r[] <- rpois(ncell(r), lambda=70)
>>> pol <- rasterToPolygons(r, dissolve=FALSE)
>>>
>>> gives me:
>>>
>>>  object.size(r)
>>>>
>>> 4011736 bytes
>>>
>>>> object.size(pol)
>>>>
>>> 1458003216 bytes
>>>
>>> so pol is 1.5G here, with 73 categories (I forgot to set.seed()). Only
>>> raster and sp are present here. The arithmetic is:
>>>
>>>  object.size(slot(pol, "polygons")[[1]])
>>>>
>>> 2896 bytes
>>>
>>>> dim(pol)
>>>>
>>> [1] 500000      1
>>>
>>>> object.size(slot(pol, "polygons")[[1]])*dim(pol)[1]
>>>>
>>> 1.448e+09 bytes
>>>
>>> so the input SpatialPolygons object is already large, and GEOS needs a
>>> separate copy in its format, plus working copies.
>>>
>>> Could you work on tiles of the raster, then join those?
>>>
>>> We're still hoping that someone will help with bug-fixing in rgeos, but
>>> this is also a data representation question, I think.
>>>
>>> Hope this helps,
>>>
>>> Roger
>>>
>>>
>>>> Thanks,
>>>> Paul
>>>>
>>>> On 04/01/2012 8:57 PM, pgalpern wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> Not sure if this is the best place to ask what may ultimately be an
>>>>> rgeos question.
>>>>>
>>>>> I am running the latest versions of the raster and rgeos packages under
>>>>> 64bit 2.14.1 on Windows 2008R2 Server with 12GB RAM and having some
>>>>> challenges with memory.
>>>>>
>>>>> I am turning rasters (approx 500 x 1000 cells) into 1500
>>>>> SpatialPolygons, representing each of the feature classes. It works as it
>>>>> should, using rasterToPolygons(x, dissolve=T) but memory overhead is sky
>>>>> high.
>>>>>
>>>>> For example, a single instance of this function quickly consumes 2-3 GB
>>>>> and would probably consume more if other instances were not also running
>>>>> simultaneously.   As a result disk swapping occurs which slows everything
>>>>> down.  Interestingly, the input raster and output SpatialPolygons objects
>>>>> are only megabytes in size.  Running this under 32bit R doesn't seem to
>>>>> help and occasionally results in memory allocation failures.
>>>>>
>>>>> Finally, deleting the raster and polygons objects when the function is
>>>>> complete and running gc() does not seem to release the memory.  Instead the
>>>>> entire R instance needs to be terminated.
>>>>>
>>>>> Can anyone tell me if this is expected behaviour ... or perhaps suggest
>>>>> a more memory efficient approach.
>>>>>
>>>>> Thank you,
>>>>> Paul
>>>>>
>>>>>
>>>>
>>>>
>>>
>> --
>> Paul Galpern, PhD Candidate
>> Natural Resources Institute
>> 70 Dysart Road
>> University of Manitoba
>> Winnipeg, Manitoba, Canada R3T 2M6
>> http://borealscape.ca
>>
>> ______________________________**_________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/r-sig-geo<https://stat.ethz.ch/mailman/listinfo/r-sig-geo>
>>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list