[R-sig-Geo] spsample taking excessive memory

Roger Bivand Roger.Bivand at nhh.no
Tue May 3 19:59:52 CEST 2011


On Sat, 30 Apr 2011, Dylan Beaudette wrote:

> On Friday, April 29, 2011, Ned Horning wrote:
>> Hi -
>>
>> I am trying to select 1000 random samples from each set of ESRI
>> Shapefile polygons with the same attribute value (attName). This has
>> worked well in the past with other shapefiles. The file I'm using now
>> has 69 relatively simple polygons with 9 unique values for the attribute
>> I'm using to group the polygons. When I run spsample it takes several
>> minutes to collect the samples for some of the unique attribute values
>> and it is using well over 12GB of memory. By the time it start
>> collecting samples from the last set of polygons my swap space is nearly
>> exhausted and the spsample process goes on for hours until I kill it. Do
>> these symptoms sound familiar to anyone? Any ideas about what the
>> problem might be? I pasted my code below.
>>
>> Thanks in advance for any help.
>>
>> Ned
>
> Hi,
>
> Not sure if this is the problem, but it could be that dynamically appending to
> the object 'xy' is causing some of the memory consumption / slowness.
>
> I would suggest iterating over collections of polygons, and saving the results
> to a pre-allocated list object. Then, rbind the elements of the list back
> together at the end.

Hi,

The polygons are very small, spread out over a very large area, so 
spsample tries to generate many points for the bounding box of the object, 
rejecting most of them. As Dylan suggests, you can get a long way by 
stepping only to the polygons you need:

for (x in 1:length(uniqueAtt)) {
   class_data<- vec[vec[[attName]]==uniqueAtt[x],]
   areas <- sapply(slot(class_data, "polygons"), slot, "area")
   nsamps <- ceiling(numsamps*(areas/sum(areas)))
   for (i in 1:dim(class_data)[1]) {
     XY <- spsample(class_data[i,], type="random", n=nsamps[i])
# step Polygons object by Polygons object, dividing the numsamps by area
     if (i == 1) cpts <- XY
     else cpts <- rbind(cpts, XY)
   }
# maybe need to modify the number of points to match numsamps exactly.
   classpts <- cpts
   if (x == 1) {
     xy<- classpts
   } else {
     xy<- rbind(xy, classpts)
   }
}

The spsample() method for SpatialPolygons does really expect them to be 
close to each other, and to fill the object bounding box at least in some 
large part.

Hope this helps,

Roger

>
> Cheers,
> Dylan
>
>
>> --
>> shapefile <- '/home/nedhorning/AMNH/Galapagos/quickbird_train_4.shp'
>> numsamps <- 1000
>> attName <- 'type_id'
>>
>> vec <- readShapePoly(shapefile)
>> #
>> # Create vector of unique land cover attribute values
>> allAtt <- slot(vec, "data")
>> tabAtt <-table(allAtt[[attName]])
>> uniqueAtt <-as.numeric(names(tabAtt))
>> # Create input data from a Shapefile with training data and the image
>> for (x in 1:length(uniqueAtt)) {
>>    # Create a vector of all date for attName=x
>>    class_data <- vec[vec[[attName]]==uniqueAtt[x],]
>>    # Get random points for the class
>>    classpts <- spsample(class_data, type="random", n=numsamps)
>>    #Append data for all but the first run
>>    if (x == 1) {
>>      xy <- classpts
>>    } else {
>>      xy <- rbind(xy, classpts)
>>    }
>> }
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list