[R-sig-Geo] stratified random sampling

Roger Bivand Roger.Bivand at nhh.no
Thu Feb 26 17:57:59 CET 2009


On Wed, 25 Feb 2009, Roger Bivand wrote:

> On Tue, 24 Feb 2009, Heuvelink, Gerard wrote:
>
>> Dear list,
>> 
>> The stratified random sampling problem that I submitted a few days ago has 
>> already been solved, with the help of several of you, notably Edzer 
>> Pebesma. Edzer came up with the following solution:
>
> WARNING!! This is only conditionally correct. See below for analysis.

Further WARNING!! I have now checked everything as carefully as possible. 
In fact, vectorising the underlying gridded data in R (using sp and 
maptools) led to the same problem.

So it was wrong of me to blame the input shapefile, which although very 
complex, was topologically correct.

The causes of the problem were firstly that when only one point was being 
sampled in each Polygons object, the use of round() to divide n among the 
member Polygon objects by areal proportion was setting n=0 in some cases 
for all Polygon objects. So round() has been replaced by ceiling() in the 
next release of sp - if more than n points are chosen, they are sampled 
back to n.

The second cause was an interaction between the gridded shapes of many 
Polygon objects (they are often single raster cells), their bounding 
boxes, which are of course identical with the polygon ring, and a 
heuristic used to try to detect when a Polygon object is wrongly declared 
as not being a hole. The heuristic was turning raster cells touching at 
one corner vertex into holes, so not sampling them. This has also been 
corrected on CVS and will be in the next sp release.

We'd be grateful for users reporting on spsample() behaving unexpectedly 
with Polygon, Polygons, and/or SpatialPolygons objects, especially with 
test data (as in this case, offline).

Roger


>
>> 
>> library(sp)
>> library(rgdal)
>> nc1 <- readShapePoly(system.file("shapes/sids.shp",package="maptools")[1],
> + proj4string=CRS("+proj=longlat +datum=NAD27"))
>> pts = do.call(rbind, sapply(slot(nc1, "polygons"), spsample, n=1,
> + type="random"))
>> plot(nc1)
>> points(pts, col='blue', pch=19, cex=1)
>> 
>> As it happened, the do.call statement did not work in my case (Edzer and 
>> Roger may look into why it does not work with all shapes)
>
> It was because the shapefile was no good, with both self-intersecting and 
> overlapping polygons. This left some values returned by spsample in the 
> sapply call as NULL (correctly), and no rbind() method exists for 
> SpatialPoints and NULL objects.
>
>> and had to be replaced by:
>> 
>> for (i in 1:length(slot(nc1, "polygons"))) {
>>    pt = spsample(nc1[i,], n=1, type="random")
>>    if (i == 1)
>>        pts = pt
>>    else
>>        pts = rbind(pts, pt)
>> }
>> 
>
> This gets a result - there are other ways of doing it too, but it is not what 
> it seems. Because the polygons are self-intersecting and overlapping, the 
> point-in-polygon algorithm is not choosing points correctly (spsample for a 
> ring generates more points than needed in the bounding box of the polygon, 
> and chooses the number needed from those that fall within the polygon).
>
> The input shapefile is a vectorised map of soil types from raster data, but 
> unfortunately the software used to generate it is unknown, so we can't warn 
> people off it. Assuming that only one soil type occurs in each raster cell, 
> we can reproduce the case with meuse.grid:
>
> library(sp)
> data(meuse.grid)
> coordinates(meuse.grid) <- c("x", "y")
> gridded(meuse.grid) <- TRUE
> meuse.grid$soil <- factor(meuse.grid$soil)
> spplot(meuse.grid, "soil")
> meuseSP <- as(meuse.grid, "SpatialPolygons")
> ID <- as.character(meuse.grid$soil)
> library(maptools)
> meuseSP1 <- unionSpatialPolygons(meuseSP, ID)
> plot(meuseSP1, col=1:3)
> pts = do.call(rbind, sapply(slot(meuseSP1, "polygons"), spsample, n=1,
> type="random"))
> points(pts, pch=3, col="white")
>
> Doing the raster to vector conversion in R ought to resolve the underlying 
> problem of the soil polygons being generated from the raster values in an 
> inappropriate way.
>
>> I am so happy!
>
> Sorry, no comment! I have replied off-list too, but unfortunately bad 
> shapefiles really do exist, and they cause lots of problems.
>
> Roger
>
>> 
>> Gerard
>> 
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>> 
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list