[R-sig-Geo] stratified random sampling

Roger Bivand Roger.Bivand at nhh.no
Wed Feb 25 09:53:36 CET 2009


On Tue, 24 Feb 2009, Heuvelink, Gerard wrote:

> Dear list,
>
> The stratified random sampling problem that I submitted a few days ago 
> has already been solved, with the help of several of you, notably Edzer 
> Pebesma. Edzer came up with the following solution:

WARNING!! This is only conditionally correct. See below for analysis.

>
> library(sp)
> library(rgdal)
> nc1 <- readShapePoly(system.file("shapes/sids.shp",package="maptools")[1],
+ proj4string=CRS("+proj=longlat +datum=NAD27"))
> pts = do.call(rbind, sapply(slot(nc1, "polygons"), spsample, n=1,
+ type="random"))
> plot(nc1)
> points(pts, col='blue', pch=19, cex=1)
>
> As it happened, the do.call statement did not work in my case (Edzer and 
> Roger may look into why it does not work with all shapes)

It was because the shapefile was no good, with both self-intersecting and 
overlapping polygons. This left some values returned by spsample in the 
sapply call as NULL (correctly), and no rbind() method exists for 
SpatialPoints and NULL objects.

> and had to be replaced by:
>
> for (i in 1:length(slot(nc1, "polygons"))) {
>    pt = spsample(nc1[i,], n=1, type="random")
>    if (i == 1)
>        pts = pt
>    else
>        pts = rbind(pts, pt)
> }
>

This gets a result - there are other ways of doing it too, but it is not 
what it seems. Because the polygons are self-intersecting and overlapping, 
the point-in-polygon algorithm is not choosing points correctly (spsample 
for a ring generates more points than needed in the bounding box of the 
polygon, and chooses the number needed from those that fall within the 
polygon).

The input shapefile is a vectorised map of soil types from raster data, 
but unfortunately the software used to generate it is unknown, so we can't 
warn people off it. Assuming that only one soil type occurs in each raster 
cell, we can reproduce the case with meuse.grid:

library(sp)
data(meuse.grid)
coordinates(meuse.grid) <- c("x", "y")
gridded(meuse.grid) <- TRUE
meuse.grid$soil <- factor(meuse.grid$soil)
spplot(meuse.grid, "soil")
meuseSP <- as(meuse.grid, "SpatialPolygons")
ID <- as.character(meuse.grid$soil)
library(maptools)
meuseSP1 <- unionSpatialPolygons(meuseSP, ID)
plot(meuseSP1, col=1:3)
pts = do.call(rbind, sapply(slot(meuseSP1, "polygons"), spsample, n=1,
  type="random"))
points(pts, pch=3, col="white")

Doing the raster to vector conversion in R ought to resolve the underlying 
problem of the soil polygons being generated from the raster values in an 
inappropriate way.

> I am so happy!

Sorry, no comment! I have replied off-list too, but unfortunately bad 
shapefiles really do exist, and they cause lots of problems.

Roger

>
> Gerard
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list