[R-sig-Geo] stratified random sampling
Roger Bivand
Roger.Bivand at nhh.no
Thu Feb 26 17:57:59 CET 2009
On Wed, 25 Feb 2009, Roger Bivand wrote:
> On Tue, 24 Feb 2009, Heuvelink, Gerard wrote:
>
>> Dear list,
>>
>> The stratified random sampling problem that I submitted a few days ago has
>> already been solved, with the help of several of you, notably Edzer
>> Pebesma. Edzer came up with the following solution:
>
> WARNING!! This is only conditionally correct. See below for analysis.
Further WARNING!! I have now checked everything as carefully as possible.
In fact, vectorising the underlying gridded data in R (using sp and
maptools) led to the same problem.
So it was wrong of me to blame the input shapefile, which although very
complex, was topologically correct.
The causes of the problem were firstly that when only one point was being
sampled in each Polygons object, the use of round() to divide n among the
member Polygon objects by areal proportion was setting n=0 in some cases
for all Polygon objects. So round() has been replaced by ceiling() in the
next release of sp - if more than n points are chosen, they are sampled
back to n.
The second cause was an interaction between the gridded shapes of many
Polygon objects (they are often single raster cells), their bounding
boxes, which are of course identical with the polygon ring, and a
heuristic used to try to detect when a Polygon object is wrongly declared
as not being a hole. The heuristic was turning raster cells touching at
one corner vertex into holes, so not sampling them. This has also been
corrected on CVS and will be in the next sp release.
We'd be grateful for users reporting on spsample() behaving unexpectedly
with Polygon, Polygons, and/or SpatialPolygons objects, especially with
test data (as in this case, offline).
Roger
>
>>
>> library(sp)
>> library(rgdal)
>> nc1 <- readShapePoly(system.file("shapes/sids.shp",package="maptools")[1],
> + proj4string=CRS("+proj=longlat +datum=NAD27"))
>> pts = do.call(rbind, sapply(slot(nc1, "polygons"), spsample, n=1,
> + type="random"))
>> plot(nc1)
>> points(pts, col='blue', pch=19, cex=1)
>>
>> As it happened, the do.call statement did not work in my case (Edzer and
>> Roger may look into why it does not work with all shapes)
>
> It was because the shapefile was no good, with both self-intersecting and
> overlapping polygons. This left some values returned by spsample in the
> sapply call as NULL (correctly), and no rbind() method exists for
> SpatialPoints and NULL objects.
>
>> and had to be replaced by:
>>
>> for (i in 1:length(slot(nc1, "polygons"))) {
>> pt = spsample(nc1[i,], n=1, type="random")
>> if (i == 1)
>> pts = pt
>> else
>> pts = rbind(pts, pt)
>> }
>>
>
> This gets a result - there are other ways of doing it too, but it is not what
> it seems. Because the polygons are self-intersecting and overlapping, the
> point-in-polygon algorithm is not choosing points correctly (spsample for a
> ring generates more points than needed in the bounding box of the polygon,
> and chooses the number needed from those that fall within the polygon).
>
> The input shapefile is a vectorised map of soil types from raster data, but
> unfortunately the software used to generate it is unknown, so we can't warn
> people off it. Assuming that only one soil type occurs in each raster cell,
> we can reproduce the case with meuse.grid:
>
> library(sp)
> data(meuse.grid)
> coordinates(meuse.grid) <- c("x", "y")
> gridded(meuse.grid) <- TRUE
> meuse.grid$soil <- factor(meuse.grid$soil)
> spplot(meuse.grid, "soil")
> meuseSP <- as(meuse.grid, "SpatialPolygons")
> ID <- as.character(meuse.grid$soil)
> library(maptools)
> meuseSP1 <- unionSpatialPolygons(meuseSP, ID)
> plot(meuseSP1, col=1:3)
> pts = do.call(rbind, sapply(slot(meuseSP1, "polygons"), spsample, n=1,
> type="random"))
> points(pts, pch=3, col="white")
>
> Doing the raster to vector conversion in R ought to resolve the underlying
> problem of the soil polygons being generated from the raster values in an
> inappropriate way.
>
>> I am so happy!
>
> Sorry, no comment! I have replied off-list too, but unfortunately bad
> shapefiles really do exist, and they cause lots of problems.
>
> Roger
>
>>
>> Gerard
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list