[R-sig-Geo] Spatial clustering gridded data with missing values (over water)

Roger Bivand Roger.Bivand at nhh.no
Fri Nov 13 14:37:32 CET 2015


On Tue, 10 Nov 2015, ACD wrote:

> I'm trying to cluster a spatial dataset, and use the cluster labels as an 
> input to a second process.  I've been using the `spdep` package in R.
>
> I've got gridded data at .5 degree lat/lon resolution.  There are 19 
> covariates in the example subset of it linked here: 
> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0
>
> The following shows that I can't calculate the minimum spanning tree -- a 
> necessary input into `skater` -- when the dataset includes areas undefined 
> because they are over water.
>
> How would one get around this?
>
>    system('wget 
> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
>     load('example_data?dl=0')
>
>         1> with(x, plot(lon,lat))

No, don't do this, you are obscuring the data. Do first str(x), then build 
the object so that you know what is going on (the object structure 
matches the data representation more closely):

library(sp)
coordinates(x) <- c("lon", "lat")
proj4string(x) <- CRS("+proj=longlat +datum=WGS84")
plot(x)

OK, it's gridded, and n is small, so build the neighbour object using 
polygons, queen neighbours:

gridded(x) <- TRUE
x1 <- as(x, "SpatialPolygonsDataFrame")
library(spdep)
bh.nb <- poly2nb(x1)
plot(x1)
plot(bh.nb, coordinates(x1), add=TRUE)


>         1> library(spdep)
>         1> bh.nb <- 
> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>         1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>         Error in data[id.neigh, , drop = FALSE] : subscript out of bounds
>
>
> If I restrict the data to cut out the missing values, I have no problem:

>From there on, it is reasonably simple:

lcosts <- nbcosts(nb = bh.nb, data = as(x1, "data.frame"),
  method='euclidean')
nb.w <- nb2listw(bh.nb, lcosts, style="B")
mst.bh <- mstree(nb.w, 10)
res1 <- skater(mst.bh[,1:2], as(x1, "data.frame"), 5)
plot(res1, coordinates(x1), cex.circles=0.035, cex.lab=.7)

Hope this clarifies,

Roger


>
>     x = x[x$lon>-117,]
>     bh.nb <- 
> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>     lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>     nb.w <- nb2listw(bh.nb, lcosts, style="B")
>     mst.bh <- mstree(nb.w,10)
>     res1 <- skater(mst.bh[,1:2], x, 5)
>     plot(res1, cbind(x$lon,x$lat), cex.circles=0.035, cex.lab=.7)
>
>
> How do I get around this over-water problem?  I want to be able to cluster 
> the land surfaces, including islands and peninsulas.  I suppose that I want 
> islands to be linked to their nearest point of land.
>
> References appreciated as well as fixes to the specific problem with the 
> `spdep` interface.
>
> Thanks!
> Andrew
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list