[R-sig-Geo] Spatial clustering gridded data with missing values (over water)
Roger Bivand
Roger.Bivand at nhh.no
Fri Nov 13 14:37:32 CET 2015
On Tue, 10 Nov 2015, ACD wrote:
> I'm trying to cluster a spatial dataset, and use the cluster labels as an
> input to a second process. I've been using the `spdep` package in R.
>
> I've got gridded data at .5 degree lat/lon resolution. There are 19
> covariates in the example subset of it linked here:
> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0
>
> The following shows that I can't calculate the minimum spanning tree -- a
> necessary input into `skater` -- when the dataset includes areas undefined
> because they are over water.
>
> How would one get around this?
>
> system('wget
> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
> load('example_data?dl=0')
>
> 1> with(x, plot(lon,lat))
No, don't do this, you are obscuring the data. Do first str(x), then build
the object so that you know what is going on (the object structure
matches the data representation more closely):
library(sp)
coordinates(x) <- c("lon", "lat")
proj4string(x) <- CRS("+proj=longlat +datum=WGS84")
plot(x)
OK, it's gridded, and n is small, so build the neighbour object using
polygons, queen neighbours:
gridded(x) <- TRUE
x1 <- as(x, "SpatialPolygonsDataFrame")
library(spdep)
bh.nb <- poly2nb(x1)
plot(x1)
plot(bh.nb, coordinates(x1), add=TRUE)
> 1> library(spdep)
> 1> bh.nb <-
> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
> 1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
> Error in data[id.neigh, , drop = FALSE] : subscript out of bounds
>
>
> If I restrict the data to cut out the missing values, I have no problem:
>From there on, it is reasonably simple:
lcosts <- nbcosts(nb = bh.nb, data = as(x1, "data.frame"),
method='euclidean')
nb.w <- nb2listw(bh.nb, lcosts, style="B")
mst.bh <- mstree(nb.w, 10)
res1 <- skater(mst.bh[,1:2], as(x1, "data.frame"), 5)
plot(res1, coordinates(x1), cex.circles=0.035, cex.lab=.7)
Hope this clarifies,
Roger
>
> x = x[x$lon>-117,]
> bh.nb <-
> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
> nb.w <- nb2listw(bh.nb, lcosts, style="B")
> mst.bh <- mstree(nb.w,10)
> res1 <- skater(mst.bh[,1:2], x, 5)
> plot(res1, cbind(x$lon,x$lat), cex.circles=0.035, cex.lab=.7)
>
>
> How do I get around this over-water problem? I want to be able to cluster
> the land surfaces, including islands and peninsulas. I suppose that I want
> islands to be linked to their nearest point of land.
>
> References appreciated as well as fixes to the specific problem with the
> `spdep` interface.
>
> Thanks!
> Andrew
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list