[R-sig-Geo] Spatial clustering gridded data with missing values (over water)

ACD andrewcd at gmail.com
Tue Nov 10 16:13:00 CET 2015


I'm trying to cluster a spatial dataset, and use the cluster labels as 
an input to a second process.  I've been using the `spdep` package in R.

I've got gridded data at 0.5-degree lat/lon resolution.  The example 
subset linked here has 19 covariates: 
https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0

The following shows that I can't get as far as the minimum spanning tree 
(a necessary input into `skater`) when the dataset includes areas that 
are undefined because they are over water.  The `nbcosts` step, which 
supplies the edge costs for the tree, already fails.

How would one get around this?

     system('wget https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
     load('example_data?dl=0')

         1> with(x, plot(lon,lat))
         1> library(spdep)
         1> bh.nb <- cell2nb(length(unique(x$lon)), length(unique(x$lat)), torus=FALSE, type='queen')
         1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
         Error in data[id.neigh, , drop = FALSE] : subscript out of bounds
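
A likely reading of the error, sketched on toy data rather than the linked file: `cell2nb` builds neighbours for every cell of the full rectangular grid, so once the over-water rows are missing, the neighbour list refers to more cells than the data has rows. The grid below and the names `grid`, `land`, `n_full` and `n_land` are made up for illustration.

```r
# Hypothetical 4 x 3 grid at 0.5-degree spacing, standing in for `x`
grid <- expand.grid(lon = seq(-118, -116.5, by = 0.5),
                    lat = seq(33, 34, by = 0.5))

# Pretend the first two cells are over water and therefore absent
land <- grid[-c(1, 2), ]

# Number of cells a full rectangular grid of this extent would have
n_full <- length(unique(land$lon)) * length(unique(land$lat))

# Number of rows actually available in the data
n_land <- nrow(land)

# If these differ, neighbour ids built for the full grid can point
# past the last row of the data
c(full_grid = n_full, rows_in_data = n_land)
```

Running the same comparison on the real `x` would show whether this mismatch is what triggers the "subscript out of bounds" message.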


If I restrict the data to cut out the missing values, I have no problem:

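     # Restrict to the region with no missing (over-water) cells
     x <- x[x$lon > -117, ]
     # Queen-contiguity neighbours for the full rectangular grid
     bh.nb <- cell2nb(length(unique(x$lon)), length(unique(x$lat)),
                      torus = FALSE, type = 'queen')
     # Edge costs: Euclidean distance between neighbouring cells' data
     lcosts <- nbcosts(nb = bh.nb, data = x, method = 'euclidean')
     nb.w <- nb2listw(bh.nb, lcosts, style = "B")
     # Minimum spanning tree, grown from node 10
     mst.bh <- mstree(nb.w, 10)
     # Prune the tree with 5 cuts, giving 6 clusters
     res1 <- skater(mst.bh[, 1:2], x, 5)
     plot(res1, cbind(x$lon, x$lat), cex.circles = 0.035, cex.lab = .7)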


How do I get around this over-water problem?  I want to be able to 
cluster the land surfaces, including islands and peninsulas.  I suppose 
that I want islands to be linked to their nearest point of land.
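
To make the island-linking idea concrete, here is a rough base-R sketch on a made-up grid (not the linked data). It builds queen contiguity among land cells only, labels the connected pieces, and joins the smallest piece to its nearest outside land cell until everything is connected. The toy grid, the helper `label_components`, and the choice to measure "nearest" as plain Euclidean distance in degrees are all assumptions of the sketch, not anything `spdep` prescribes.

```r
# Hypothetical 0.5-degree grid; the column at lon = -118.5 is "water",
# which splits the land into a mainland and an island
grid <- expand.grid(lon = seq(-120, -117, by = 0.5),
                    lat = seq(33, 35, by = 0.5))
land <- grid[grid$lon != -118.5, ]
rownames(land) <- NULL
coords <- as.matrix(land[, c("lon", "lat")])
step <- 0.5

# Queen contiguity among land cells only: at most one grid step apart
# in both longitude and latitude
d_lon <- abs(outer(coords[, 1], coords[, 1], "-"))
d_lat <- abs(outer(coords[, 2], coords[, 2], "-"))
adj <- d_lon < 1.5 * step & d_lat < 1.5 * step
diag(adj) <- FALSE

# Label connected components of a symmetric adjacency matrix (flood fill)
label_components <- function(adj) {
  comp <- integer(nrow(adj))
  k <- 0L
  for (s in seq_len(nrow(adj))) {
    if (comp[s] != 0L) next
    k <- k + 1L
    frontier <- s
    while (length(frontier) > 0) {
      comp[frontier] <- k
      touched <- rowSums(adj[, frontier, drop = FALSE]) > 0
      frontier <- which(touched & comp == 0L)
    }
  }
  comp
}

# Join the smallest component to its nearest outside cell until one remains
dist_mat <- as.matrix(dist(coords))
links <- NULL
comp <- label_components(adj)
while (max(comp) > 1L) {
  k <- which.min(tabulate(comp))
  inside <- which(comp == k)
  outside <- which(comp != k)
  sub <- dist_mat[inside, outside, drop = FALSE]
  hit <- which(sub == min(sub), arr.ind = TRUE)[1, ]
  i <- inside[hit["row"]]
  j <- outside[hit["col"]]
  adj[i, j] <- adj[j, i] <- TRUE      # add the bridging link
  links <- rbind(links, c(i, j))
  comp <- label_components(adj)
}

# Neighbour ids per land cell, in the same row order as `land`
nb_list <- lapply(seq_len(nrow(adj)), function(i) unname(which(adj[i, ])))
```

The resulting adjacency could then presumably be turned into an `nb` object, for instance with `mat2listw(adj * 1, style = "B")$neighbours` in a recent `spdep`, and passed to `nbcosts`, `nb2listw`, `mstree` and `skater` as in the restricted example above. `spdep` also has `n.comp.nb` for counting disjoint subgraphs, which could replace the hand-rolled `label_components`. Note that the bridging links here are chosen by geographic distance only; their costs would still come from the covariates via `nbcosts`.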

References would be appreciated, as would fixes to the specific problem 
with the `spdep` interface.

Thanks!
Andrew


