[R-sig-Geo] Spatial clustering gridded data with missing values (over water)

Roger Bivand Roger.Bivand at nhh.no
Mon Nov 23 19:27:33 CET 2015


On Mon, 23 Nov 2015, ACD wrote:

> Dear Roger,
>
> Many thanks for this reply -- I see that the problem goes away when 
> representing the data as gridcell polygons rather than as the gridpoints 
> which they are.
>
> This works well on the small data subset that I uploaded for my minimal 
> working example, but the algorithm is quite slow on the full dataset (>3000 
> gridcells, with 19+ covariates).  Is there anything inherently faster or 
> slower about the skater algorithm run on polygons vs the algo run on gridded 
> data?

No, nothing faster, and note that the important conversion is via 
SpatialPixels (gridded() <- TRUE), which avoids the no-data grid cells in 
your example. All these methods work on the underlying neighbour graph, 
which for skater must be a single graph with no detached subgraphs. You 
may use the facilities provided in the skater workflow for parallel 
processing, which may help.

Roger

>
> Thanks,
> Andrew
>
> On 11/13/2015 08:37 AM, Roger Bivand wrote:
>>  On Tue, 10 Nov 2015, ACD wrote:
>> 
>> >  I'm trying to cluster a spatial dataset, and use the cluster labels as 
>> >  an input to a second process. I've been using the `spdep` package in R.
>> > 
>> >  I've got gridded data at .5 degree lat/lon resolution.  There are 19 
>> >  covariates in the example subset of it linked here: 
>> > https: //www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0
>> > 
>> >  The following shows that I can't calculate the minimum spanning tree -- 
>> >  a necessary input into `skater` -- when the dataset includes areas 
>> >  undefined because they are over water.
>> > 
>> >  How would one get around this?
>> > 
>> >     system('wget 
>> > https: //www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
>> >      load('example_data?dl=0')
>> > 
>> >          1> with(x, plot(lon,lat))
>>
>>  No, don't do this, you are obscuring the data. Do first str(x), then build
>>  the object so that you know what is going on (the object structure matches
>>  the data representation more closely):
>>
>>  library(sp)
>>  coordinates(x) <- c("lon", "lat")
>>  proj4string(x) <- CRS("+proj=longlat +datum=WGS84")
>>  plot(x)
>>
>>  OK, it's gridded, and n is small, so build the neighbour object using
>>  polygons, queen neighbours:
>>
>>  gridded(x) <- TRUE
>>  x1 <- as(x, "SpatialPolygonsDataFrame")
>>  library(spdep)
>>  bh.nb <- poly2nb(x1)
>>  plot(x1)
>>  plot(bh.nb, coordinates(x1), add=TRUE)
>> 
>> 
>> >          1> library(spdep)
>> >          1> bh.nb <- 
>> >  cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>> >          1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>> >          Error in data[id.neigh, , drop = FALSE] : subscript out of 
>> >  bounds
>> > 
>> > 
>> >  If I restrict the data to cut out the missing values, I have no problem:
>>
>>  From there on, it is reasonably simple:
>>
>>  lcosts <- nbcosts(nb = bh.nb, data = as(x1, "data.frame"),
>>  method='euclidean')
>>  nb.w <- nb2listw(bh.nb, lcosts, style="B")
>>  mst.bh <- mstree(nb.w, 10)
>>  res1 <- skater(mst.bh[,1:2], as(x1, "data.frame"), 5)
>>  plot(res1, coordinates(x1), cex.circles=0.035, cex.lab=.7)
>>
>>  Hope this clarifies,
>>
>>  Roger
>> 
>> 
>> > 
>> >      x = x[x$lon>-117,]
>> >      bh.nb <- 
>> >  cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>> >      lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>> >      nb.w <- nb2listw(bh.nb, lcosts, style="B")
>> >      mst.bh <- mstree(nb.w,10)
>> >      res1 <- skater(mst.bh[,1:2], x, 5)
>> >      plot(res1, cbind(x$lon,x$lat), cex.circles=0.035, cex.lab=.7)
>> > 
>> > 
>> >  How do I get around this over-water problem?  I want to be able to 
>> >  cluster the land surfaces, including islands and peninsulas. I suppose 
>> >  that I want islands to be linked to their nearest point of land.
>> > 
>> >  References appreciated as well as fixes to the specific problem with the 
>> >  `spdep` interface.
>> > 
>> >  Thanks!
>> >  Andrew
>> > 
>> >  _______________________________________________
>> >  R-sig-Geo mailing list
>> >  R-sig-Geo at r-project.org
>> > https: //stat.ethz.ch/mailman/listinfo/r-sig-geo
>> > 
>> 
>
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list