[R-sig-Geo] Spatial clustering gridded data with missing values (over water)

ACD andrewcd at gmail.com
Mon Nov 23 19:21:22 CET 2015


Dear Roger,

Many thanks for this reply -- I see that the problem goes away when 
representing the data as gridcell polygons rather than as the gridpoints 
which they are.

This works well on the small data subset that I uploaded for my minimal 
working example, but the algorithm is quite slow on the full dataset 
(>3000 gridcells, with 19+ covariates).  Is there anything inherently 
faster or slower about the skater algorithm run on polygons vs the algo 
run on gridded data?

Thanks,
Andrew

On 11/13/2015 08:37 AM, Roger Bivand wrote:
> On Tue, 10 Nov 2015, ACD wrote:
>
>> I'm trying to cluster a spatial dataset, and use the cluster labels 
>> as an input to a second process. I've been using the `spdep` package 
>> in R.
>>
>> I've got gridded data at .5 degree lat/lon resolution.  There are 19 
>> covariates in the example subset of it linked here: 
>> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0
>>
>> The following shows that I can't calculate the minimum spanning tree 
>> -- a necessary input into `skater` -- when the dataset includes areas 
>> undefined because they are over water.
>>
>> How would one get around this?
>>
>>    system('wget 
>> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
>>     load('example_data?dl=0')
>>
>>         1> with(x, plot(lon,lat))
>
> No, don't do this, you are obscuring the data. Do first str(x), then 
> build the object so that you know what is going on (the object 
> structure matches the data representation more closely):
>
> library(sp)
> coordinates(x) <- c("lon", "lat")
> proj4string(x) <- CRS("+proj=longlat +datum=WGS84")
> plot(x)
>
> OK, it's gridded, and n is small, so build the neighbour object using 
> polygons, queen neighbours:
>
> gridded(x) <- TRUE
> x1 <- as(x, "SpatialPolygonsDataFrame")
> library(spdep)
> bh.nb <- poly2nb(x1)
> plot(x1)
> plot(bh.nb, coordinates(x1), add=TRUE)
>
>
>>         1> library(spdep)
>>         1> bh.nb <- 
>> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>>         1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>>         Error in data[id.neigh, , drop = FALSE] : subscript out of 
>> bounds
>>
>>
>> If I restrict the data to cut out the missing values, I have no problem:
>
> From there on, it is reasonably simple:
>
> lcosts <- nbcosts(nb = bh.nb, data = as(x1, "data.frame"),
>  method='euclidean')
> nb.w <- nb2listw(bh.nb, lcosts, style="B")
> mst.bh <- mstree(nb.w, 10)
> res1 <- skater(mst.bh[,1:2], as(x1, "data.frame"), 5)
> plot(res1, coordinates(x1), cex.circles=0.035, cex.lab=.7)
>
> Hope this clarifies,
>
> Roger
>
>
>>
>>     x = x[x$lon>-117,]
>>     bh.nb <- 
>> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>>     lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>>     nb.w <- nb2listw(bh.nb, lcosts, style="B")
>>     mst.bh <- mstree(nb.w,10)
>>     res1 <- skater(mst.bh[,1:2], x, 5)
>>     plot(res1, cbind(x$lon,x$lat), cex.circles=0.035, cex.lab=.7)
>>
>>
>> How do I get around this over-water problem?  I want to be able to 
>> cluster the land surfaces, including islands and peninsulas. I 
>> suppose that I want islands to be linked to their nearest point of land.
>>
>> References appreciated as well as fixes to the specific problem with 
>> the `spdep` interface.
>>
>> Thanks!
>> Andrew
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>



More information about the R-sig-Geo mailing list