[R-sig-Geo] Spatial clustering gridded data with missing values (over water)
Roger Bivand
Roger.Bivand at nhh.no
Mon Nov 23 19:27:33 CET 2015
On Mon, 23 Nov 2015, ACD wrote:
> Dear Roger,
>
> Many thanks for this reply -- I see that the problem goes away when
> representing the data as gridcell polygons rather than as the gridpoints
> which they are.
>
> This works well on the small data subset that I uploaded for my minimal
> working example, but the algorithm is quite slow on the full dataset (>3000
> gridcells, with 19+ covariates). Is there anything inherently faster or
> slower about the skater algorithm run on polygons vs the algo run on gridded
> data?
No, nothing faster, and note that the important conversion is via
SpatialPixels (gridded() <- TRUE), which avoids the no-data grid cells in
your example. All these methods work on the underlying neighbour graph,
which for skater must be a single graph with no detached subgraphs. You
may use the facilities provided in the skater workflow for parallel
processing, which may help.
Roger
>
> Thanks,
> Andrew
>
> On 11/13/2015 08:37 AM, Roger Bivand wrote:
>> On Tue, 10 Nov 2015, ACD wrote:
>>
>> > I'm trying to cluster a spatial dataset, and use the cluster labels as
>> > an input to a second process. I've been using the `spdep` package in R.
>> >
>> > I've got gridded data at .5 degree lat/lon resolution. There are 19
>> > covariates in the example subset of it linked here:
>> > https: //www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0
>> >
>> > The following shows that I can't calculate the minimum spanning tree --
>> > a necessary input into `skater` -- when the dataset includes areas
>> > undefined because they are over water.
>> >
>> > How would one get around this?
>> >
>> > system('wget
>> > https: //www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
>> > load('example_data?dl=0')
>> >
>> > 1> with(x, plot(lon,lat))
>>
>> No, don't do this, you are obscuring the data. Do first str(x), then build
>> the object so that you know what is going on (the object structure matches
>> the data representation more closely):
>>
>> library(sp)
>> coordinates(x) <- c("lon", "lat")
>> proj4string(x) <- CRS("+proj=longlat +datum=WGS84")
>> plot(x)
>>
>> OK, it's gridded, and n is small, so build the neighbour object using
>> polygons, queen neighbours:
>>
>> gridded(x) <- TRUE
>> x1 <- as(x, "SpatialPolygonsDataFrame")
>> library(spdep)
>> bh.nb <- poly2nb(x1)
>> plot(x1)
>> plot(bh.nb, coordinates(x1), add=TRUE)
>>
>>
>> > 1> library(spdep)
>> > 1> bh.nb <-
>> > cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>> > 1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>> > Error in data[id.neigh, , drop = FALSE] : subscript out of
>> > bounds
>> >
>> >
>> > If I restrict the data to cut out the missing values, I have no problem:
>>
>> From there on, it is reasonably simple:
>>
>> lcosts <- nbcosts(nb = bh.nb, data = as(x1, "data.frame"),
>> method='euclidean')
>> nb.w <- nb2listw(bh.nb, lcosts, style="B")
>> mst.bh <- mstree(nb.w, 10)
>> res1 <- skater(mst.bh[,1:2], as(x1, "data.frame"), 5)
>> plot(res1, coordinates(x1), cex.circles=0.035, cex.lab=.7)
>>
>> Hope this clarifies,
>>
>> Roger
>>
>>
>> >
>> > x = x[x$lon>-117,]
>> > bh.nb <-
>> > cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>> > lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>> > nb.w <- nb2listw(bh.nb, lcosts, style="B")
>> > mst.bh <- mstree(nb.w,10)
>> > res1 <- skater(mst.bh[,1:2], x, 5)
>> > plot(res1, cbind(x$lon,x$lat), cex.circles=0.035, cex.lab=.7)
>> >
>> >
>> > How do I get around this over-water problem? I want to be able to
>> > cluster the land surfaces, including islands and peninsulas. I suppose
>> > that I want islands to be linked to their nearest point of land.
>> >
>> > References appreciated as well as fixes to the specific problem with the
>> > `spdep` interface.
>> >
>> > Thanks!
>> > Andrew
>> >
>> > _______________________________________________
>> > R-sig-Geo mailing list
>> > R-sig-Geo at r-project.org
>> > https: //stat.ethz.ch/mailman/listinfo/r-sig-geo
>> >
>>
>
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list