[R-sig-Geo] Spatial clustering gridded data with missing values (over water)
ACD
andrewcd at gmail.com
Mon Nov 23 19:21:22 CET 2015
Dear Roger,
Many thanks for this reply -- I see that the problem goes away when
representing the data as gridcell polygons rather than as the gridpoints
which they are.
This works well on the small data subset that I uploaded for my minimal
working example, but the algorithm is quite slow on the full dataset
(>3000 gridcells, with 19+ covariates). Is there anything inherently
faster or slower about the skater algorithm run on polygons vs the algo
run on gridded data?
Thanks,
Andrew
On 11/13/2015 08:37 AM, Roger Bivand wrote:
> On Tue, 10 Nov 2015, ACD wrote:
>
>> I'm trying to cluster a spatial dataset, and use the cluster labels
>> as an input to a second process. I've been using the `spdep` package
>> in R.
>>
>> I've got gridded data at .5 degree lat/lon resolution. There are 19
>> covariates in the example subset of it linked here:
>> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0
>>
>> The following shows that I can't calculate the minimum spanning tree
>> -- a necessary input into `skater` -- when the dataset includes areas
>> undefined because they are over water.
>>
>> How would one get around this?
>>
>> system('wget
>> https://www.dropbox.com/s/i72na4k0k5gqvvx/example_data?dl=0')
>> load('example_data?dl=0')
>>
>> 1> with(x, plot(lon,lat))
>
> No, don't do this, you are obscuring the data. Do first str(x), then
> build the object so that you know what is going on (the object
> structure matches the data representation more closely):
>
> library(sp)
> coordinates(x) <- c("lon", "lat")
> proj4string(x) <- CRS("+proj=longlat +datum=WGS84")
> plot(x)
>
> OK, it's gridded, and n is small, so build the neighbour object using
> polygons, queen neighbours:
>
> gridded(x) <- TRUE
> x1 <- as(x, "SpatialPolygonsDataFrame")
> library(spdep)
> bh.nb <- poly2nb(x1)
> plot(x1)
> plot(bh.nb, coordinates(x1), add=TRUE)
>
>
>> 1> library(spdep)
>> 1> bh.nb <-
>> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>> 1> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>> Error in data[id.neigh, , drop = FALSE] : subscript out of
>> bounds
>>
>>
>> If I restrict the data to cut out the missing values, I have no problem:
>
> From there on, it is reasonably simple:
>
> lcosts <- nbcosts(nb = bh.nb, data = as(x1, "data.frame"),
> method='euclidean')
> nb.w <- nb2listw(bh.nb, lcosts, style="B")
> mst.bh <- mstree(nb.w, 10)
> res1 <- skater(mst.bh[,1:2], as(x1, "data.frame"), 5)
> plot(res1, coordinates(x1), cex.circles=0.035, cex.lab=.7)
>
> Hope this clarifies,
>
> Roger
>
>
>>
>> x = x[x$lon>-117,]
>> bh.nb <-
>> cell2nb(length(unique(x$lon)),length(unique(x$lat)),torus=F,type='queen')
>> lcosts <- nbcosts(nb = bh.nb, data = x,method='euclidean')
>> nb.w <- nb2listw(bh.nb, lcosts, style="B")
>> mst.bh <- mstree(nb.w,10)
>> res1 <- skater(mst.bh[,1:2], x, 5)
>> plot(res1, cbind(x$lon,x$lat), cex.circles=0.035, cex.lab=.7)
>>
>>
>> How do I get around this over-water problem? I want to be able to
>> cluster the land surfaces, including islands and peninsulas. I
>> suppose that I want islands to be linked to their nearest point of land.
>>
>> References appreciated as well as fixes to the specific problem with
>> the `spdep` interface.
>>
>> Thanks!
>> Andrew
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
More information about the R-sig-Geo
mailing list