[R-sig-Geo] Spatial filtering with glm with grid sampling
Roger Bivand
Roger.Bivand at nhh.no
Thu Jan 11 10:01:13 CET 2018
On Wed, 10 Jan 2018, Sima Usvyatsov wrote:
> Hello,
>
> I am running a negative binomial model (MASS) on count data collected on a
> grid. The dataset is large - ~4,000 points, with many predictors. Being
> counts, there are a lot of zeroes. All data are collected on a grid with 20
> points, with high spatial autocorrelation.
>
> I would like to filter out the spatial autocorrelation. My question is:
> since I have very limited spatial info (only 20 distinct spatial
> locations), is it possible to simplify ME() so that I don't have to run it
> on the whole dataset? When I try to run ME() on a 100-point subset of the
> data, I get error in glm.fit: NA/NaN/Inf in 'x'. When I run it on a single
> instance of the grid, I "get away" with a warning ("algorithm did not
> converge").
>
> Here's a fake dataset. It was grinding for a while but not throwing errors
> (like my original data would). Regardless, it demonstrates the repeated
> sampling at the same points and the large number of zeroes.
The data set has 1000 values in Lon, so it is probably bigger than you
intended (data.frame() recycles the 100-value columns to 1000 rows), and
when 100 is used instead, the simulated counts are not autocorrelated. You
seem to have a hierarchical model, with repeated measurements at the
locations, so a multi-level treatment of some kind may be sensible. If you
want to stay with ME-based spatial filtering, maybe look at the literature
on spatial panel models (where the repeated measurements are in time) with
ME/SF, and on network autocorrelation (dyadic relationships with
autocorrelation among origins and/or destinations). Both these cases use
Kronecker products on the selected eigenvectors, I think.
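
To make the Kronecker idea concrete, here is a minimal base-R sketch. It
assumes a hypothetical balanced design (20 locations on a 4 x 5 grid, 5
repeats each, rows ordered by location) and binary contiguity weights; none
of these names or choices come from the original data. The eigenvectors are
computed once on the 20 x 20 matrix and then expanded to the observations,
rather than decomposing a 100 x 100 (or 4000 x 4000) matrix.

```r
# Hypothetical layout: 20 grid locations, each sampled r times.
grid  <- expand.grid(gx = 1:5, gy = 1:4)   # 20 distinct locations
n_loc <- nrow(grid)
r     <- 5                                 # repeats per location (assumed balanced)

# Binary contiguity: grid cells at distance 1 are neighbours.
D <- as.matrix(dist(grid))
C <- (D > 0 & D <= 1) * 1

# Doubly centre the weights, M C M with M = I - 11'/n, and decompose.
M     <- diag(n_loc) - matrix(1 / n_loc, n_loc, n_loc)
eig   <- eigen(M %*% C %*% M, symmetric = TRUE)
E_loc <- eig$vectors[, 1:3]                # first three vectors; the choice of 3 is arbitrary

# Expand location-level vectors to observation level with a Kronecker product.
E_obs <- kronecker(E_loc, matrix(1, r, 1)) # 100 x 3

# The same expansion by indexing, which also covers unbalanced designs.
loc_id <- rep(seq_len(n_loc), each = r)
E_idx  <- E_loc[loc_id, , drop = FALSE]
```

The columns of E_obs could then be offered as candidate covariates in the
count model. Which vectors to keep is a separate selection step that this
sketch does not attempt.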
Alternatively, use a standard GLMM with a grouped iid random effect and/or
a spatially structured random effect at the level of the 20 locations. If
the groups are repeated observations in time, you should model the whole
(non-)separable space-time process.
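
As a rough illustration of the grouped iid random effect only (not the
spatially structured one), the sketch below fits a random intercept per
location with glmmPQL() from MASS. The simulated data, the object names and
the choice of a Poisson family fitted by PQL are all assumptions made for
the example; a negative binomial GLMM from another package may suit the
real data better.

```r
library(MASS)   # glmmPQL(), rnegbin()
library(nlme)   # lme machinery used by glmmPQL(), ranef()

# Hypothetical data: 20 locations, 5 repeated counts at each.
set.seed(42)
n_loc <- 20
r     <- 5
Loc   <- factor(rep(seq_len(n_loc), each = r))
u     <- rnorm(n_loc, sd = 0.5)                  # location-level effects
mu    <- exp(0.5 + u[as.integer(Loc)])
dat   <- data.frame(Loc = Loc,
                    x = rnegbin(n_loc * r, mu = mu, theta = 2))

# Random intercept for each location; PQL also estimates a scale parameter,
# which absorbs some of the overdispersion.
fit <- glmmPQL(x ~ 1, random = ~ 1 | Loc, family = poisson,
               data = dat, verbose = FALSE)

# One estimated intercept deviation per location.
re <- ranef(fit)
```

Mapping the values in re against the location coordinates is one informal
way to see whether a spatially structured effect is still needed.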
Hope this helps,
Roger
>
> Any advice would be most welcome.
>
> library(spdep)
> library(MASS)
>
> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rnorm(100, 30,
> 0.1), Lon = rnorm(1000, -75, 1), x = rnegbin(100, 1, 1))
> coordinates(df) <- ~Lon + Lat
> proj4string(df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
> nb <- dnearneigh(x=coordinates(df), d1=0, d2=200,longlat = TRUE)
> dists <- nbdists(nb, coordinates(df), longlat=TRUE)
> glist <- lapply(dists, function(x) 1/x)
> lw <- nb2listw(nb, glist, style="W")
> me <- ME(x ~ 1, data = df, family = "quasipoisson", listw = lw, alpha = 0.5)
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en