[R-sig-Geo] Spatial filtering with glm with grid sampling

Thu Jan 11 13:18:48 CET 2018

On Thu, 11 Jan 2018, Sima Usvyatsov wrote:

> Thank you so much for your response.
>
> Yes, I managed to muck up the fake data in 2 (!) ways - the 1,000 lons and
> the fact that the lon/lats weren't repeated. Here's the correct structure.
>
> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rep(rnorm(20,
> 30, 0.1), each = 5), Lon = rep(rnorm(20, -75, 1), each = 5), x =
> rnegbin(100, 1, 1),
> Stratum = rep(1:5, each = 20))
>
> In the meantime, I (somewhat) resolved the issue with the glmmPQL fixed
> effects - my fault, of course.
>
> My current model is set up as follows:
>
> mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
> random = ~ 1 |RoundStart,
> family = quasipoisson,
> correlation = corExp(form=~site.Easting + site.Northing + RoundStart)
>
> where RoundStart is the date/time of starting each count. I'm assuming that
> by "using a standard GLMM" you were thinking of MASS's glmmPQL()?
> Does this look correct for specifying the full space/time dependence? The
> variogram and pacf look very decent, but one can never have too many
> checks...

I can't judge whether it really makes sense, but I think this is much more 
robust. I'd explore some other GLMMs too, to see what they contribute.

Roger

>
> On Thu, Jan 11, 2018 at 5:01 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Wed, 10 Jan 2018, Sima Usvyatsov wrote:
>>
>> Hello,
>>>
>>> I am running a negative binomial model (MASS) on count data collected on a
>>> grid. The dataset is large - ~4,000 points, with many predictors. Being
>>> counts, there are a lot of zeroes. All data are collected on a grid with
>>> 20
>>> points, with high spatial autocorrelation.
>>>
>>> I would like to filter out the spatial autocorrelation. My question is:
>>> since I have very limited spatial info (only 20 distinct spatial
>>> locations), is it possible to simplify ME() so that I don't have to run it
>>> on the whole dataset? When I try to run ME() on a 100-point subset of the
>>> data, I get error in glm.fit: NA/NaN/Inf in 'x'. When I run it on a single
>>> instance of the grid, I "get away" with a warning ("algorithm did not
>>> converge").
>>>
>>> Here's a fake dataset. It was grinding for a while but not throwing errors
>>> (like my original data would). Regardless, it demonstrates the repeated
>>> sampling at the same points and the large number of zeroes.
>>>
>>
>> The data set has 1000 values in Lon, so is probably bigger than you
>> intended, and when 100 is used is not autocorrelated. You seem to have a
>> hierarchical model, with repeated measurements at the locations, so a
>> multi-level treatment of some kind may be sensible. If you want to stay
>> with ME-based spatial filtering, maybe look at the literature on spatial
>> panel (repeated measurements are in time) with ME/SF, and on network
>> autocorrelation (dyadic relationships with autocorrelation among origins
>> and/or destinations). Both these cases use Kronecker products on the
>> selected eigenvectors, I think.
>>
>> Alternatively, use a standard GLMM with a grouped iid random effect and/or
>> a spatially structured random effect at the 20 location level. If the
>> groups are repeated observations in time, you should model the whole
>> (non-)separable space-time process.
>>
>> Hope this helps,
>>
>> Roger
>>
>>
>>> Any advice would be most welcome.
>>>
>>> library(spdep)
>>> library(MASS)
>>>
>>> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rnorm(100,
>>> 30,
>>> 0.1), Lon = rnorm(1000, -75, 1), x = rnegbin(100, 1, 1))
>>> coordinates(df) <- ~Lon + Lat
>>> proj4string(df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
>>> nb <- dnearneigh(x=coordinates(df), d1=0, d2=200,longlat = TRUE)
>>> dists <- nbdists(nb, coordinates(df), longlat=TRUE)
>>> glist <- lapply(dists, function(x) 1/x)
>>> lw <- nb2listw(nb, glist, style="W")
>>> me <- ME(x ~ 1, data = df, family = "quasipoisson", listw = lw, alpha =
>>> 0.5)
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en