[R-sig-Geo] Spatial filtering with glm with grid sampling

Roger Bivand Roger.Bivand at nhh.no
Thu Jan 11 13:33:38 CET 2018


On Thu, 11 Jan 2018, Sima Usvyatsov wrote:

> What other GLMM options do I have, under the restrictions of count data
> with lots of zeroes and spatial autocorrelation? I was under the impression
> that glmmPQL was my only choice?

See for example:

https://www.fromthebottomoftheheap.net/2017/05/04/compare-mgcv-with-glmmTMB/

https://rjournal.github.io/archive/2017/RJ-2017-066/index.html 
(forthcoming)

for surveys and discussions; maybe your current choice is the best bet for 
negbin.

Roger

>
> On Thu, Jan 11, 2018 at 8:18 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>
>> On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
>>
>> Thank you so much for your response.
>>>
>>> Yes, I managed to muck up the fake data in 2 (!) ways - the 1,000 lons and
>>> the fact that the lon/lats weren't repeated. Here's the correct structure.
>>>
>>> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rep(rnorm(20,
>>> 30, 0.1), each = 5), Lon = rep(rnorm(20, -75, 1), each = 5), x =
>>> rnegbin(100, 1, 1),
>>> Stratum = rep(1:5, each = 20))
>>>
>>> In the meantime, I (somewhat) resolved the issue with the glmmPQL fixed
>>> effects - my fault, of course.
>>>
>>> My current model is set up as follows:
>>>
>>> mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
>>> random = ~ 1 |RoundStart,
>>> family = quasipoisson,
>>> correlation = corExp(form=~site.Easting + site.Northing + RoundStart)
>>>
>>> where RoundStart is the date/time of starting each count. I'm assuming
>>> that
>>> by "using a standard GLMM" you were thinking of MASS's glmmPQL()?
>>> Does this look correct for specifying the full space/time dependence? The
>>> variogram and pacf look very decent, but one can never have too many
>>> checks...
>>>
>>
>> I can't judge whether it really makes sense, but I think this is much more
>> robust. I'd explore some other GLMMs too, to see what they contribute.
>>
>> Roger
>>
>>
>>
>>> On Thu, Jan 11, 2018 at 5:01 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>> wrote:
>>>
>>> On Wed, 10 Jan 2018, Sima Usvyatsov wrote:
>>>>
>>>> Hello,
>>>>
>>>>>
>>>>> I am running a negative binomial model (MASS) on count data collected
>>>>> on a
>>>>> grid. The dataset is large - ~4,000 points, with many predictors. Being
>>>>> counts, there are a lot of zeroes. All data are collected on a grid with
>>>>> 20
>>>>> points, with high spatial autocorrelation.
>>>>>
>>>>> I would like to filter out the spatial autocorrelation. My question is:
>>>>> since I have very limited spatial info (only 20 distinct spatial
>>>>> locations), is it possible to simplify ME() so that I don't have to run
>>>>> it
>>>>> on the whole dataset? When I try to run ME() on a 100-point subset of
>>>>> the
>>>>> data, I get error in glm.fit: NA/NaN/Inf in 'x'. When I run it on a
>>>>> single
>>>>> instance of the grid, I "get away" with a warning ("algorithm did not
>>>>> converge").
>>>>>
>>>>> Here's a fake dataset. It was grinding for a while but not throwing
>>>>> errors
>>>>> (like my original data would). Regardless, it demonstrates the repeated
>>>>> sampling at the same points and the large number of zeroes.
>>>>>
>>>>>
>>>> The data set has 1000 values in Lon, so is probably bigger than you
>>>> intended, and when 100 is used is not autocorrelated. You seem to have a
>>>> hierarchical model, with repeated measurements at the locations, so a
>>>> multi-level treatment of some kind may be sensible. If you want to stay
>>>> with ME-based spatial filtering, maybe look at the literature on spatial
>>>> panel (repeated measurements are in time) with ME/SF, and on network
>>>> autocorrelation (dyadic relationships with autocorrelation among origins
>>>> and/or destinations). Both these cases use Kronecker products on the
>>>> selected eigenvectors, I think.
>>>>
>>>> Alternatively, use a standard GLMM with a grouped iid random effect
>>>> and/or
>>>> a spatially structured random effect at the 20 location level. If the
>>>> groups are repeated observations in time, you should model the whole
>>>> (non-)separable space-time process.
>>>>
>>>> Hope this helps,
>>>>
>>>> Roger
>>>>
>>>>
>>>> Any advice would be most welcome.
>>>>>
>>>>> library(spdep)
>>>>> library(MASS)
>>>>>
>>>>> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rnorm(100,
>>>>> 30,
>>>>> 0.1), Lon = rnorm(1000, -75, 1), x = rnegbin(100, 1, 1))
>>>>> coordinates(df) <- ~Lon + Lat
>>>>> proj4string(df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84
>>>>> +no_defs")
>>>>> nb <- dnearneigh(x=coordinates(df), d1=0, d2=200,longlat = TRUE)
>>>>> dists <- nbdists(nb, coordinates(df), longlat=TRUE)
>>>>> glist <- lapply(dists, function(x) 1/x)
>>>>> lw <- nb2listw(nb, glist, style="W")
>>>>> me <- ME(x ~ 1, data = df, family = "quasipoisson", listw = lw, alpha =
>>>>> 0.5)
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>
>>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
>>>> Editor-in-Chief of The R Journal, https://journal.r-project.org/
>>>> index.html
>>>> http://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>>
>>>>
>>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list