[R-sig-Geo] Spatial filtering with glm with grid sampling

Sima Usvyatsov ghiaco at gmail.com
Thu Jan 11 13:22:21 CET 2018


What other GLMM options do I have, under the restrictions of count data
with lots of zeroes and spatial autocorrelation? I was under the impression
that glmmPQL was my only choice?

On Thu, Jan 11, 2018 at 8:18 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

> On Thu, 11 Jan 2018, Sima Usvyatsov wrote:
>
> Thank you so much for your response.
>>
>> Yes, I managed to muck up the fake data in 2 (!) ways - the 1,000 lons and
>> the fact that the lon/lats weren't repeated. Here's the correct structure.
>>
>> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rep(rnorm(20,
>> 30, 0.1), each = 5), Lon = rep(rnorm(20, -75, 1), each = 5), x =
>> rnegbin(100, 1, 1),
>> Stratum = rep(1:5, each = 20))
>>
>> In the meantime, I (somewhat) resolved the issue with the glmmPQL fixed
>> effects - my fault, of course.
>>
>> My current model is set up as follows:
>>
>> mod1 <- glmmPQL(Count ~ Stratum + SiteInStratum + ...other predictors,
>> random = ~ 1 |RoundStart,
>> family = quasipoisson,
>> correlation = corExp(form=~site.Easting + site.Northing + RoundStart)
>>
>> where RoundStart is the date/time of starting each count. I'm assuming
>> that
>> by "using a standard GLMM" you were thinking of MASS's glmmPQL()?
>> Does this look correct for specifying the full space/time dependence? The
>> variogram and pacf look very decent, but one can never have too many
>> checks...
>>
>
> I can't judge whether it really makes sense, but I think this is much more
> robust. I'd explore some other GLMMs too, to see what they contribute.
>
> Roger
>
>
>
>> On Thu, Jan 11, 2018 at 5:01 AM, Roger Bivand <Roger.Bivand at nhh.no>
>> wrote:
>>
>> On Wed, 10 Jan 2018, Sima Usvyatsov wrote:
>>>
>>> Hello,
>>>
>>>>
>>>> I am running a negative binomial model (MASS) on count data collected
>>>> on a
>>>> grid. The dataset is large - ~4,000 points, with many predictors. Being
>>>> counts, there are a lot of zeroes. All data are collected on a grid with
>>>> 20
>>>> points, with high spatial autocorrelation.
>>>>
>>>> I would like to filter out the spatial autocorrelation. My question is:
>>>> since I have very limited spatial info (only 20 distinct spatial
>>>> locations), is it possible to simplify ME() so that I don't have to run
>>>> it
>>>> on the whole dataset? When I try to run ME() on a 100-point subset of
>>>> the
>>>> data, I get error in glm.fit: NA/NaN/Inf in 'x'. When I run it on a
>>>> single
>>>> instance of the grid, I "get away" with a warning ("algorithm did not
>>>> converge").
>>>>
>>>> Here's a fake dataset. It was grinding for a while but not throwing
>>>> errors
>>>> (like my original data would). Regardless, it demonstrates the repeated
>>>> sampling at the same points and the large number of zeroes.
>>>>
>>>>
>>> The data set has 1000 values in Lon, so is probably bigger than you
>>> intended, and when 100 is used is not autocorrelated. You seem to have a
>>> hierarchical model, with repeated measurements at the locations, so a
>>> multi-level treatment of some kind may be sensible. If you want to stay
>>> with ME-based spatial filtering, maybe look at the literature on spatial
>>> panel (repeated measurements are in time) with ME/SF, and on network
>>> autocorrelation (dyadic relationships with autocorrelation among origins
>>> and/or destinations). Both these cases use Kronecker products on the
>>> selected eigenvectors, I think.
>>>
>>> Alternatively, use a standard GLMM with a grouped iid random effect
>>> and/or
>>> a spatially structured random effect at the 20 location level. If the
>>> groups are repeated observations in time, you should model the whole
>>> (non-)separable space-time process.
>>>
>>> Hope this helps,
>>>
>>> Roger
>>>
>>>
>>> Any advice would be most welcome.
>>>>
>>>> library(spdep)
>>>> library(MASS)
>>>>
>>>> df <- data.frame(Loc = as.factor(rep(1:20, each = 5)), Lat = rnorm(100,
>>>> 30,
>>>> 0.1), Lon = rnorm(1000, -75, 1), x = rnegbin(100, 1, 1))
>>>> coordinates(df) <- ~Lon + Lat
>>>> proj4string(df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84
>>>> +no_defs")
>>>> nb <- dnearneigh(x=coordinates(df), d1=0, d2=200,longlat = TRUE)
>>>> dists <- nbdists(nb, coordinates(df), longlat=TRUE)
>>>> glist <- lapply(dists, function(x) 1/x)
>>>> lw <- nb2listw(nb, glist, style="W")
>>>> me <- ME(x ~ 1, data = df, family = "quasipoisson", listw = lw, alpha =
>>>> 0.5)
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>>>> --
>>> Roger Bivand
>>> Department of Economics, Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
>>> Editor-in-Chief of The R Journal, https://journal.r-project.org/
>>> index.html
>>> http://orcid.org/0000-0003-2392-6140
>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>>
>>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list