[R-sig-ME] Repeated Observations Linear Mixed Model with Outside-Group Spatial Correlation

Wed Mar 22 17:51:09 CET 2017

Thanks for the responses everyone.

David and Ben - I am in the midst of trying the hack right now. I created a
single 'group' with all observations. Then I constructed the correlation
distance matrix using three variables latitude, longitude, and *time*. This
way I can capture the correlations from repeated observations (they have
the same latitude and longitude) as a function of their temporal distance
(time) and the correlations across stations as a function of their spatial
and temporal distance. I think this structure is actually better and more
flexible. I am still trying to figure out how to determine the tradeoff
between spatial distance and temporal distance that I want. Also, as Ben
suggested this method appears to be quite inefficient. The model has been
running for 11 hours and is using around 12gb of RAM. I think I might
create groups by clustering the stations geographically to reduce the
computational burden of what is now 12,271x12,271 correlation matrix.

Dexter -  lm::lm(), nlme::lme(), and lme4::lmer() all have the function
resid(model) that provides the residual for each observation in the model.
I used the package "ape" to calculate the Moran I.

model_spatial = lme(log(counts) ~ log(Population)
                 +Drive +Med_Income + Buff2 +Rain + Temperature
                 , data = Data, random = ~1|id, method = "ML" )

model_All = lmer(log(counts) ~ log(Population)
                 +Drive +Med_Income + Buff2 +Rain + Temperature
                 + (1|id)
                  , data = Data )
Data$resid_spat = resid(model_spatial )
Data$resid_all = resid(model_All )

Thierry - I assume you meant "nlme *cant *do what you would like to do"?

I will check out the INLA, spaMM, and HSAR packages if the nlme workaround
does not actually work.

On Wed, Mar 22, 2017 at 11:02 AM, Ben Bolker <bbolker at gmail.com> wrote:

>
>   The spaMM package is also worth a try.
>
>   A hack suggested a while ago by Dormann et al. is to make a single
> group that includes all observations, but (1) it's pretty inefficient
> and (2) I don't think you can have that *and* another, non-trivial
> grouping factor.
>
> On 17-03-22 10:46 AM, Dexter Locke wrote:
> > Dear list,
> >
> > You may consider also the citations below and HSAR package:
> > https://cran.r-project.org/web/packages/HSAR/vignettes/hsar.html
> >
> > Dear Michael,
> >
> > Do you have comparable code for extracting the residuals of the model and
> > testing for spatial autocorrelation that works with your second model
> > "model_All" created with lme4::lmer ?  What package contains the
> "Moran.I"
> > function?
> >
> > - Dexter
> >
> > Dong, G., & Harris, R. (2014). Spatial Autoregressive Models for
> > Geographically Hierarchical Data Structures. *Geographical Analysis*,
> > n/a-n/a. http://doi.org/10.1111/gean.12049
> >
> > Dong, G., Harris, R., Jones, K., & Yu, J. (2015). Multilevel Modelling
> with
> > Spatial Interaction Effects with Application to an Emerging Land Market
> in
> > Beijing, China. *PloS One*, *10*(6), e0130761.
> > http://doi.org/10.1371/journal.pone.0130761
> >
> > Dong, G., Ma, J., Harris, R., & Pryce, G. (2016). Spatial Random Slope
> > Multilevel Modeling Using Multivariate Conditional Autoregressive
> Models: A
> > Case Study of Subjective Travel Satisfaction in Beijing. *Annals of the
> > American Association of Geographers*, *106*(1), 19–35.
> > http://doi.org/10.1080/00045608.2015.1094388
> >
> > On Wed, Mar 22, 2017 at 5:07 AM, Thierry Onkelinx <
> thierry.onkelinx at inbo.be>
> > wrote:
> >
> >> Dear Michael,
> >>
> >> The correlation structures in nlme assume correlation among the
> >> residuals **within** the most detail level of the random effects.
> >> Residuals of observations originating from different levels of the
> >> random effects are assumed to be uncorrelated. So nlme can do what you
> >> would like to do.
> >>
> >> As Ben already mentioned, INLA is useful as it allows for spatially
> >> correlated random effects. You can find information on the INLA
> >> website (www.r-inla.org) and in a few books. e.g.
> >> - Blangiardo & Cameletti (2015) Spatial and Spatio-temporal Bayesian
> >> Model with R - INLA
> >> - Zuur et al (in press) Beginner's Guide to Spatial, Temporal and
> >> Spatial-Temporal Ecological Data Analysis with R-INLA: Using GLM and
> >> GLMM
> >>
> >> Best regards,
> >> ir. Thierry Onkelinx
> >> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> >> and Forest
> >> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> >> Kliniekstraat 25
> >> 1070 Anderlecht
> >> Belgium
> >>
> >> To call in the statistician after the experiment is done may be no
> >> more than asking him to perform a post-mortem examination: he may be
> >> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> >> The plural of anecdote is not data. ~ Roger Brinner
> >> The combination of some data and an aching desire for an answer does
> >> not ensure that a reasonable answer can be extracted from a given body
> >> of data. ~ John Tukey
> >>
> >>
> >> 2017-03-21 22:19 GMT+01:00 Michael Hyland <mhyland at u.northwestern.edu>:
> >>> Thanks for the quick response.
> >>>
> >>> This is a subset. Full dataset is 12,266 observations across 530 groups
> >> (or
> >>> Bikeshare stations).
> >>>
> >>> for
> >>> model_spatial.gau <- update(model_spatial, correlation = corGaus(form
> = ~
> >>> latitude +longitude ), method = "ML")
> >>> and
> >>> model_spatial.gau <- update(model_spatial, correlation = corGaus(form
> = ~
> >>> latitude +longitude| id ), method = "ML")
> >>> the error is "cannot have zero distances in "corSpatial" which I assume
> >> is
> >>> due to the repeated observations having the same exact lat and lon;
> >>> therefore zero distance
> >>>
> >>> Moreover, when I do anything with '| id' I think the model only
> accounts
> >>> for 'within-group' correlations, not across stations
> >>>
> >>>
> >>> On Tue, Mar 21, 2017 at 4:12 PM, Ben Bolker <bbolker at gmail.com> wrote:
> >>>
> >>>> Your approach seems about right.
> >>>>
> >>>> - What precisely does "unsuccessful" mean?  warnings, errors,
> >>>> ridiculous answers?
> >>>> - Is this your whole data set or a subset?
> >>>> - centering and scaling predictors is always worth a shot to fix
> >>>> numeric problems
> >>>> - INLA is more powerful than lme for fitting spatial correlations,
> >>>> although it's a *steep* learning curve ...
> >>>>
> >>>>
> >>>> On Tue, Mar 21, 2017 at 5:08 PM, Michael Hyland
> >>>> <mhyland at u.northwestern.edu> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I'm new to the listserv.
> >>>>>
> >>>>> A shortened version of my dataset is below. I am developing a model
> to
> >>>>> forecast monthly ridership at Bikeshare stations. I want to predict
> >>>> 'Cnts'
> >>>>> as a function of 'Population' - 'Temperature'. The dataset is
> >> unbalanced
> >>>>> (unequal number of observations for each station) and most of
> >> covariates
> >>>> do
> >>>>> not vary over time, but a few do.
> >>>>>
> >>>>> I have successfully used lmer() and lme() in R to capture the
> >> dependency
> >>>>> between the error terms for repeated observations from a given
> station
> >>>>> ('id').
> >>>>>
> >>>>> model_spatial = lme(log(counts) ~ log(Population)
> >>>>>                  +Drive +Med_Income + Buff2 +Rain + Temperature
> >>>>>                  , data = Data, random = ~1|id, method = "ML" )
> >>>>>
> >>>>> model_All = lmer(log(counts) ~ log(Population)
> >>>>>                  +Drive +Med_Income + Buff2 +Rain + Temperature
> >>>>>                  + (1|id)
> >>>>>                   , data = Data )
> >>>>>
> >>>>> However, a Moran's I test of the residuals suggests that the
> residuals
> >>>> are
> >>>>> spatially correlated.
> >>>>> station.dists <- as.matrix(dist(cbind(Data$longitude,
> >> Data$latitude)))
> >>>>> station.dists.inv <- 1/station.dists
> >>>>> station.dists.inv[is.infinite(station.dists.inv)] <- 0   #Distance
> >>>> value is
> >>>>> inf for repeated observations from the same station
> >>>>> Data$resid_all = resid(model_spatial)
> >>>>> Moran.I(Data$resid_all, station.dists.inv)
> >>>>>
> >>>>>
> >>>>> Hence, I need to develop a model in R that accounts for spatial
> >>>> correlation
> >>>>> across stations, while simultaneously capturing correlations between
> >>>>> observations from a single station.  I've tried the following updates
> >> to
> >>>>> the lme() model, but have been unsuccessful.
> >>>>> model_spatial.gau <- update(model_spatial, correlation = corGaus(form
> >> = ~
> >>>>> latitude +longitude ), method = "ML")
> >>>>> model_spatial.gau <- update(model_spatial, correlation = corGaus(form
> >> = ~
> >>>>> latitude +longitude| id ), method = "ML")
> >>>>>
> >>>>> Is there a way to formulate the correlation matrix in lme() or lmer()
> >>>> such
> >>>>> that the correlation between repeated obvservations of a given
> station
> >>>> *and*
> >>>>> the spatial autocorrelation between stations is accounted for?
> >>>>>
> >>>>>
> >>>>> year month id Cnts latitude longitude Population Drive Med_Income
> >> Buff2
> >>>> Rain
> >>>>> Temperature
> >>>>> 2015 8 2 2597 41.87264 -87.62398 4256 0.3418054 76857 127 0.07 71.8
> >>>>> 2015 9 2 2772 41.87264 -87.62398 4256 0.3418054 76857 128 0.15 69
> >>>>> 2015 10 2 684 41.87264 -87.62398 4256 0.3418054 76857 128 0.07 54.7
> >>>>> 2015 11 2 394 41.87264 -87.62398 4256 0.3418054 76857 128 0.15 44.6
> >>>>> 2015 12 2 148 41.87264 -87.62398 4256 0.3418054 76857 129 0.16 39
> >>>>> 2016 1 2 44 41.87264 -87.62398 4256 0.3418054 76857 129 0.03 24.7
> >>>>> 2015 5 3 2303 41.86723 -87.61536 16735 0.4312349 75227 90 0.15 60.4
> >>>>> 2015 6 3 3323 41.86723 -87.61536 16735 0.4312349 75227 98 0.24 67.4
> >>>>> 2015 7 3 5920 41.86723 -87.61536 16735 0.4312349 75227 98 0.09 72.3
> >>>>> 2015 8 3 4405 41.86723 -87.61536 16735 0.4312349 75227 98 0.07 71.8
> >>>>> 2015 9 3 3638 41.86723 -87.61536 16735 0.4312349 75227 99 0.15 69
> >>>>> 2015 10 3 2061 41.86723 -87.61536 16735 0.4312349 75227 99 0.07 54.7
> >>>>> 2015 11 3 1074 41.86723 -87.61536 16735 0.4312349 75227 99 0.15 44.6
> >>>>> 2015 12 3 374 41.86723 -87.61536 16735 0.4312349 75227 100 0.16 39
> >>>>> 2016 1 3 188 41.86723 -87.61536 16735 0.4312349 75227 100 0.03 24.7
> >>>>> 2016 2 3 474 41.86723 -87.61536 16735 0.4312349 75227 100 0.04 30.4
> >>>>> 2015 6 4 2968 41.85627 -87.61335 16735 0.4312349 75227 68 0.24 67.4
> >>>>> 2015 7 4 4266 41.85627 -87.61335 16735 0.4312349 75227 68 0.09 72.3
> >>>>> 2015 8 4 3442 41.85627 -87.61335 16735 0.4312349 75227 68 0.07 71.8
> >>>>> 2015 9 4 2552 41.85627 -87.61335 16735 0.4312349 75227 69 0.15 69
> >>>>> 2015 10 4 1301 41.85627 -87.61335 16735 0.4312349 75227 69 0.07 54.7
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Mike Hyland
> >>>>>
> >>>>>         [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> R-sig-mixed-models at r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>>>
> >>>
> >>>         [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-mixed-models at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >> _______________________________________________
> >> R-sig-mixed-models at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]