[R-sig-ME] Repeated Observations Linear Mixed Model with Outside-Group Spatial Correlation

Ben Bolker bbolker at gmail.com
Wed Mar 22 23:55:26 CET 2017


Last comment:  for a large data set, INLA or something that handles
the pairwise interactions in a smart way (e.g. SAR/CAR models from
geography) are probably going to be necessary.  In contrast,
approaches that rely on repeated, brute-force Cholesky decompositions
of large matrices (e.g., nlme) are probably going to crap out.

Another approach is to use an additive model to fit the relatively
high-frequency (but still smooth) spatial variation, i.e. see
mgcv/gamm4 packages (and Simon Wood's 2006 book).



On Wed, Mar 22, 2017 at 12:51 PM, Michael Hyland
<mhyland at u.northwestern.edu> wrote:
> Thanks for the responses everyone.
>
> David and Ben - I am in the midst of trying the hack right now. I created a
> single 'group' with all observations. Then I constructed the correlation
> distance matrix using three variables latitude, longitude, and time. This
> way I can capture the correlations from repeated observations (they have the
> same latitude and longitude) as a function of their temporal distance (time)
> and the correlations across stations as a function of their spatial and
> temporal distance. I think this structure is actually better and more
> flexible. I am still trying to figure out how to determine the tradeoff
> between spatial distance and temporal distance that I want. Also, as Ben
> suggested this method appears to be quite inefficient. The model has been
> running for 11 hours and is using around 12gb of RAM. I think I might create
> groups by clustering the stations geographically to reduce the computational
> burden of what is now 12,271x12,271 correlation matrix.
>
> Dexter -  lm::lm(), nlme::lme(), and lme4::lmer() all have the function
> resid(model) that provides the residual for each observation in the model. I
> used the package "ape" to calculate the Moran I.
>
> model_spatial = lme(log(counts) ~ log(Population)
>                  +Drive +Med_Income + Buff2 +Rain + Temperature
>                  , data = Data, random = ~1|id, method = "ML" )
>
> model_All = lmer(log(counts) ~ log(Population)
>                  +Drive +Med_Income + Buff2 +Rain + Temperature
>                  + (1|id)
>                   , data = Data )
> Data$resid_spat = resid(model_spatial )
> Data$resid_all = resid(model_All )
>
> Thierry - I assume you meant "nlme cant do what you would like to do"?
>
> I will check out the INLA, spaMM, and HSAR packages if the nlme workaround
> does not actually work.
>
>
>
> On Wed, Mar 22, 2017 at 11:02 AM, Ben Bolker <bbolker at gmail.com> wrote:
>>
>>
>>   The spaMM package is also worth a try.
>>
>>   A hack suggested a while ago by Dormann et al. is to make a single
>> group that includes all observations, but (1) it's pretty inefficient
>> and (2) I don't think you can have that *and* another, non-trivial
>> grouping factor.
>>
>> On 17-03-22 10:46 AM, Dexter Locke wrote:
>> > Dear list,
>> >
>> > You may consider also the citations below and HSAR package:
>> > https://cran.r-project.org/web/packages/HSAR/vignettes/hsar.html
>> >
>> > Dear Michael,
>> >
>> > Do you have comparable code for extracting the residuals of the model
>> > and
>> > testing for spatial autocorrelation that works with your second model
>> > "model_All" created with lme4::lmer ?  What package contains the
>> > "Moran.I"
>> > function?
>> >
>> > - Dexter
>> >
>> > Dong, G., & Harris, R. (2014). Spatial Autoregressive Models for
>> > Geographically Hierarchical Data Structures. *Geographical Analysis*,
>> > n/a-n/a. http://doi.org/10.1111/gean.12049
>> >
>> > Dong, G., Harris, R., Jones, K., & Yu, J. (2015). Multilevel Modelling
>> > with
>> > Spatial Interaction Effects with Application to an Emerging Land Market
>> > in
>> > Beijing, China. *PloS One*, *10*(6), e0130761.
>> > http://doi.org/10.1371/journal.pone.0130761
>> >
>> > Dong, G., Ma, J., Harris, R., & Pryce, G. (2016). Spatial Random Slope
>> > Multilevel Modeling Using Multivariate Conditional Autoregressive
>> > Models: A
>> > Case Study of Subjective Travel Satisfaction in Beijing. *Annals of the
>> > American Association of Geographers*, *106*(1), 19–35.
>> > http://doi.org/10.1080/00045608.2015.1094388
>> >
>> > On Wed, Mar 22, 2017 at 5:07 AM, Thierry Onkelinx
>> > <thierry.onkelinx at inbo.be>
>> > wrote:
>> >
>> >> Dear Michael,
>> >>
>> >> The correlation structures in nlme assume correlation among the
>> >> residuals **within** the most detail level of the random effects.
>> >> Residuals of observations originating from different levels of the
>> >> random effects are assumed to be uncorrelated. So nlme can do what you
>> >> would like to do.
>> >>
>> >> As Ben already mentioned, INLA is useful as it allows for spatially
>> >> correlated random effects. You can find information on the INLA
>> >> website (www.r-inla.org) and in a few books. e.g.
>> >> - Blangiardo & Cameletti (2015) Spatial and Spatio-temporal Bayesian
>> >> Model with R - INLA
>> >> - Zuur et al (in press) Beginner's Guide to Spatial, Temporal and
>> >> Spatial-Temporal Ecological Data Analysis with R-INLA: Using GLM and
>> >> GLMM
>> >>
>> >> Best regards,
>> >> ir. Thierry Onkelinx
>> >> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>> >> and Forest
>> >> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>> >> Kliniekstraat 25
>> >> 1070 Anderlecht
>> >> Belgium
>> >>
>> >> To call in the statistician after the experiment is done may be no
>> >> more than asking him to perform a post-mortem examination: he may be
>> >> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>> >> The plural of anecdote is not data. ~ Roger Brinner
>> >> The combination of some data and an aching desire for an answer does
>> >> not ensure that a reasonable answer can be extracted from a given body
>> >> of data. ~ John Tukey
>> >>
>> >>
>> >> 2017-03-21 22:19 GMT+01:00 Michael Hyland <mhyland at u.northwestern.edu>:
>> >>> Thanks for the quick response.
>> >>>
>> >>> This is a subset. Full dataset is 12,266 observations across 530
>> >>> groups
>> >> (or
>> >>> Bikeshare stations).
>> >>>
>> >>> for
>> >>> model_spatial.gau <- update(model_spatial, correlation = corGaus(form
>> >>> = ~
>> >>> latitude +longitude ), method = "ML")
>> >>> and
>> >>> model_spatial.gau <- update(model_spatial, correlation = corGaus(form
>> >>> = ~
>> >>> latitude +longitude| id ), method = "ML")
>> >>> the error is "cannot have zero distances in "corSpatial" which I
>> >>> assume
>> >> is
>> >>> due to the repeated observations having the same exact lat and lon;
>> >>> therefore zero distance
>> >>>
>> >>> Moreover, when I do anything with '| id' I think the model only
>> >>> accounts
>> >>> for 'within-group' correlations, not across stations
>> >>>
>> >>>
>> >>> On Tue, Mar 21, 2017 at 4:12 PM, Ben Bolker <bbolker at gmail.com> wrote:
>> >>>
>> >>>> Your approach seems about right.
>> >>>>
>> >>>> - What precisely does "unsuccessful" mean?  warnings, errors,
>> >>>> ridiculous answers?
>> >>>> - Is this your whole data set or a subset?
>> >>>> - centering and scaling predictors is always worth a shot to fix
>> >>>> numeric problems
>> >>>> - INLA is more powerful than lme for fitting spatial correlations,
>> >>>> although it's a *steep* learning curve ...
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 21, 2017 at 5:08 PM, Michael Hyland
>> >>>> <mhyland at u.northwestern.edu> wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I'm new to the listserv.
>> >>>>>
>> >>>>> A shortened version of my dataset is below. I am developing a model
>> >>>>> to
>> >>>>> forecast monthly ridership at Bikeshare stations. I want to predict
>> >>>> 'Cnts'
>> >>>>> as a function of 'Population' - 'Temperature'. The dataset is
>> >> unbalanced
>> >>>>> (unequal number of observations for each station) and most of
>> >> covariates
>> >>>> do
>> >>>>> not vary over time, but a few do.
>> >>>>>
>> >>>>> I have successfully used lmer() and lme() in R to capture the
>> >> dependency
>> >>>>> between the error terms for repeated observations from a given
>> >>>>> station
>> >>>>> ('id').
>> >>>>>
>> >>>>> model_spatial = lme(log(counts) ~ log(Population)
>> >>>>>                  +Drive +Med_Income + Buff2 +Rain + Temperature
>> >>>>>                  , data = Data, random = ~1|id, method = "ML" )
>> >>>>>
>> >>>>> model_All = lmer(log(counts) ~ log(Population)
>> >>>>>                  +Drive +Med_Income + Buff2 +Rain + Temperature
>> >>>>>                  + (1|id)
>> >>>>>                   , data = Data )
>> >>>>>
>> >>>>> However, a Moran's I test of the residuals suggests that the
>> >>>>> residuals
>> >>>> are
>> >>>>> spatially correlated.
>> >>>>> station.dists <- as.matrix(dist(cbind(Data$longitude,
>> >> Data$latitude)))
>> >>>>> station.dists.inv <- 1/station.dists
>> >>>>> station.dists.inv[is.infinite(station.dists.inv)] <- 0   #Distance
>> >>>> value is
>> >>>>> inf for repeated observations from the same station
>> >>>>> Data$resid_all = resid(model_spatial)
>> >>>>> Moran.I(Data$resid_all, station.dists.inv)
>> >>>>>
>> >>>>>
>> >>>>> Hence, I need to develop a model in R that accounts for spatial
>> >>>> correlation
>> >>>>> across stations, while simultaneously capturing correlations between
>> >>>>> observations from a single station.  I've tried the following
>> >>>>> updates
>> >> to
>> >>>>> the lme() model, but have been unsuccessful.
>> >>>>> model_spatial.gau <- update(model_spatial, correlation =
>> >>>>> corGaus(form
>> >> = ~
>> >>>>> latitude +longitude ), method = "ML")
>> >>>>> model_spatial.gau <- update(model_spatial, correlation =
>> >>>>> corGaus(form
>> >> = ~
>> >>>>> latitude +longitude| id ), method = "ML")
>> >>>>>
>> >>>>> Is there a way to formulate the correlation matrix in lme() or
>> >>>>> lmer()
>> >>>> such
>> >>>>> that the correlation between repeated obvservations of a given
>> >>>>> station
>> >>>> *and*
>> >>>>> the spatial autocorrelation between stations is accounted for?
>> >>>>>
>> >>>>>
>> >>>>> year month id Cnts latitude longitude Population Drive Med_Income
>> >> Buff2
>> >>>> Rain
>> >>>>> Temperature
>> >>>>> 2015 8 2 2597 41.87264 -87.62398 4256 0.3418054 76857 127 0.07 71.8
>> >>>>> 2015 9 2 2772 41.87264 -87.62398 4256 0.3418054 76857 128 0.15 69
>> >>>>> 2015 10 2 684 41.87264 -87.62398 4256 0.3418054 76857 128 0.07 54.7
>> >>>>> 2015 11 2 394 41.87264 -87.62398 4256 0.3418054 76857 128 0.15 44.6
>> >>>>> 2015 12 2 148 41.87264 -87.62398 4256 0.3418054 76857 129 0.16 39
>> >>>>> 2016 1 2 44 41.87264 -87.62398 4256 0.3418054 76857 129 0.03 24.7
>> >>>>> 2015 5 3 2303 41.86723 -87.61536 16735 0.4312349 75227 90 0.15 60.4
>> >>>>> 2015 6 3 3323 41.86723 -87.61536 16735 0.4312349 75227 98 0.24 67.4
>> >>>>> 2015 7 3 5920 41.86723 -87.61536 16735 0.4312349 75227 98 0.09 72.3
>> >>>>> 2015 8 3 4405 41.86723 -87.61536 16735 0.4312349 75227 98 0.07 71.8
>> >>>>> 2015 9 3 3638 41.86723 -87.61536 16735 0.4312349 75227 99 0.15 69
>> >>>>> 2015 10 3 2061 41.86723 -87.61536 16735 0.4312349 75227 99 0.07 54.7
>> >>>>> 2015 11 3 1074 41.86723 -87.61536 16735 0.4312349 75227 99 0.15 44.6
>> >>>>> 2015 12 3 374 41.86723 -87.61536 16735 0.4312349 75227 100 0.16 39
>> >>>>> 2016 1 3 188 41.86723 -87.61536 16735 0.4312349 75227 100 0.03 24.7
>> >>>>> 2016 2 3 474 41.86723 -87.61536 16735 0.4312349 75227 100 0.04 30.4
>> >>>>> 2015 6 4 2968 41.85627 -87.61335 16735 0.4312349 75227 68 0.24 67.4
>> >>>>> 2015 7 4 4266 41.85627 -87.61335 16735 0.4312349 75227 68 0.09 72.3
>> >>>>> 2015 8 4 3442 41.85627 -87.61335 16735 0.4312349 75227 68 0.07 71.8
>> >>>>> 2015 9 4 2552 41.85627 -87.61335 16735 0.4312349 75227 69 0.15 69
>> >>>>> 2015 10 4 1301 41.85627 -87.61335 16735 0.4312349 75227 69 0.07 54.7
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Mike Hyland
>> >>>>>
>> >>>>>         [[alternative HTML version deleted]]
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> R-sig-mixed-models at r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>>>
>> >>>
>> >>>         [[alternative HTML version deleted]]
>> >>>
>> >>> _______________________________________________
>> >>> R-sig-mixed-models at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>
>> >> _______________________________________________
>> >> R-sig-mixed-models at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > R-sig-mixed-models at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>



More information about the R-sig-mixed-models mailing list