[R-sig-ME] spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation

Wed Jul 15 16:01:03 CEST 2020

Dear Thierry,

thanks. So (as expected) this is a different issue. spaMM can fit some 
correlation models described by objects produced by 
INLA::inla.spde2.matern(). In my past experiments with such models, the 
computation times were close to those of INLA, and the memory 
requirements were much smaller than those I described previously. But 
this is not what I meant by "Matern" in my earlier message.

Beyond general features that contribute to these computational 
differences (the use of sparse matrix methods and, to a lesser extent, 
the constraint on the smoothness parameter of the approximated Matern 
model), the 'cutoff' argument in your call to inla.mesh.2d() appears 
important. It reduces the number of locations actually considered in the 
most costly computations below the number of locations in the data (to 
8804 rather than 30K, if I get it right). This would also allow a 
faster fit by spaMM when called on the resulting inla.spde2 object.
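
As a rough sketch of the effect of 'cutoff' (this is not the code from 
your shared folder: the Loaloa data shipped with spaMM stand in for your 
data, the max.edge and cutoff values are arbitrary assumptions, and I 
assume an INLA version that still exports inla.mesh.2d()), one could 
compare mesh sizes and then pass the resulting inla.spde2 object to 
fitme() through an IMRF term:

```r
# Sketch only: requires the spaMM and INLA packages. Loaloa is a small
# example data set from spaMM, used here as a stand-in for the real data.
library(spaMM)
data("Loaloa")
locs <- as.matrix(Loaloa[, c("longitude", "latitude")])

# Same mesh settings with and without 'cutoff' (values chosen arbitrarily).
mesh_fine   <- INLA::inla.mesh.2d(loc = locs, max.edge = c(3, 20))
mesh_coarse <- INLA::inla.mesh.2d(loc = locs, max.edge = c(3, 20),
                                  cutoff = 0.5)

# Number of distinct data locations and number of vertices in each mesh;
# the mesh vertices are what the costly computations operate on.
c(data = nrow(unique(locs)), fine = mesh_fine$n, coarse = mesh_coarse$n)

# Approximate Matern model defined on the coarser mesh, then fitted by spaMM.
spde <- INLA::inla.spde2.matern(mesh_coarse)
fit <- fitme(cbind(npos, ntot - npos) ~ elev1 +
               IMRF(1 | longitude + latitude, model = spde),
             family = binomial(), data = Loaloa)
summary(fit)
```

On a data set this small the timing difference should be negligible; the 
point is only to show where 'cutoff' enters and how the mesh size can be 
checked before fitting.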

Best,

F.

On 15/07/2020 at 12:50, Thierry Onkelinx wrote:
> Dear François,
>
> Here you go: 
> https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio
> Almost 30K locations. Fit in little over 7 min on my laptop with 16 GB 
> RAM.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE 
> AND FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be <mailto:thierry.onkelinx using inbo.be>
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be <http://www.inbo.be>
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no 
> more than asking him to perform a post-mortem examination: he may be 
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does 
> not ensure that a reasonable answer can be extracted from a given body 
> of data. ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
>
> On Wed 15 Jul 2020 at 00:10, Francois Rousset 
> <francois.rousset using umontpellier.fr 
> <mailto:francois.rousset using umontpellier.fr>> wrote:
>
>     Dear Thierry,
>
>     please provide a reproducible example so that we know what you
>     have actually done.
>
>     Best,
>
>     F.
>
>     On 14/07/2020 at 20:00, Thierry Onkelinx wrote:
>>     Dear François and Sarah,
>>
>>     INLA seems more efficient. I ran a model with Matern correlation
>>     structure on 13K locations (1 observation per location) in under
>>     10 minutes on a laptop with 16GB RAM.
>>
>>     Best regards,
>>
>>     ir. Thierry Onkelinx
>>
>>     On Tue 14 Jul 2020 at 18:22, Francois Rousset
>>     <francois.rousset using umontpellier.fr
>>     <mailto:francois.rousset using umontpellier.fr>> wrote:
>>
>>         Dear Sarah,
>>
>>         On 14/07/2020 at 16:55, Sarah Chisholm wrote:
>>         > Hi Mollie, thank you for your suggestion. glmmTMB seems like a good
>>         > option for my needs as well. In your sample code above, can you
>>         > explain what the term 'group' does in matern(pos+0|group)? Does this
>>         > allow the spatial correlation structure to be applied to specific
>>         > groupings in the data (in my case, for example, by 'continent')?
>>         >
>>         > Francois, thank you for this very clear answer. This is a very
>>         > convenient feature of the function! May I ask you a couple of other
>>         > questions about some issues that I've had with spaMM::fitme()?
>>         >
>>         > In particular, when I try fitting this model to a large data set
>>         > (~14 000 rows x 7 columns, ~2 MB), the model will run for an extended
>>         > period of time, to the point where I've had to terminate the
>>         > computation. I've tried applying the suggestions that are mentioned in
>>         > the user guide, i.e. setting init=list(lambda=0.1)
>>         > and init=list(lambda=NaN). Implementing init=list(lambda=0.1) returned
>>         > an error suggesting that there was a lack of memory, while running the
>>         > model with init=list(lambda=NaN) also ran for an extended period of
>>         > time without completing. Is there something else I can do to speed up
>>         > the fit of these models?
>>         >
>>         > I've had a similar problem with an even larger data set
>>         > (~185 000 rows x 8 columns, ~21 MB), where, when I try running the
>>         > model, this error is returned immediately:
>>         >
>>         > Error in ZA %*% xmatrix : Cholmod error 'problem too large' at file
>>         > ../Core/cholmod_dense.c, line 105
>>         >
>>         > I've tried running this model on two devices, both with a 64-bit OS
>>         > with Windows 10, one with 32 GB of RAM and the other with 64 GB. I've
>>         > gotten the same error from both devices. Is there a way that fitme()
>>         > can accommodate these large data sets?
>>
>>         spaMM can handle large data sets, but the first issue to consider here
>>         is the number of distinct locations for the spatial random effect. The
>>         large correlation matrices of geostatistical models will always be a
>>         problem, both in terms of memory requirements and of potentially huge
>>         computation times. My guess from past experiments is that one should
>>         still be able to fit models with ~ 10K locations within a few days on a
>>         computer with <60 GB of RAM (given perhaps some tinkering with the
>>         arguments), so at least the data set of 14 000 rows should be feasible,
>>         particularly if the number of locations is smaller.
>>
>>         Anyone planning to analyze large spatial data sets should anticipate
>>         these problems and check by themselves whether there is any practical
>>         alternative suitable for their particular problem. The discussion in
>>         section 6.2 of the "gentle introduction" to spaMM may then be useful.
>>
>>         Best,
>>
>>         F.
>>
>>         >
>>         > Thank you,
>>         >
>>         > Sarah
>>
>>         _______________________________________________
>>         R-sig-mixed-models using r-project.org
>>         <mailto:R-sig-mixed-models using r-project.org> mailing list
>>         https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
