[R-sig-ME] spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation

Wed Jul 15 00:10:26 CEST 2020

Dear Thierry,

please provide a reproducible example so that we know what you have 
actually done.

Best,

F.

On 14/07/2020 at 20:00, Thierry Onkelinx wrote:
> Dear François and Sarah,
>
> INLA seems more efficient. I ran a model with a Matérn correlation
> structure on 13K locations (1 observation per location) in under 10
> minutes on a laptop with 16 GB of RAM.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE 
> AND FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no 
> more than asking him to perform a post-mortem examination: he may be 
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does 
> not ensure that a reasonable answer can be extracted from a given body 
> of data. ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
>
> On Tue 14 Jul 2020 at 18:22, Francois Rousset
> <francois.rousset using umontpellier.fr> wrote:
>
>     Dear Sarah,
>
>     On 14/07/2020 at 16:55, Sarah Chisholm wrote:
>     > Hi Mollie, thank you for your suggestion. glmmTMB seems like a good
>     > option for my needs as well. In your sample code above, can you
>     > explain what the term 'group' does in matern(pos+0|group)? Does this
>     > allow the spatial correlation structure to be applied to specific
>     > groupings in the data (in my case, for example, by 'continent')?
>     >
>     > Francois, thank you for this very clear answer. This is a very
>     > convenient feature of the function! May I ask you a couple of other
>     > questions about some issues that I've had with spaMM::fitme()?
>     >
>     > In particular, when I try fitting this model to a large data set
>     > (~14 000 rows x 7 columns, ~2 MB), the model will run for an
>     > extended period of time, to the point where I've had to terminate
>     > the computation. I've tried applying the suggestions that are
>     > mentioned in the user guide, i.e. setting init=list(lambda=0.1)
>     > and init=list(lambda=NaN). Implementing init=list(lambda=0.1)
>     > returned an error suggesting that there was a lack of memory, while
>     > running the model with init=list(lambda=NaN) also ran for an
>     > extended period of time without completing. Is there something else
>     > I can do to speed up the fit of these models?
>     >
>     > I've had a similar problem with an even larger data set (~185 000
>     > rows x 8 columns, ~21 MB), where, when I try running the model, this
>     > error is returned immediately:
>     >
>     > Error in ZA %*% xmatrix : Cholmod error 'problem too large' at file
>     > ../Core/cholmod_dense.c, line 105
>     >
>     > I've tried running this model on two devices, both running 64-bit
>     > Windows 10, one with 32 GB of RAM and the other with 64 GB. I've
>     > gotten the same error on both devices. Is there a way that fitme()
>     > can accommodate these large data sets?
>
>     spaMM can handle large data sets, but the first issue to consider here
>     is the number of distinct locations for the spatial random effect. The
>     large correlation matrices of geostatistical models will always be a
>     problem, both in terms of memory requirements and of potentially huge
>     computation times. My guess from past experiments is that one should
>     still be able to fit models with ~10K locations within a few days on a
>     computer with <60 GB of RAM (given perhaps some tinkering with the
>     arguments), so at least the data set of 14 000 rows should be feasible,
>     particularly if the number of locations is smaller.
>
>     Anyone planning to analyze large spatial data sets should anticipate
>     these problems and check for themselves whether there is any practical
>     alternative suitable for their particular problem. The discussion in
>     section 6.2 of the "gentle introduction" to spaMM may then be useful.
>
>     Best,
>
>     F.
>
>     >
>     > Thank you,
>     >
>     > Sarah
>
>     _______________________________________________
>     R-sig-mixed-models using r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
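
Francois's point that the number of distinct locations, not the file size, drives the cost can be made concrete with a rough estimate. The sketch below uses Python purely as a calculator: it is not spaMM code and does not reproduce what fitme() does internally. It computes the memory needed to store one dense n x n matrix of 8-byte doubles, such as a full correlation matrix over n locations. As a worst-case assumption, it treats every row of Sarah's data sets as a distinct location; the actual number of locations was not stated in the thread. The function name dense_matrix_gb is hypothetical.

```python
# Back-of-the-envelope memory estimate for ONE dense n x n matrix of
# double-precision (8-byte) values, e.g. a full spatial correlation matrix
# over n distinct locations. Assumption: worst case, one location per row.
# A fitting algorithm may hold several matrices of this size at once.

BYTES_PER_DOUBLE = 8


def dense_matrix_gb(n: int) -> float:
    """Memory in GB (10**9 bytes) for one dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE / 1e9


if __name__ == "__main__":
    # Location counts taken from the thread: ~10K (Francois's estimate),
    # 14 000 and 185 000 rows (Sarah's data sets, assuming one location per row).
    for n in (10_000, 14_000, 185_000):
        print(f"{n:>7} locations: {dense_matrix_gb(n):8.1f} GB per dense matrix")
```

Under that assumption, 14 000 locations need about 1.6 GB per dense matrix, whereas 185 000 locations would need about 273.8 GB for a single matrix, far more than 64 GB of RAM. That gap is consistent with the immediate 'problem too large' error, although the exact condition CHOLMOD checks is not established here.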
