[R-sig-ME] spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation

Sarah Chisholm
Wed Jul 15 16:48:38 CEST 2020


Thanks Francois. I hadn't considered that the number of unique locations,
rather than the size of the entire data set, could be the source of the
problem. One option for me is simply to remove the observations for some
locations, to bring the number of unique coordinates down. I'll also test
a lattice model using the IMRF() notation to describe the random spatial
effect. I believe this is what you referred to in your previous email?
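
The thinning idea above could look roughly like the following base R sketch. The data frame `dat` and its columns `long`, `lat` and `y` are hypothetical stand-ins, and random selection of locations is only one possible choice (a spatially stratified selection might be preferable).

```r
# Hypothetical longitudinal data: 500 sites, 6 observations per site.
set.seed(1)
sites <- data.frame(long = runif(500), lat = runif(500))
dat <- sites[rep(seq_len(nrow(sites)), each = 6), ]
dat$y <- rnorm(nrow(dat))

# Label each row by its coordinates and count the distinct locations.
# It is this count, not nrow(dat), that sets the size of the spatial
# correlation matrix.
loc_id <- paste(dat$long, dat$lat)
ids <- unique(loc_id)
length(ids)

# Keep every observation from a random subset of 200 locations.
keep <- sample(ids, 200)
dat_sub <- dat[loc_id %in% keep, ]
nrow(dat_sub)
```

Thinning whole locations, rather than individual rows, keeps each retained site's time series intact.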

Sarah

On Wed, Jul 15, 2020 at 10:01 AM Francois Rousset <
francois.rousset using umontpellier.fr> wrote:

> Dear Thierry,
>
> thanks. So (as expected) this is a different issue. spaMM can fit some
> correlation models described by objects produced by
> INLA::inla.spde2.matern(); in my past experiments with these, the
> computation times were close to those of INLA, and the memory requirements
> were much smaller than those I described previously. But this approximation
> is not what I meant by "Matern" in my earlier message.
>
> Beyond general features that contribute to these computational differences
> (the use of sparse matrix methods and, to a lesser extent, the constraint on
> the smoothness parameter of the approximated Matern model), the 'cutoff'
> argument in your call to inla.mesh.2d() appears important: it reduces the
> number of locations actually considered in the most costly computations
> to below the number of locations in the data (8804 rather than 30K, if I
> get it right). This would also allow a faster fit by spaMM when called
> on the resulting inla.spde2 object.
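
A minimal sketch of the workflow described above: build a mesh with a 'cutoff', turn it into an inla.spde2 object, and pass that object to spaMM. The data, column names and mesh settings (`max.edge`, `cutoff`) are illustrative assumptions, not the values used in this thread, and the block only runs if both INLA and spaMM are installed.

```r
# Simulated stand-in data; column names are hypothetical.
set.seed(1)
dat <- data.frame(long = runif(300), lat = runif(300), x1 = rnorm(300))
dat$y <- 1 + 0.5 * dat$x1 + rnorm(300)

if (requireNamespace("INLA", quietly = TRUE) &&
    requireNamespace("spaMM", quietly = TRUE)) {
  library(spaMM)
  coords <- as.matrix(dat[, c("long", "lat")])
  # 'cutoff' is the minimum allowed distance between points: input points
  # closer than this are merged into a single mesh vertex.
  mesh <- INLA::inla.mesh.2d(loc = coords, max.edge = c(0.1, 0.5),
                             cutoff = 0.05)
  print(mesh$n)  # number of mesh vertices, to compare with nrow(coords)
  spde <- INLA::inla.spde2.matern(mesh)
  # IMRF() random effect defined by the inla.spde2 object.
  fit <- fitme(y ~ x1 + IMRF(1 | long + lat, model = spde), data = dat)
  print(summary(fit))
}
```

Comparing `mesh$n` for a few values of `cutoff` shows how strongly the mesh size, and hence the cost of the fit, depends on that argument.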
>
> Best,
>
> F.
> On 15/07/2020 at 12:50, Thierry Onkelinx wrote:
>
> Dear François,
>
> Here you go:
> https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio
> Almost 30K locations. The model fit in a little over 7 min on my laptop
> with 16 GB RAM.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
>
> On Wed, 15 Jul 2020 at 00:10, Francois Rousset <
> francois.rousset using umontpellier.fr> wrote:
>
>> Dear Thierry,
>>
>> please provide a reproducible example so that we know what you have
>> actually done.
>>
>> Best,
>>
>> F.
>> On 14/07/2020 at 20:00, Thierry Onkelinx wrote:
>>
>> Dear François and Sarah,
>>
>> INLA seems more efficient. I ran a model with a Matern correlation
>> structure on 13K locations (1 observation per location) in under 10 minutes
>> on a laptop with 16GB RAM.
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>>
>> On Tue, 14 Jul 2020 at 18:22, Francois Rousset <
>> francois.rousset using umontpellier.fr> wrote:
>>
>>> Dear Sarah,
>>>
>>> On 14/07/2020 at 16:55, Sarah Chisholm wrote:
>>> > Hi Mollie, thank you for your suggestion. glmmTMB seems like a good
>>> > option for my needs as well. In your sample code above, can you
>>> > explain what the term 'group' does in matern(pos+0|group)? Does this
>>> > allow the spatial correlation structure to be applied to specific
>>> > groupings in the data (in my case, for example, by 'continent')?
>>> >
>>> > Francois, thank you for this very clear answer. This is a very
>>> > convenient feature of the function! May I ask you a couple of other
>>> > questions about some issues that I've had with spaMM::fitme()?
>>> >
>>> > In particular, when I try fitting this model to a large data set
>>> > (~14 000 rows x 7 columns, ~2 MB), the model will run for an extended
>>> > period of time, to the point where I've had to terminate the
>>> > computation. I've tried applying the suggestions that are mentioned in
>>> > the user guide, i.e. setting init=list(lambda=0.1)
>>> > and init=list(lambda=NaN). Implementing init=list(lambda=0.1) returned
>>> > an error suggesting that there was a lack of memory, while running the
>>> > model with init=list(lambda=NaN) also ran for an extended period of
>>> > time without completing. Is there something else I can do to speed up
>>> > the fit of these models?
>>> >
>>> > I've had a similar problem with an even larger data set (~185 000 rows
>>> > x 8 columns, ~21 MB), where, when I try running the model, this error
>>> > is returned immediately:
>>> >
>>> > Error in ZA %*% xmatrix : Cholmod error 'problem too large' at file
>>> > ../Core/cholmod_dense.c, line 105
>>> >
>>> > I've tried running this model on two devices, both with a 64-bit OS
>>> > with Windows 10, one with 32 GB of RAM and the other with 64 GB. I've
>>> > gotten the same error from both devices. Is there a way that fitme()
>>> > can accommodate these large data sets?
>>>
>>> spaMM can handle large data sets, but the first issue to consider here
>>> is the number of distinct locations for the spatial random effect. The
>>> large correlation matrices of geostatistical models will always be a
>>> problem, both in terms of memory requirements and of potentially huge
>>> computation times. My guess from past experiments is that one should
>>> still be able to fit models with ~ 10K locations within a few days on a
>>> computer with <60 GB of RAM (given perhaps some tinkering with the
>>> arguments), so at least the data set of 14 000 rows should be feasible,
>>> particularly if the number of locations is smaller.
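
To give a rough sense of the memory issue raised above, the following back-of-the-envelope sketch computes the size of a single dense n x n matrix of doubles (8 bytes per entry). The helper `dense_gb` is hypothetical, and treating every row as a distinct location is an assumption that gives an upper bound, since the thread does not state the number of distinct locations. A fit will generally need more than a single copy of such a matrix.

```r
# Memory for one dense n x n matrix of doubles (8 bytes each), in GB.
dense_gb <- function(n) n^2 * 8 / 1e9

dense_gb(10000)   # 0.8
dense_gb(14000)   # 1.568
dense_gb(185000)  # 273.8
```

On this count a single dense matrix for 185 000 distinct locations would already exceed the 64 GB of RAM mentioned in the question, whereas 10K to 14K locations stay in the range of a few GB per matrix.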
>>>
>>> Anyone planning to analyze large spatial data sets should anticipate
>>> these problems and check by themselves whether there is any practical
>>> alternative suitable for their particular problem. The discussion in
>>> section 6.2 of the "gentle introduction" to spaMM may then be useful.
>>>
>>> Best,
>>>
>>> F.
>>>
>>> >
>>> > Thank you,
>>> >
>>> > Sarah
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>

-- 
Sarah Chisholm
MSc Candidate
Department of Biology
University of Ottawa
Linkedin <http://www.linkedin.com/in/sarah-chisholm-422a5785>
