[R-sig-ME] spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation
Francois Rousset
francois.rousset at umontpellier.fr
Thu Jul 16 10:19:07 CEST 2020
On 15/07/2020 at 16:48, Sarah Chisholm wrote:
> Thanks Francois. I hadn't considered that the number of unique
> locations could be the source of the problem, rather than the size of
> the entire data set. It is a possibility for me to simply remove
> observations for a number of locations to bring the total sample size
> (of unique coordinates) down. I'll also test a lattice model using the
> IMRF() notation to describe the random spatial effect - I believe this
> is what you referred to in your previous email?
Yes, use the IMRF formula term for this purpose.
F.
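
A rough illustration of the point about unique locations quoted above, as a minimal sketch in base R only. The data frame `dat`, its columns `x` and `y`, and the helper `dense_gib()` are hypothetical names made up for this example; the calculation only counts the storage of one dense matrix of doubles and does not describe what spaMM allocates internally.

```r
# Hypothetical data: 14,000 rows, but repeated observations at the same sites.
set.seed(1)
sites <- data.frame(x = runif(2000), y = runif(2000))
dat <- sites[sample(nrow(sites), 14000, replace = TRUE), ]

# Number of distinct coordinate pairs; this, not nrow(dat), sets the
# dimension of the spatial correlation matrix.
n_loc <- nrow(unique(dat[, c("x", "y")]))

# Size in GiB of ONE dense n x n matrix of doubles (8 bytes per entry).
# Assumption: a single copy; an actual fit may need several such matrices.
dense_gib <- function(n) n^2 * 8 / 1024^3

n_loc                  # distinct locations
nrow(dat)              # rows in the data
dense_gib(n_loc)       # matrix over the distinct locations
dense_gib(nrow(dat))   # what it would be with one location per row
```

Under these assumptions, one dense copy takes about 0.75 GiB for 10K locations and about 6.7 GiB for 30K locations, which is why reducing the number of distinct locations matters more than reducing the number of rows.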
>
> Sarah
>
> On Wed, Jul 15, 2020 at 10:01 AM Francois Rousset
> <francois.rousset using umontpellier.fr> wrote:
>
> Dear Thierry,
>
> Thanks. So, as expected, this is a different issue. spaMM can fit
> some correlation models described by objects produced by
> INLA::inla.spde2.matern(). In my past experiments with such models,
> the computation times were close to those of INLA, and the memory
> requirements were much smaller than those I described previously.
> However, this is not what I meant by "Matern" in my earlier message.
>
> Beyond general features that contribute to these computational
> differences (the use of sparse matrix methods, and to a lesser
> extent the constraint on the smoothness parameter of the
> approximated Matern model), the 'cutoff' argument in your call to
> inla.mesh.2d() appears important to reduce the number of
> locations actually considered, in the most costly computations,
> below the number of locations in the data (to 8804 rather than
> 30K, if I get it right), and this would also allow a faster fit by
> spaMM when called on the resulting inla.spde2 object.
>
> Best,
>
> F.
>
> On 15/07/2020 at 12:50, Thierry Onkelinx wrote:
>> Dear François,
>>
>> Here you go:
>> https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio
>> Almost 30K locations. The model fit in a little over 7 min on my
>> laptop with 16 GB RAM.
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>> Statisticus / Statistician
>>
>> Vlaamse Overheid / Government of Flanders
>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR
>> NATURE AND FOREST
>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality
>> Assurance
>> thierry.onkelinx using inbo.be
>> Havenlaan 88 bus 73, 1000 Brussel
>> www.inbo.be
>>
>> ///////////////////////////////////////////////////////////////////////////////////////////
>> To call in the statistician after the experiment is done may be
>> no more than asking him to perform a post-mortem examination: he
>> may be able to say what the experiment died of. ~ Sir Ronald
>> Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner
>> The combination of some data and an aching desire for an answer
>> does not ensure that a reasonable answer can be extracted from a
>> given body of data. ~ John Tukey
>> ///////////////////////////////////////////////////////////////////////////////////////////
>>
>>
>> On Wed, 15 Jul 2020 at 00:10, Francois Rousset
>> <francois.rousset using umontpellier.fr> wrote:
>>
>> Dear Thierry,
>>
>> please provide a reproducible example so that we know what
>> you have actually done.
>>
>> Best,
>>
>> F.
>>
>> On 14/07/2020 at 20:00, Thierry Onkelinx wrote:
>>> Dear François and Sarah,
>>>
>>> INLA seems more efficient. I ran a model with a Matern
>>> correlation structure on 13K locations (1 observation per
>>> location) in under 10 minutes on a laptop with 16 GB RAM.
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>> Statisticus / Statistician
>>>
>>> On Tue, 14 Jul 2020 at 18:22, Francois Rousset
>>> <francois.rousset using umontpellier.fr> wrote:
>>>
>>> Dear Sarah,
>>>
>>> On 14/07/2020 at 16:55, Sarah Chisholm wrote:
>>> > Hi Mollie, thank you for your suggestion. glmmTMB seems like a good
>>> > option for my needs as well. In your sample code above, can you
>>> > explain what the term 'group' does in matern(pos+0|group)? Does this
>>> > allow the spatial correlation structure to be applied to specific
>>> > groupings in the data (in my case, for example, by 'continent')?
>>> >
>>> > Francois, thank you for this very clear answer. This is a very
>>> > convenient feature of the function! May I ask you a couple of other
>>> > questions about some issues that I've had with spaMM::fitme()?
>>> >
>>> > In particular, when I try fitting this model to a large data set (~14
>>> > 000 rows x 7 columns, ~2 MB), the model will run for an extended
>>> > period of time, to the point where I've had to terminate the
>>> > computation. I've tried applying the suggestions that are mentioned in
>>> > the user guide, i.e. setting init=list(lambda=0.1)
>>> > and init=list(lambda=NaN). Implementing init=list(lambda=0.1) returned
>>> > an error suggesting that there was a lack of memory, while running the
>>> > model with init=list(lambda=NaN) also ran for an extended period of
>>> > time without completing. Is there something else I can do to speed up
>>> > the fit of these models?
>>> >
>>> > I've had a similar problem with an even larger data set (~185 000 rows
>>> > x 8 columns, ~21 MB), where, when I try running the model, this error
>>> > is returned immediately:
>>> >
>>> > Error in ZA %*% xmatrix : Cholmod error 'problem too large' at file
>>> > ../Core/cholmod_dense.c, line 105
>>> >
>>> > I've tried running this model on two devices, both with a 64-bit OS
>>> > with Windows 10, one with 32 GB of RAM and the other with 64 GB. I've
>>> > gotten the same error from both devices. Is there a way that fitme()
>>> > can accommodate these large data sets?
>>>
>>> spaMM can handle large data sets, but the first issue to consider here
>>> is the number of distinct locations for the spatial random effect. The
>>> large correlation matrices of geostatistical models will always be a
>>> problem, both in terms of memory requirements and of potentially huge
>>> computation times. My guess from past experiments is that one should
>>> still be able to fit models with ~ 10K locations within a few days on a
>>> computer with <60 GB of RAM (given perhaps some tinkering with the
>>> arguments), so at least the data set of 14 000 rows should be feasible,
>>> particularly if the number of locations is smaller.
>>>
>>> Anyone planning to analyze large spatial data sets should anticipate
>>> these problems and check by themselves whether there is any practical
>>> alternative suitable for their particular problem. The discussion in
>>> section 6.2 of the "gentle introduction" to spaMM may then be useful.
>>>
>>> Best,
>>>
>>> F.
>>>
>>> >
>>> > Thank you,
>>> >
>>> > Sarah
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>
>
> --
> Sarah Chisholm
> MSc Candidate
> Department of Biology
> University of Ottawa
> Linkedin <http://www.linkedin.com/in/sarah-chisholm-422a5785>