[R-sig-ME] spaMM::fitme() - a glmm for longitudinal data that accounts for spatial autocorrelation

Thierry Onkelinx thierry.onkelinx using inbo.be
Wed Jul 15 12:50:15 CEST 2020


Dear François,

Here you go:
https://drive.google.com/drive/folders/1Ocq88Yq9u_lM-loayRQlMyBS2HLy_Tio
The example has almost 30K locations. The model fit in a little over 7
minutes on my laptop with 16 GB of RAM.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>


On Wed 15 Jul 2020 at 00:10, Francois Rousset <
francois.rousset using umontpellier.fr> wrote:

> Dear Thierry,
>
> please provide a reproducible example so that we know what you have
> actually done.
>
> Best,
>
> F.
> On 14/07/2020 at 20:00, Thierry Onkelinx wrote:
>
> Dear François and Sarah,
>
> INLA seems more efficient. I ran a model with a Matérn correlation
> structure on 13K locations (1 observation per location) in under 10 minutes
> on a laptop with 16 GB of RAM.
>
> Best regards,
>
> ir. Thierry Onkelinx
>
> On Tue 14 Jul 2020 at 18:22, Francois Rousset <
> francois.rousset using umontpellier.fr> wrote:
>
>> Dear Sarah,
>>
>> On 14/07/2020 at 16:55, Sarah Chisholm wrote:
>> > Hi Mollie, thank you for your suggestion. glmmTMB seems like a good
>> > option for my needs as well. In your sample code above, can you
>> > explain what the term 'group' does in matern(pos+0|group)? Does this
>> > allow the spatial correlation structure to be applied to specific
>> > groupings in the data (in my case, for example, by 'continent')?
>> >
>> > Francois, thank you for this very clear answer. This is a very
>> > convenient feature of the function! May I ask you a couple of other
>> > questions about some issues that I've had with spaMM::fitme()?
>> >
>> > In particular, when I try fitting this model to a large data set (~14
>> > 000 rows x 7 columns, ~2 MB), the model will run for an extended
>> > period of time, to the point where I've had to terminate the
>> > computation. I've tried applying the suggestions that are mentioned in
>> > the user guide, i.e. setting init=list(lambda=0.1)
>> > and init=list(lambda=NaN). Implementing init=list(lambda=0.1) returned
>> > an error suggesting that there was a lack of memory, while running the
>> > model with init=list(lambda=NaN) also ran for an extended period of
>> > time without completing. Is there something else I can do to speed up
>> > the fit of these models?
>> >
>> > I've had a similar problem with an even larger data set (~185 000 rows
>> > x 8 columns, ~21 MB), where, when I try running the model, this error
>> > is returned immediately:
>> >
>> > Error in ZA %*% xmatrix : Cholmod error 'problem too large' at file
>> > ../Core/cholmod_dense.c, line 105
>> >
>> > I've tried running this model on two devices, both with a 64-bit OS
>> > with Windows 10, one with 32 GB of RAM and the other with 64 GB. I've
>> > gotten the same error from both devices. Is there a way that fitme()
>> > can accommodate these large data sets?
>>
>> spaMM can handle large data sets, but the first issue to consider here
>> is the number of distinct locations for the spatial random effect. The
>> large correlation matrices of geostatistical models will always be a
>> problem, both in terms of memory requirements and of potentially huge
>> computation times. My guess from past experiments is that one should
>> still be able to fit models with ~10K locations within a few days on a
>> computer with <60 GB of RAM (given perhaps some tinkering with the
>> arguments), so at least the data set of 14 000 rows should be feasible,
>> particularly if the number of distinct locations is smaller than the
>> number of rows.
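
As a rough illustration of why the number of distinct locations dominates
here: a dense correlation matrix over n locations has n^2 entries. The sketch
below only estimates the memory for one such matrix, assuming 8-byte doubles;
it is not how spaMM or INLA allocate memory internally, and real fits need
several matrices of this size plus factorizations. The helper names and the
coordinate column names `lon` and `lat` are hypothetical.

```r
# Memory (in GiB) for one dense n x n matrix of doubles.
# Assumption: 8 bytes per entry; treat the result as a lower bound.
dense_matrix_gib <- function(n_locations) {
  n <- as.numeric(n_locations)  # avoid integer overflow in n * n
  n * n * 8 / 1024^3
}

# Location counts mentioned in this thread. The last value assumes the
# worst case in which every row of the 185 000-row data set is a
# distinct location.
n_loc <- c(1e4, 1.3e4, 3e4, 1.85e5)
print(data.frame(locations = n_loc,
                 GiB = round(dense_matrix_gib(n_loc), 1)))

# Hypothetical helper: count the distinct locations in a data frame.
# The coordinate column names are assumptions; adapt them to your data.
count_locations <- function(d, coords = c("lon", "lat")) {
  nrow(unique(d[, coords, drop = FALSE]))
}
```

Calling `count_locations()` on the data before fitting shows whether the
spatial random effect has far fewer levels than the data set has rows, which
is the case where a dense fit is most likely to remain feasible.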
>>
>> Anyone planning to analyze large spatial data sets should anticipate
>> these problems and check by themselves whether there is any practical
>> alternative suitable for their particular problem. The discussion in
>> section 6.2 of the "gentle introduction" to spaMM may then be useful.
>>
>> Best,
>>
>> F.
>>
>> >
>> > Thank you,
>> >
>> > Sarah
>>
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
