[R-sig-ME] Pulling specific parameters from models to prevent exhausting memory.

Cesko Voeten c.c.voeten at hum.leidenuniv.nl
Thu Nov 5 18:06:57 CET 2020


Hi James,

You do indeed need to specify the interaction a bit differently: just change the colon to a comma, i.e. s(timepoint.nu, subjectID, roi, bs = "re"), so that a single random-effect term is defined over the interaction of all its arguments simultaneously.
If you're uncertain, you can try using buildmer::re2mgcv to convert your lmer formula to an mgcv formula automatically. (But I'm not sure whether it can handle interactions between grouping factors; it's been a while since I've delved into its code.)
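A minimal sketch of what the comma form builds, assuming timepoint.nu is numeric and subjectID and roi are factors; the small data set below is hypothetical. mgcv's "re" basis uses the interaction of all arguments given to s(), so this term gets one random-slope coefficient per subjectID:roi combination:

```r
library(mgcv)

# Hypothetical toy layout: 10 subjects x 5 ROIs x 4 numeric time points
d <- expand.grid(subjectID = factor(1:10), roi = factor(1:5),
                 timepoint.nu = 1:4)

# Comma form: the "re" basis is the model matrix of ~ timepoint.nu:subjectID:roi - 1
sm <- smoothCon(s(timepoint.nu, subjectID, roi, bs = "re"), data = d)[[1]]
ncol(sm$X)  # 50 = 10 subjects x 5 ROIs, one slope per subjectID:roi cell
```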

One more thing: with your current code, bam will fit the model using (fast) REML, while your lmer model uses ML. If you want your bam model to use ML as well, pass method='ML'. If you don't care, however, method='fREML' is preferred because it is faster, and only with method='fREML' can you also pass discrete=TRUE, which gives a substantial speedup and much lower memory usage on very large datasets like yours.
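The two fitting options can be sketched on hypothetical simulated data (variable names follow the thread; sizes are tiny so it runs quickly). Two mgcv details are worth keeping in mind when translating the lmer formula: s(x, g, bs = "re") is a random slope only, so random intercepts need their own s(g, bs = "re") terms, and "re" terms are independent, so the intercept-slope correlation estimated by lme4's (timepoint.nu | subjectID) is not modelled (comparable to lme4's || syntax). Including the intercept terms here is an assumption about what the lmer model is meant to mirror.

```r
library(mgcv)

# Hypothetical toy data in the thread's layout: 20 subjects x 3 ROIs x 4 time points
set.seed(42)
d <- expand.grid(subjectID = factor(1:20), roi = factor(1:3),
                 timepoint.nu = 1:4)
d$timepoint <- factor(d$timepoint.nu)
subj_eff <- rnorm(20, sd = 0.5)
d$connectivity <- subj_eff[as.integer(d$subjectID)] + rnorm(nrow(d))

f <- connectivity ~ roi * timepoint +
  s(subjectID, bs = "re") +                   # random intercept per subject
  s(timepoint.nu, subjectID, bs = "re") +     # random slope per subject
  s(subjectID, roi, bs = "re") +              # random intercept per subject:roi
  s(timepoint.nu, subjectID, roi, bs = "re")  # random slope per subject:roi

# ML, matching REML = FALSE in the lmer call
m_ml <- bam(f, data = d, method = "ML")

# Fast REML with discretised covariates: the faster, leaner option for big data
m_fast <- bam(f, data = d, method = "fREML", discrete = TRUE)
```

Both fits use the same bases (12 fixed-effect columns plus 20 + 20 + 60 + 60 random-effect coefficients); only the estimation criterion and the discretisation differ.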

Hope this helps,

Cesko

On 4-11-2020 at 21:46, Ades, James wrote:
> Hi all,
> 
> Just following up on this. I've been reading up on GAMs and the bam function, and I think I have the model correctly specified except for one random-effect interaction component, which I am not sure how to specify in mgcv.
> 
> This is the model, written in lme4 syntax, that I am trying to specify in mgcv:
> 
> lmer(connectivity ~ roi * timepoint + (timepoint.nu|subjectID) + (timepoint.nu|subjectID:roi), na.action = 'na.exclude', control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE), REML = FALSE, data)
> 
> This is what I have. However, the "subjectID:roi" interaction does not seem to be correctly specified, because in the results the first random effect is the same as the second. The first two terms are parametric (though they may eventually need some kind of spline), and the last two are random effects.
> bam(connectivity ~ roi * timepoint + s(timepoint.nu, subjectID, bs = "re") + s(timepoint.nu, subjectID:roi, bs = "re"), data = tot.add.1, method = "fREML")
> 
> Thanks much!
> 
> James

> *From:* Voeten, C.C. <c.c.voeten using hum.leidenuniv.nl>
> *Sent:* Sunday, October 18, 2020 1:16 AM
> *To:* Ades, James <jades using health.ucsd.edu>; r-sig-mixed-models using r-project.org
> *Subject:* RE: Pulling specific parameters from models to prevent exhausting memory.
> Hi James,
> 
> You may have better luck using mgcv::bam instead of lme4. It can also fit random-slopes models and is optimized for "big data" in terms of both memory usage and computational efficiency. The modeling syntax is slightly different, though: the correct translation of lme4 random effects into mgcv's s(...,bs='re') terms depends on whether timepoint.nu is a covariate or a factor.
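The covariate-versus-factor distinction can be seen by counting the columns mgcv builds for the same s(..., bs = "re") call. The toy data and the factor copy timepoint.f are hypothetical. With a numeric timepoint.nu the term is one random slope per subject; with a factor it becomes a separate random coefficient for every timepoint-by-subject cell:

```r
library(mgcv)

# Hypothetical toy data: 20 subjects x 4 time points
d <- expand.grid(subjectID = factor(1:20), timepoint.nu = 1:4)
d$timepoint.f <- factor(d$timepoint.nu)  # the same time points coded as a factor

# Numeric covariate: ~ timepoint.nu:subjectID - 1, one slope per subject
num <- smoothCon(s(timepoint.nu, subjectID, bs = "re"), data = d)[[1]]

# Factor: ~ timepoint.f:subjectID - 1, one coefficient per timepoint:subject cell
fac <- smoothCon(s(timepoint.f, subjectID, bs = "re"), data = d)[[1]]

c(numeric = ncol(num$X), factor = ncol(fac$X))  # 20 and 80
```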
> 
> HTH,
> Cesko
> 
>> -----Original Message-----
>> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
>> Behalf Of Ades, James
>> Sent: Sunday, October 18, 2020 2:01 AM
>> To: r-sig-mixed-models using r-project.org
>> Subject: [R-sig-ME] Pulling specific parameters from models to prevent
>> exhausting memory.
>> 
>> Hi all,
>> 
>> I'm modeling fMRI data using lme4. There are 4 time points and roughly
>> 550 subjects with 27,730 regions of interest (these are the variables).
>> Since I have access to a supercomputer, my plan was to create a long
>> dataset with repeated measures of regions of interest within each time
>> point, and of subjects across the 4 time points. I'm using the model below.
>> I gather the regions of interest (reshape to long format) on the
>> supercomputer because the result is roughly 70 million observations.
>> timepoint is discrete and timepoint.nu is simply the numeric time point.
>> 
>> lmer(connectivity ~ roi * timepoint + (timepoint.nu|subjectID) +
>> (timepoint.nu|subjectID:roi), na.action = 'na.exclude', control =
>> lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE), REML = FALSE,
>> data)
>> 
>> I received the following error: "cannot allocate vector of size 30206.2 Gb",
>> followed by "Execution halted".
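A back-of-envelope sketch of the problem size, assuming exactly 550 subjects, complete data, and the lmer formula above. It is a rough illustration of scale, not a reconstruction of the particular allocation reported in the error:

```r
n_subj <- 550
n_roi  <- 27730
n_time <- 4

# Rows in the long data set if every subject has every ROI at every time point
n_obs <- n_subj * n_roi * n_time  # 61,006,000

# Fixed-effect columns for roi * timepoint under treatment contrasts:
# intercept + (n_roi - 1) + (n_time - 1) + (n_roi - 1) * (n_time - 1)
n_fixef <- 1 + (n_roi - 1) + (n_time - 1) + (n_roi - 1) * (n_time - 1)  # 110,920

# Random-effect coefficients in (timepoint.nu | subjectID:roi):
# one intercept and one slope per subjectID:roi cell
n_ranef <- 2 * n_subj * n_roi  # 30,503,000

# Memory for a dense double-precision n_obs x n_fixef matrix, in TiB (8 bytes per value)
dense_tib <- n_obs * n_fixef * 8 / 1024^4  # about 49
```

Under these assumptions the row count is about 61 million, somewhat below the "roughly 70 million" quoted above, so the real counts presumably differ a little. Either way, the 27,730-level roi factor crossed with timepoint alone implies over a hundred thousand fixed-effect columns.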
>> 
>> So I'm wondering how I can extract only the essential parameters I need
>> (group means rather than individual fixed effects) while modeling, so that
>> the supercomputer can finish the job without exhausting its memory. I say
>> group means because I will eventually be adding covariates.
>> 
>> Also, the supercomputer's rules require the job to finish within two days.
>> I'm not sure this one would, so I'm wondering whether there is any way to
>> parallelize code in lme4 so that I could make use of multiple cores and
>> nodes.
>> 
>> I've included a slice of data here:
>> https://drive.google.com/file/d/1mhTj6qZZ2nT35fXUuYG_ThQ-QtWbb-8L/view?usp=sharing
>> 
>> Thanks much,
>> 
>> James
>> 
>> _______________________________________________
>> R-sig-mixed-models using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

