[R-sig-ME] Pulling specific parameters from models to prevent exhausting memory.

Thu Nov 5 19:48:19 CET 2020

By the way: it occurs to me that it may be statistically preferable to use s(subjectID,bs='re',by=timepoint) and s(subjectID,roi,bs='re',by=timepoint). You say that timepoint is discrete, so I presume it is an unordered factor variable. With a factor variable, the by=... formulation produces four separate random intercepts, one for each of your four timepoints; this is probably more appropriate than drawing a straight line through the four of them, which is what you are doing now (also in your lmer model).
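
As a rough sketch of what the by= formulation could look like in a bam call: the data below are simulated, and the sizes, the replicate measurements per cell, and the names d and fit_by are assumptions made purely for illustration, not taken from the real dataset.

```r
library(mgcv)

set.seed(1)
# Toy stand-in for the real data (assumed sizes): 10 subjects, 3 ROIs,
# 4 timepoints, and 3 replicate measurements per cell so that the
# subject-by-ROI effects are not confounded with the residual.
d <- expand.grid(subjectID = factor(1:10),
                 roi       = factor(c("r1", "r2", "r3")),
                 timepoint = factor(1:4),   # unordered factor
                 rep       = 1:3)

# One simulated offset per subject and timepoint, plus noise
subj_tp <- matrix(rnorm(10 * 4), nrow = 10)
d$connectivity <- subj_tp[cbind(as.integer(d$subjectID),
                                as.integer(d$timepoint))] +
  rnorm(nrow(d), sd = 0.5)

fit_by <- bam(connectivity ~ roi * timepoint +
                s(subjectID, bs = "re", by = timepoint) +
                s(subjectID, roi, bs = "re", by = timepoint),
              data = d, method = "fREML")

# Each by= term is expanded into one random-intercept term per timepoint level
print(sapply(fit_by$smooth, function(s) s$label))
```

With four timepoint levels, each of the two by= terms should show up as four separate terms in the fitted object.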

> -----Original Message-----
> From: Cesko Voeten <c.c.voeten using hum.leidenuniv.nl>
> Sent: Thursday, November 5, 2020 6:07 PM
> To: Ades, James <jades using health.ucsd.edu>; r-sig-mixed-models using r-project.org
> Subject: Re: Pulling specific parameters from models to prevent exhausting
> memory.
> 
> Hi James,
> 
> You indeed need to specify the interaction a bit differently: just change the
> colon to a comma, so that you are describing a random intercept over
> multiple arguments simultaneously.
> If you're uncertain, you can try using buildmer::re2mgcv to automatically
> convert your lmer formula to an mgcv formula. (But not sure if it is capable of
> handling interactions between grouping terms -- it's been a while since I've
> delved into its code.)
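
To illustrate the colon-to-comma change, here is a hedged sketch on simulated data (the sizes and the names d2, fit_re and n_coef are assumptions for illustration). One caveat: "re" terms in mgcv are independent of each other, so an lmer term such as (timepoint.nu|subjectID:roi) is written here as a separate intercept term and slope term, without the intercept-slope correlation that lmer estimates.

```r
library(mgcv)

set.seed(2)
# Toy data (assumed sizes): 8 subjects, 3 ROIs, 4 timepoints, 2 replicates
d2 <- expand.grid(subjectID    = factor(1:8),
                  roi          = factor(c("r1", "r2", "r3")),
                  timepoint.nu = 1:4,
                  rep          = 1:2)
d2$timepoint <- factor(d2$timepoint.nu)

# Simulate a random intercept and a random slope per subject
b0 <- rnorm(8)
b1 <- rnorm(8, sd = 0.3)
i  <- as.integer(d2$subjectID)
d2$connectivity <- b0[i] + b1[i] * d2$timepoint.nu + rnorm(nrow(d2), sd = 0.5)

fit_re <- bam(connectivity ~ roi * timepoint +
                s(subjectID, bs = "re") +                    # intercept per subject
                s(timepoint.nu, subjectID, bs = "re") +      # slope per subject
                s(subjectID, roi, bs = "re") +               # intercept per subject:roi
                s(timepoint.nu, subjectID, roi, bs = "re"),  # slope per subject:roi
              data = d2, method = "fREML")

# Number of coefficients behind each random-effect term
n_coef <- sapply(fit_re$smooth, function(s) s$last.para - s$first.para + 1)
print(n_coef)
```

The comma form gives one coefficient per subject for the first two terms and one per subject-by-ROI combination for the last two, which is the distinction the colon version failed to express.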
> 
> One more thing: with your current code, bam will fit the model using REML,
> while your lmer model uses ML. If you want your bam model to use ML as
> well, you should pass method='ML'. However, if you don't care,
> method='fREML' is preferred as it is faster, and only with method='fREML'
> you can also pass discrete=TRUE, which will result in a significant speedup
> and much lower memory usage with very large datasets like yours.
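
A minimal sketch of the two method choices, again on simulated data (the names d3, f, fit_ml and fit_fast are assumptions for illustration, and the model is deliberately simpler than the one under discussion):

```r
library(mgcv)

set.seed(3)
# Toy data (assumed sizes): 20 subjects, 3 ROIs, 4 timepoints
d3 <- expand.grid(subjectID = factor(1:20),
                  roi       = factor(c("r1", "r2", "r3")),
                  timepoint = factor(1:4))
b0 <- rnorm(20)
d3$connectivity <- b0[as.integer(d3$subjectID)] + rnorm(nrow(d3), sd = 0.5)

f <- connectivity ~ roi * timepoint + s(subjectID, bs = "re")

# ML, comparable to lmer(..., REML = FALSE)
fit_ml <- bam(f, data = d3, method = "ML")

# Fast REML with covariate discretization, aimed at very large datasets
fit_fast <- bam(f, data = d3, method = "fREML", discrete = TRUE)

# Compare the fixed-effect estimates of the two fits side by side
print(cbind(ML = coef(fit_ml)[1:12], fREML = coef(fit_fast)[1:12]))
```

On a toy dataset like this the speed difference is invisible; the point is only to show where method and discrete go in the call.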
> 
> Hope this helps,
> 
> Cesko
> 
> Op 4-11-2020 om 21:46 schreef Ades, James:
> > Hi all,
> >
> > Just following up on this. I've been reading up on GAMs and the bam
> > function, and I think I have the model correctly specified except for one
> > random effect interaction component, which I am not certain how to
> > specify within the "mgcv" context.
> >
> > This is the model I am trying to specify for mgcv in an lme4 framework:
> >
> > lmer(connectivity ~ roi * timepoint + (timepoint.nu|subjectID) +
> >   (timepoint.nu|subjectID:roi), na.action = 'na.exclude', control =
> >   lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE), REML = FALSE,
> >   data)
> >
> > This is what I have; however, the "subjectID:roi" interaction does not seem
> > to be correctly specified, because in the results the first random effect is
> > the same as the second. The first two terms are parametric (but perhaps will
> > need to receive some kind of spline depending), then the latter two are
> > random effects.
> > bam(connectivity ~ roi * timepoint + s(timepoint.nu, subjectID, bs = "re") +
> >   s(timepoint.nu, subjectID:roi, bs = "re"), data = tot.add.1, method =
> >   "fREML")
> >
> > Thanks much!
> >
> > James
> > -------------------------------------------------------------------------------------------
> > *From:* Voeten, C.C. <c.c.voeten using hum.leidenuniv.nl>
> > *Sent:* Sunday, October 18, 2020 1:16 AM
> > *To:* Ades, James <jades using health.ucsd.edu>; r-sig-mixed-models using r-project.org <r-sig-mixed-models using r-project.org>
> > *Subject:* RE: Pulling specific parameters from models to prevent exhausting memory.
> > Hi James,
> >
> > You may have luck using mgcv::bam instead of lme4. It can also fit
> > random-slopes models and is optimized for "big data", in terms of memory
> > usage and computational efficiency. The modeling syntax is slightly
> > different, though; the correct translation of lme4 random effects into
> > mgcv's s(...,bs='re') terms depends on whether timepoint.nu is a covariate
> > or a factor.
> >
> > HTH,
> > Cesko
> >
> >> -----Original Message-----
> >> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
> >> Behalf Of Ades, James
> >> Sent: Sunday, October 18, 2020 2:01 AM
> >> To: r-sig-mixed-models using r-project.org
> >> Subject: [R-sig-ME] Pulling specific parameters from models to prevent
> >> exhausting memory.
> >>
> >> Hi all,
> >>
> >> I'm modeling fMRI imaging data using lme4. There are 4 time points and
> >> roughly 550 subjects with 27,730 regions of interest (these are the
> >> variables). Since I have access to a supercomputer, my thought was to
> >> create a long dataset with repeated measures of regions of interest per
> >> time point and then subjects over the 4 time points. I'm using the model
> >> below. I gather the regions of interest using the supercomputer because it
> >> ends up being roughly 70 million observations. Timepoint is discrete and
> >> timepoint.nu is just the numerical time point.
> >>
> >> lmer(connectivity ~ roi * timepoint + (timepoint.nu|subjectID) +
> >> (timepoint.nu|subjectID:roi), na.action = 'na.exclude', control =
> >> lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE), REML = FALSE,
> >> data)
> >>
> >> I received back the following error: "cannot allocate vector of size 30206.2
> >> Gb", followed by "Execution halted".
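
For a sense of scale, the counts quoted above can be multiplied out. This is only back-of-the-envelope arithmetic: "roughly 550" subjects is treated as exactly 550, every subject is assumed to have all 4 time points, and the variable names are made up for illustration, so the figures indicate an order of magnitude rather than the size of the actual dataset.

```r
# Counts taken at face value from the description above (assumptions:
# exactly 550 subjects, complete data at all 4 time points)
n_subjects   <- 550
n_roi        <- 27730
n_timepoints <- 4

n_obs    <- n_subjects * n_roi * n_timepoints  # rows in the long dataset
n_fixef  <- n_roi * n_timepoints               # coefficients in roi * timepoint
n_groups <- n_subjects * n_roi                 # levels of subjectID:roi
n_ranef  <- 2 * (n_subjects + n_groups)        # intercept + slope per level

print(c(observations   = n_obs,
        fixed_effects  = n_fixef,
        groups         = n_groups,
        random_effects = n_ranef))
```

Under these assumptions the model has tens of millions of rows, over a hundred thousand fixed-effect coefficients, and tens of millions of random-effect coefficients.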
> >>
> >> So I'm wondering how I can pull only the essential parameters I need (group
> >> means vs individual fixed effects) while modeling, such that the
> >> supercomputer can finish the job without exhausting the memory. I say group
> >> means because I will eventually be adding in covariates.
> >>
> >> Also, the supercomputer rules are that the job must finish within two days.
> >> I'm not sure that this would, so I'm wondering whether there is any way to
> >> parallelize code in lme4 such that I could make use of multiple cores and
> >> nodes.
> >>
> >> I've included a slice of data here:
> >> https://drive.google.com/file/d/1mhTj6qZZ2nT35fXUuYG_ThQ-QtWbb-8L/view?usp=sharing
> >>
> >> Thanks much,
> >>
> >> James
> >>
> >> _______________________________________________
> >> R-sig-mixed-models using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models