[R-sig-ME] Pulling specific parameters from models to prevent exhausting memory.

Phillip Alday
Tue Dec 15 23:40:20 CET 2020


Hi James,

I missed this discussion early on, but Cesko's suggestion is pretty
good. Your problem is also the type of thing that Benedikt Ehinger, Dave
Kleinschmidt and I have been tinkering around with in Julia using
Benedikt's unfold.jl toolbox, which in turn uses MixedModels.jl for
fitting the models. As discussed in other threads on this list recently,
we have a few tricks up our sleeves on the Julia side that allow us to
be a bit more memory-efficient than lme4 when fitting the model.
Using my JellyMe4 package, it's also possible to convert the Julia fit
to an lme4 fit, if you want to take advantage of all the excellent
tooling around lme4 in R.
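
The round trip looks roughly like this -- a sketch, not something I've
run on your data, and the column names (connectivity, timepoint_nu,
subjectID) are placeholders for whatever your long-format table ends
up using:

using MixedModels, RCall, JellyMe4

# fit in Julia; the random-effects model matrix is stored in a
# sparse/blocked form, which is where the memory savings come from
m = fit(MixedModel,
        @formula(connectivity ~ 1 + timepoint_nu +
                 (1 + timepoint_nu | subjectID)),
        data)

# JellyMe4 teaches RCall how to move a (model, data) tuple
# over to R as an lme4 merMod
m_r = (m, data);
@rput m_r;
R"summary(m_r)"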

I decided to take a quick stab at your model and found a few things:

1. Your data were still in wide format; maybe mention that next time ;)
2. The data only had one timepoint, so I couldn't test them at scale.
3. ROI should realistically be a grouping variable and not a categorical
fixed effect for these data -- it seems that you have tens of thousands
of levels (which isn't surprising for fMRI); see the sketch after this
list.
4. If you have some sparsity in the fixed effects, that may be something
we can take advantage of, but that wasn't completely clear to me in the
5 minutes I spent on this.
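
On point 3, here's a minimal sketch of the reformulation I mean (same
caveat about assumed column names as above). With roi as a fixed
effect, the dense model matrix X gets tens of thousands of dummy-coded
columns, which is almost certainly where that 30000-Gb allocation
comes from; as a grouping variable, roi only contributes sparse
indicator columns to Z:

using MixedModels

form = @formula(connectivity ~ 1 + timepoint +
                (1 + timepoint_nu | subjectID) +
                (1 + timepoint_nu | roi) +
                (1 + timepoint_nu | subjectID & roi))
m = fit(MixedModel, form, data)

Whether the subjectID & roi term is tractable with ~550 x ~28k levels
is a separate question; I'd start with the first two random-effects
terms and see how far that gets you.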

My quick Julia attempt can be found here:

https://github.com/palday/ades-fmri-lmm

Best,
Phillip


On 18/10/20 2:00 am, Ades, James wrote:
> Hi all,
> 
> I'm modeling fMRI imaging data using lme4. There are 4 time points and roughly 550 subjects, each with 27,730 regions of interest (these are the variables). Since I have access to a supercomputer, my thought was to create a long dataset with repeated measures of regions of interest within each time point, and of subjects over the 4 time points. I'm using the model below. I run it on the supercomputer because the long dataset ends up being roughly 70 million observations. The timepoint variable is discrete and timepoint.nu is just the numerical time point.
> 
> lmer(connectivity ~ roi * timepoint + (timepoint.nu | subjectID) + (timepoint.nu | subjectID:roi), data = data, na.action = na.exclude, control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE), REML = FALSE)
> 
> I received back the following error: "cannot allocate vector of size 30206.2 Gb", followed by "Execution halted".
> 
> So I'm wondering how I can pull only the essential parameters I need (group means vs. individual fixed effects) while modeling, such that the supercomputer can finish the job without exhausting memory. I say group means because I will eventually be adding in covariates.
> 
> Also, the supercomputer's rules are that the job must finish within two days. I'm not sure that this one would, so I'm wondering whether there is any way to parallelize lme4 such that I could make use of multiple cores and nodes.
> 
> I've included a slice of data here: https://drive.google.com/file/d/1mhTj6qZZ2nT35fXUuYG_ThQ-QtWbb-8L/view?usp=sharing
> 
> Thanks much,
> 
> James
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>


