[Rd] Memory limitations for parallel::mclapply
jgbradley1 at gmail.com
Fri Jul 24 22:21:03 CEST 2015
I have been having issues using parallel::mclapply in a memory-efficient
way and would like some guidance. I am using a 40-core machine with 96 GB
of RAM. I've tried running mclapply with 20, 30, and 40 mc.cores, and each
time it has practically brought the machine to a standstill, to the point
where I had to do a hard reset.
When running mclapply with 10 mc.cores, I can see that each process takes
7.4% (~7 GB) of memory. My use case for mclapply is the following: run
mclapply over a list of 150000 names; each process refers to a larger
pre-computed data.table to compute some stats for the name and returns
those stats. Ideally I want to use the large data.table as shared memory,
but the number of mc.cores I can use is limited because each process
requires ~7 GB. Someone posted this exact same issue on Stack Overflow a
couple of years ago, but it never got answered.
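To make the setup concrete, here is a scaled-down sketch of the pattern. The names big_table, compute_stats, and name_list are placeholders for illustration, and a small data.frame stands in for the real multi-GB data.table. The workers read the table from the parent's environment; it is not passed as an argument.

```r
library(parallel)

# Stand-in for the large pre-computed table (assumption: the real object
# is a data.table of several GB; a small data.frame is used here).
set.seed(1)
big_table <- data.frame(
  name  = sample(c("alice", "bob", "carol"), 1000, replace = TRUE),
  value = rnorm(1000),
  stringsAsFactors = FALSE
)

# Hypothetical per-name statistic: reads big_table but never modifies it.
compute_stats <- function(nm) {
  vals <- big_table$value[big_table$name == nm]
  c(n = length(vals), mean = mean(vals))
}

# Stand-in for the list of 150000 names.
name_list <- c("alice", "bob", "carol")

# mclapply forks the R process (Unix-alikes only); each worker looks up
# big_table in the environment it inherited from the parent.
results <- mclapply(name_list, compute_stats, mc.cores = 2)
names(results) <- name_list
```

In the real job, name_list has 150000 entries and mc.cores is 10 or more.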
Do I have to manually tell mclapply to use shared memory, and if so, how?
Is this type of job better suited to the doParallel package and the
foreach approach?