[R] Thread parallelism and memory management on shared-memory supercomputers

Peter Langfelder peter.langfelder at gmail.com
Wed Dec 30 19:44:17 CET 2015


I'm not really an expert, but here are my 2 cents:

To the best of my limited knowledge, there is no direct way of
ensuring that the total memory requested by N workers stays below a
certain threshold. What you can control is the number of child
processes forked by foreach/doParallel, via the 'cores' argument of
registerDoParallel. The parallel computation implemented in parallel
and foreach/doParallel uses process forking, at least on Unix-like
systems (or it did last time I checked). When a process is forked,
the entire memory of its parent is "forked" as well (I'm not sure of
the right term). This does not mean a real copy is made (modern
systems use copy-on-write), but for the OS's memory-accounting
purposes each child can count as occupying as much memory as its
parent.
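
For example, something along these lines should keep the worker count
under control (a minimal sketch; the worker count of 8 and the toy
loop body are placeholders for your own settings and computation):

library(doParallel)
library(foreach)

# Register an explicit, smaller number of forked workers instead of
# using every core the node reports; fewer children means a smaller
# total memory footprint as seen by the scheduler.
registerDoParallel(cores = 8)

result <- foreach(i = 1:100, .combine = c) %dopar% {
  sqrt(i)  # placeholder for the real per-task computation
}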

If you want to benchmark your memory usage, run the job as a single
(non-forked) process and, at the end, look at the output of gc(),
which reports, among other things, the maximum memory used. For more
detailed information on memory usage, you can use Rprof (with memory
profiling), tracemem, or Rprofmem; see their help pages for details.
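
A rough sketch of such a benchmarking run (my_analysis() is a
hypothetical stand-in for your actual computation, and Rprofmem needs
an R build with memory profiling enabled):

gc(reset = TRUE)         # reset the "max used" counters before the run
result <- my_analysis()  # hypothetical stand-in for your real workload
gc()                     # "max used" column now shows this run's peak

# Finer-grained allocation logging, if you need it:
Rprofmem("memory_profile.out")
result <- my_analysis()
Rprofmem(NULL)           # stop logging; then inspect memory_profile.out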

To decrease memory usage, you will have to optimize your code and
perhaps sprinkle in garbage collection (gc()) calls after
manipulations of large objects. Just be aware that garbage collection
is rather slow, so you don't want to do it too often.
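
For instance (just an illustration of the pattern, not your actual
objects):

big <- matrix(rnorm(1e7), ncol = 100)  # stand-in for a large intermediate
col_means <- colMeans(big)             # keep only the small summary you need
rm(big)                                # drop the reference to the big object...
gc()                                   # ...and reclaim the memory right away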

The difference between the cluster and your laptop may be that the
laptop's OS does not enforce a hard limit on how much memory each
child appears to use, so you can fork from a process with a large
memory footprint as long as you don't trigger actual copying by
modifying large chunks of memory. The cluster's batch system, by
contrast, seems to count each child's full footprint against your
allocation.
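
As for backing out the number of workers you can afford: once you
have the single-process peak from gc(), a back-of-the-envelope
calculation like the one below should get you close. All figures here
are made-up examples; the safest assumption is that each fork may be
charged up to the parent's peak.

library(doParallel)

mem_limit_gb   <- 255  # hard limit on the node
parent_peak_gb <- 20   # "max used" from the single-process run (example)
per_worker_gb  <- 20   # assume each fork may be charged the parent's peak
headroom_gb    <- 15   # safety margin for the OS and other overhead

n_workers <- floor((mem_limit_gb - parent_peak_gb - headroom_gb) /
                     per_worker_gb)
registerDoParallel(cores = max(1, n_workers))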

HTH,

Peter

On Wed, Dec 30, 2015 at 9:36 AM, Andrew Crane-Droesch
<andrewcd at gmail.com> wrote:
> I've got allocations on a couple of shared-memory supercomputers, which I
> use to run computationally intensive scripts on multiple cores of the same
> node.  I've got 24 cores on the one, and 48 on the other.
>
> In both cases, there is a hard memory limit, which is shared among the cores
> in the node.  In the latter, the limit is 255G. If my job requests more than
> that, the job gets aborted.
>
> Now, I don't fully understand resource allocation in these sorts of systems.
> But I do get that the sort of "thread parallelism" done by e.g. the
> `parallel` package in R isn't identical to the sort of parallelism commonly
> done in lower-level languages.  For example, when I request a node, I only
> ask for one of its cores.  My R script then detects the number of cores on
> the node, and farms out tasks to the cores via the `foreach` package.  My
> understanding is that lower-level languages need the number of cores to be
> specified in the shell script, and a particular job script is given directly
> to each worker.
>
> My problem is that my parallel R script is tripping the cluster's memory
> limit, which terminates my job, because the sum of the memory requested by
> the workers is greater than what I'm allocated. I don't get this problem
> when running on my laptop's 4 cores, presumably because my laptop has a
> higher ratio of memory to cores.
>
> My question:  how can I ensure that the total memory being requested by N
> workers remains below a certain threshold?  Is this even possible?  If not,
> is it possible to benchmark a process locally, collecting the maximum
> per-worker memory requested, and use this to back out the number of workers
> that I can request for a given node's memory limit?
>
> Thanks in advance!
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


