[Rd] Listing all spawned jobs/processes after parallel::mcparallel()?

Henrik Bengtsson henrik.bengtsson at ucsf.edu
Wed Jun 24 05:47:23 CEST 2015


On Sun, Jun 21, 2015 at 9:59 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> On 20/06/2015 22:21, Henrik Bengtsson wrote:
>>
>> QUESTION:
>> Is it possible to query the number of active jobs running after launching
>> them with parallel::mcparallel()?
>>
>> For example, if I launch 3 jobs using:
>>
>>> library(parallel)
>>> f <- lapply(1:3, FUN=mcparallel)
>>
>>
>> then I can inspect them as:
>>
>>> str(f)
>>
>> List of 3
>>   $ :List of 2
>>    ..$ pid: int 142225
>>    ..$ fd : int [1:2] 8 13
>>    ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>>   $ :List of 2
>>    ..$ pid: int 142226
>>    ..$ fd : int [1:2] 10 15
>>    ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>>   $ :List of 2
>>    ..$ pid: int 142227
>>    ..$ fd : int [1:2] 12 17
>>    ..- attr(*, "class")= chr [1:3] "parallelJob" "childProcess" "process"
>>
>> However, if I launch them without "recording" them, or equivalently if I
>> do:
>>
>>> f <- lapply(1:3, FUN=mcparallel)
>>> rm(list="f")
>>
>>
>> is there a function/mechanism in R/the parallel package allowing me to
>> find the currently active/running processes?  ... or at least query
>> how many they are?  I'd like to use this to prevent spawning of more
>> than a maximum number of parallel processes.  (Yes, I'm aware of
>> mclapply() and friends, but I'm looking at using more low-level
>> mcparallel()/mccollect()). I'm trying to decide whether I should
>> implement my own mechanism for keeping track of "jobs" or not.
>
>
> Note that 'currently active/running' is a slippery concept and is not what
> the results above show.  But see ?children, which seems to be what you are
> looking for.  It is not exported and there is no more detailed explanation
> save the source code.  Also note that it tells you about children and not
> grandchildren ....
>
> You can find out about child processes (and their children) at OS level, for
> example via the 'ps' command, but doing so portably is not easy.
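The OS-level route mentioned above can be sketched in R as follows. This is a minimal, deliberately non-portable sketch: the helper name os_descendants() is hypothetical, and it assumes a POSIX-style `ps` that accepts `-A -o pid= -o ppid=` (Linux procps and the BSD `ps` on OS X do; Windows does not). Unlike parallel:::children(), it walks the whole process tree, so grandchildren are included, but the shell and `ps` processes started by system2() may appear in the result as well.

```r
library(parallel)

## Hypothetical helper: PIDs of all descendants of 'pid' (children,
## grandchildren, ...) as reported by the OS. Assumes a POSIX-style 'ps'
## that understands '-A -o pid= -o ppid='; this is NOT portable.
## The shell/'ps' processes spawned by system2() may be included too.
os_descendants <- function(pid = Sys.getpid()) {
  out <- system2("ps", c("-A", "-o", "pid=", "-o", "ppid="), stdout = TRUE)
  tab <- read.table(text = out, col.names = c("pid", "ppid"))
  found <- integer(0)
  frontier <- as.integer(pid)
  ## Walk down the process tree one generation at a time.
  while (length(frontier) > 0L) {
    kids <- setdiff(tab$pid[tab$ppid %in% frontier], found)
    found <- c(found, kids)
    frontier <- kids
  }
  found
}

## Usage: PIDs of running mcparallel() jobs are visible at OS level.
jobs <- lapply(1:2, function(i) mcparallel(Sys.sleep(2)))
pids <- vapply(jobs, function(j) j$pid, integer(1))
seen <- pids %in% os_descendants()
print(seen)

## Collect the jobs so no child processes are left behind.
invisible(mccollect(jobs))
```

Parsing `ps` output is exactly the kind of platform-dependent code the reply warns about, which is why the in-package children() is preferable when a lower bound on direct children is enough.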

Thank you very much.  This was exactly what I was looking for.  I
appreciate the problem of identifying grandchildren, but with
children() I now at least have a chance to get a lower bound on the
number of "active children" (see ?children).
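For the scenario in the original question (jobs launched without keeping the returned job objects), a minimal sketch of counting and then collecting them follows. It assumes the unexported parallel:::children() returns one element per child process that the parallel package is still tracking; since it is undocumented beyond ?children, its behavior may change between R versions, and as noted above it does not cover grandchildren. mccollect() with `jobs` omitted waits for all currently existing children.

```r
library(parallel)

## Launch three jobs without keeping references to them, as in the question.
invisible(lapply(1:3, function(i) mcparallel({ Sys.sleep(1); i })))

## Count the child processes the parallel package is tracking. This relies
## on the unexported parallel:::children() and excludes grandchildren.
n_active <- length(parallel:::children())
print(n_active)

## With 'jobs' omitted, mccollect() waits for all currently existing
## children and returns their results in a list.
res <- mccollect()
print(sort(unlist(res, use.names = FALSE)))
```

A cap on concurrent jobs could compare such a count against a maximum before calling mcparallel() again, keeping in mind that finished children may still be counted until they are collected.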

After some initial testing on Linux and OS X, I'm glad to see that
parallel:::children() seems to reflect the processes that are actually
active, e.g. if I SIGTERM one of them externally, it is immediately
dropped from parallel:::children().  I also noticed that a process
remains active until it has been collected with parallel::mccollect().

/Henrik

>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK



More information about the R-devel mailing list