[Bioc-devel] Progress Message Order in bplapply

Morgan, Martin Martin.Morgan at roswellpark.org
Mon Dec 28 18:36:01 CET 2015


Hi Dario -- it was this commit

------------------------------------------------------------------------
r111519 | mtmorgan at fhcrc.org | 2015-12-15 14:34:18 -0500 (Tue, 15 Dec 2015) | 2 lines

port: r111463, bugfix: workers=1, tasks=0 assigns all X to one chunk

------------------------------------------------------------------------

in response to this report

https://support.bioconductor.org/p/75945/

Previously, when the number of 'tasks' was unspecified (default value 0), X (in your example, the vector 1:100) was split into 100 individual tasks 1, 2, 3, ..., and each was processed in a completely independent parallel process -- a total of 100 processes were started and stopped. The change mentioned above instead behaves as documented: it splits the 100 elements approximately evenly between the specified number of workers (25) and sends several elements to each worker for processing. This saves the cost of communicating the object to and from the worker. You can get the old behavior by specifying tasks = length(X); for your example, tasks=100.
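
As a rough sketch of the workaround just described (assuming a BiocParallel version where MulticoreParam() accepts a 'tasks' argument and the bptasks() accessor is available; the variable names are only illustrative), the two configurations could be set up like this:

```r
# Sketch only: guarded so it does nothing if BiocParallel is not installed.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    X <- 1:100

    # Default: tasks = 0, so X is divided into roughly one chunk per worker.
    chunked <- BiocParallel::MulticoreParam(workers = 25)

    # Old behavior: one task per element of X.
    per_element <- BiocParallel::MulticoreParam(workers = 25,
                                                tasks = length(X))

    # Inspect the number of tasks each parameter object requests.
    print(BiocParallel::bptasks(chunked))
    print(BiocParallel::bptasks(per_element))
}
```

Passing BPPARAM = per_element to the original bplapply() call should then give one task per element again, at the cost of more communication with the workers.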

The 'split' of elements into tasks can be seen by calling the internal function .splitX()

> head(BiocParallel:::.splitX(1:100, 25, 100))  # 1 element per task
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

> head(BiocParallel:::.splitX(1:100, 25, 0))  # 4 elements per task
[[1]]
[1] 1 2 3 4

[[2]]
[1] 5 6 7 8

[[3]]
[1]  9 10 11 12

[[4]]
[1] 13 14 15 16

[[5]]
[1] 17 18 19 20

[[6]]
[1] 21 22 23 24
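
Since .splitX() is internal and its signature may change, the same idea can be sketched in base R. This is not the BiocParallel implementation, only an approximation of the in-order, roughly even split shown above; split_sketch() is a hypothetical helper written for illustration.

```r
# Hypothetical helper: split X into consecutive, roughly equal chunks.
# tasks = 0 means "one chunk per worker"; otherwise use 'tasks' chunks.
split_sketch <- function(X, workers, tasks = 0L) {
    n_chunks <- if (tasks == 0L) workers else tasks
    n_chunks <- min(n_chunks, length(X))     # never more chunks than elements
    # Chunk index for each element, increasing along X.
    idx <- ceiling(seq_along(X) * n_chunks / length(X))
    unname(split(X, idx))
}

one_per_task  <- split_sketch(1:100, 25, 100)  # 100 chunks of 1 element
four_per_task <- split_sketch(1:100, 25, 0)    # 25 chunks of 4 elements

print(head(four_per_task, 3))
```

For 1:100 and 25 workers this gives the same grouping as the output above (1 2 3 4, then 5 6 7 8, and so on); for lengths that do not divide evenly the real function may place the boundaries differently.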


Each element of the list returned by .splitX() is assigned in order, but the precise schedule is somewhat indeterminate -- task 1 might be assigned before task 2, but perhaps the process handling task 1 runs the garbage collector before sleeping, so task 2 finishes ahead of task 1. Under the original scheme I guess you were relying on the average execution time of ten processes between each message, whereas in the correct scheme you are relying on the average execution time of just three processes, so there is greater variability. Either way, though, the order of execution is not guaranteed.

Messages are reported at the end of each task; there are 100 opportunities for messages when the number of tasks is 100, but only 25 opportunities (corresponding approximately to each processor handling 4 elements) otherwise.
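
To make this concrete for the example, here is a small base-R sketch (again only an approximation of the even, in-order split, not the package's own code) that works out which of the 25 tasks each progress message belongs to:

```r
X <- 1:100
n_tasks <- 25   # default case: about one task per worker

# Task index of each element under an even, in-order split.
task_of <- ceiling(X * n_tasks / length(X))

# Elements that emit a message in the test case (multiples of 10).
msgs <- X[X %% 10 == 0]
msg_tasks <- task_of[msgs]

print(data.frame(message = msgs, task = msg_tasks))
```

Under these assumptions the ten messages sit in ten different tasks (3, 5, 8, 10, ...), and with 25 workers all of those tasks run at the same time, so the order in which the messages arrive depends on which task happens to finish first.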

Other than being different from previously, is there an underlying problem?

Martin
________________________________________
From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Dario Strbenac [dstr7320 at uni.sydney.edu.au]
Sent: Sunday, December 27, 2015 7:00 PM
To: bioc-devel list
Subject: [Bioc-devel] Progress Message Order in bplapply

Hello,

I am experiencing some new and unexpected behaviour of bplapply.

Previously, progress messages were displayed in almost the expected order. Now, the order is quite different. My test case is:

bplapply(1:100, function(x) {if(x %% 10 == 0) message(x); Sys.sleep(30)}, BPPARAM = MulticoreParam(workers = 25))

The resulting progress messages aren't displayed until the end of the process, whereas before they appeared immediately. I would expect 10 and 20 to appear before 30.

30
40
50
10
20
60
70
80
90
100

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
