[Bioc-devel] bplapply Processes Sometimes Stall

Morgan, Martin Martin.Morgan at roswellpark.org
Mon Jan 4 03:07:57 CET 2016


Hi Dario -- the most likely explanation, without a reproducible example, is that the code used on workers sometimes puts R into a state that it cannot recover from.

The first step in debugging this is to run the code serially, e.g., by passing BPPARAM = SerialParam() to bplapply(), or by calling register(SerialParam()) to make serial evaluation the default for any bplapply() invoked without a BPPARAM argument.
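
For what it's worth, a minimal sketch of the two serial options. Here work() and the input 1:4 are hypothetical stand-ins for your own per-task function and data, not anything from your code:

```r
library(BiocParallel)

# Hypothetical stand-in for the function evaluated on each worker.
work <- function(i) {
    sqrt(i)
}

# Option 1: request serial evaluation for this call only.
res_explicit <- bplapply(1:4, work, BPPARAM = SerialParam())

# Option 2: make serial evaluation the registered default, so that
# bplapply() calls without a BPPARAM argument run in the current process.
register(SerialParam())
res_default <- bplapply(1:4, work)
```

Running in a single process means ordinary tools such as traceback() and browser() see the failing call directly, which is usually not the case on a worker.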

BiocParallel 1.5.12 is from the 'devel' branch of Bioconductor, which is (currently) meant to be used with R-devel, whereas you report R version 3.2.3. Please always use the appropriate version of R, with packages installed using biocLite(), when reporting problems.
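
A quick way to see which versions are actually in use, as a sketch using only base R calls (the variable names are just for illustration):

```r
# Version string of the running R, e.g. to compare against the R version
# that the Bioconductor branch in use expects.
r_version <- R.version.string
cat(r_version, "\n")

# Installed BiocParallel version, if the package is available.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    print(packageVersion("BiocParallel"))
}

# Full listing of platform, R version and loaded packages; this is the
# output worth pasting into a problem report.
print(sessionInfo())
```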

Probably this belongs on support.bioconductor.org, where others may more easily benefit from your experience.

A couple of things have come up while looking into your problem, in particular how R can get into the situation where several processes share a socket connection in the CLOSE_WAIT state. I'm still exploring solutions, but it is not obvious that these would address whatever your underlying issue might be. R might at least be more helpful in saying that something has gone wrong, without being able to say exactly what.

Martin
________________________________________
From: Bioc-devel [bioc-devel-bounces at r-project.org] on behalf of Dario Strbenac [dstr7320 at uni.sydney.edu.au]
Sent: Friday, January 01, 2016 9:00 PM
To: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] bplapply Processes Sometimes Stall

Good day,

I haven't been able to make a small, reproducible example, but I am using bpstart and bpstop to run a loop with 25 workers multiple times on a large bioinformatics dataset. After the loop has run successfully a few times, a small number of the R workers use 100% CPU endlessly:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3300 dario     20   0 1190832 837212  17988 R 100.0  0.2   3848:00 R
 5014 dario     20   0 1194528 829084   8224 R  99.8  0.2   3843:44 R
 5015 dario     20   0 1194532 829088   8224 R  99.8  0.2   3843:44 R

There are also three connections belonging to the R processes waiting to close:

~$ lsof -i | grep CLOSE
R          3300 dario 1025u  IPv4 160778259      0t0  TCP localhost:11881->localhost:49379 (CLOSE_WAIT)
R          5014 dario 1025u  IPv4 160778259      0t0  TCP localhost:11881->localhost:49379 (CLOSE_WAIT)
R          5015 dario 1025u  IPv4 160778259      0t0  TCP localhost:11881->localhost:49379 (CLOSE_WAIT)

~$ lsof -i | grep -c R
256

I use :

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

with BiocParallel 1.5.12

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

