[Bioc-devel] Rprintf in a multi-threaded environment

Martin Morgan mtmorg@n@b|oc @end|ng |rom gm@||@com
Tue Jan 29 12:23:28 CET 2019


Printing from multiple workers to stdout is problematic anyway, because the output from different workers is not intrinsically synchronized -- lines from different workers will appear interleaved with each other.

Generally, for forked processes, connections (to stdout, files, databases, URLs, ...) need to be opened on the worker rather than inherited from the master; this is obviously not possible with stdout, since there is only one stdout.

One might avoid the problem below by opening separate output streams on individual threads, rather than inheriting them from the master; one would print to files and collate on the master, but this is more 'logging' than an interactive progress report.
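
For C-level threads, the same idea might be sketched roughly as follows. This is only an illustration, assuming POSIX threads and in-memory buffers instead of files; the names worker_log, log_append and run_workers are made up for the example and are not part of R or any package. Each thread writes to its own buffer, and only the main thread, after joining the workers, sees the collated text.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define N_WORKERS 4
#define LOG_CAPACITY 1024

/* Hypothetical per-worker log: written only by its owning thread. */
typedef struct {
    int id;
    size_t used;                /* bytes stored, excluding the final NUL */
    char text[LOG_CAPACITY];
} worker_log;

/* Append a formatted message to a worker's private buffer.
   Text that does not fit is truncated. No R API is called here. */
static void log_append(worker_log *log, const char *fmt, ...)
{
    size_t room = LOG_CAPACITY - 1 - log->used;
    va_list args;
    int n;

    if (room == 0)
        return;
    va_start(args, fmt);
    n = vsnprintf(log->text + log->used, room + 1, fmt, args);
    va_end(args);
    if (n > 0)
        log->used += ((size_t) n < room) ? (size_t) n : room;
}

/* Stand-in for real work: each worker reports three steps. */
static void *worker_main(void *arg)
{
    worker_log *log = arg;

    for (int step = 1; step <= 3; step++)
        log_append(log, "worker %d: finished step %d\n", log->id, step);
    return NULL;
}

/* Start the workers, wait for them, then collate their logs into `out`
   in worker order. Returns 0 if all workers were started, -1 otherwise. */
static int run_workers(char *out, size_t out_size)
{
    pthread_t threads[N_WORKERS];
    worker_log logs[N_WORKERS];
    size_t len = 0;
    int started = 0;

    if (out_size == 0)
        return -1;
    for (int i = 0; i < N_WORKERS; i++) {
        logs[i].id = i;
        logs[i].used = 0;
        logs[i].text[0] = '\0';
        if (pthread_create(&threads[i], NULL, worker_main, &logs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    /* Collation happens on the calling thread only. */
    for (int i = 0; i < started; i++) {
        size_t n = strlen(logs[i].text);
        if (n > out_size - 1 - len)
            n = out_size - 1 - len;
        memcpy(out + len, logs[i].text, n);
        len += n;
    }
    out[len] = '\0';
    return started == N_WORKERS ? 0 : -1;
}
```

In a package, the main thread would then pass the collated text to Rprintf("%s", out) itself, so the workers never touch R's API. The price is that messages only appear once the workers are done, which is again closer to logging than to a live progress report.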

Since parallel evaluation is presumably being used because a lot of data are being processed, detailed information on progress would seem likely to rapidly overwhelm the user. BiocParallel has a progress bar which, when used with tasks, can provide fine-grained updates:

> n = 10; p = MulticoreParam(2, tasks = n, progressbar=TRUE)
> res = bplapply(runif(n, 1, 2), Sys.sleep, BPPARAM=p)
  |======================================================================| 100%

Maybe this is enough, perhaps used in conjunction with log = TRUE?

Another alternative is to use separate processes, each with their own stdout, rather than forked processes sharing a single stdout -- SnowParam() and the snow package, rather than MulticoreParam and other (unix-based) fork implementations. Using separate processes requires more discipline but is actually not a bad choice; for instance, it is the only approach available on Windows, where half of our users are.

Martin

On 1/28/19, 10:31 PM, "Bioc-devel on behalf of Yang Liao" <bioc-devel-bounces using r-project.org on behalf of liao using wehi.edu.au> wrote:

    Hi,
    
    I'm not sure whether other C developers have run into this problem: it seems that Rprintf cannot work safely in a multi-threaded environment. In particular, if I call Rprintf() from a thread created by the main thread while stack size checking is enabled (i.e. "R_CStackLimit" isn't set to -1), it is very likely to end up with fatal error messages like:
    
    Error: C stack usage  847645293284 is too close to the limit
    > Error: C stack usage  847336061668 is too close to the limit
    > Error: C stack usage  847666277092 is too close to the limit
    > Error: C stack usage  847346551524 is too close to the limit
    > Error: C stack usage  847367531236 is too close to the limit
    > Error: C stack usage  847357041380 is too close to the limit
    > Error: C stack usage  847378021092 is too close to the limit
    > Error: C stack usage  847655787236 is too close to the limit
    
    The R session then terminates in a segfault.
    After I used every means to confirm that there was no memory leak and that the real stack use was minimal, I concluded that it could only be an Rprintf issue. I then disabled all screen output from the worker threads and the error was gone. The problem has also been reported on Stack Overflow:
    https://stackoverflow.com/questions/50092949/why-does-rcout-and-rprintf-cause-stack-limit-error-when-multithreading
    I tried using a semaphore to protect all Rprintf calls but it didn't prevent the error.
    
    Since my program needs to report some messages from the worker threads (created by the main thread), I wonder whether there is a way to do so safely, or whether I have to pipe the messages to the main thread, which in turn calls Rprintf. I would prefer not to change "R_CStackLimit" to disable the stack size checks, because that generates a "NOTE" in R CMD check.
    
    Cheers,
    Yang
    
    _______________________________________________
    Bioc-devel using r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel
    

