[Bioc-devel] Redirect workers output to STDERR, how to do so with current BiocParallel::SnowParam()?

Valerie Obenchain vobencha at fredhutch.org
Tue Jul 14 21:12:25 CEST 2015


Hi,

Thanks for sending the updated information. To make sure I know how the 
files relate, let's take snow-3.2.x, BiocParallel 1.3.34 run on July 14 
as an example.

memory files:

The mem_email files report max virtual memory (Max vmem) on the master, 
for the whole job. I see 3 outputs here - one run from July 9 and two 
from July 14:

https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/mem_emails/snow-3.2.x.txt

This last entry corresponds to BiocParallel 1.3.34 from July 14:

> Job 6463239 (snow-3.2.x) Complete
>  User             = lcollado
>  Queue            = shared.q at compute-082.cm.cluster
>  Host             = compute-082.cm.cluster
>  Start Time       = 07/14/2015 12:20:48
>  End Time         = 07/14/2015 12:22:27
>  User Time        = 00:00:18
>  System Time      = 00:00:04
>  Wallclock Time   = 00:01:39
>  CPU              = 00:10:14
>  Max vmem         = 7.286G
>  Exit Status      = 0


log files:

Corresponding log files for BiocParallel 1.3.34 on July 14 show memory 
used on workers:

https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/snow-3.2.x.o6463213
https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/snow-3.2.x.o6463239



On 07/14/2015 10:23 AM, Leonardo Collado Torres wrote:
> Hi Valerie,
>
> My other recent thread about SnowParam
> (https://stat.ethz.ch/pipermail/bioc-devel/2015-July/007788.html)
> allowed me to see a small difference.
>
> In R 3.1.x, using 'outfile' as in
> https://github.com/lcolladotor/SnowParam-memory/blob/d9d70086016c7e720714bec48f121aeca05b1416/SnowParam-memory-derfinder.R#L83
> showed more info than in R 3.2.0 and 3.2.1 as shown below:
>
> * R 3.1.x https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.1.x.e6416599
> * R 3.2.0 https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.2.e6416601
> * R 3.2.1 https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.2.x.e6416603
>
> using BiocParallel versions
>
> * 1.0.3 https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.1.x.o6416599#L57
> * 1.2.8 https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.2.o6416601#L58
> * 1.3.33 https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.2.x.o6416603#L58
>
> Basically, only some of the message() calls get printed in stderr in R
> 3.2.0 and 3.2.1 which is something I wasn't expecting. However, using
> "log = TRUE", all the message() calls get printed in stdout, as
> expected given the docs, in which case using 'outfile' is kind of
> useless.

The idea was to phase out the snow-centric 'outfile' and replace it with 
'log' and 'logdir'. You're right, previous versions of BiocParallel did 
not capture stdout and stderr on the workers; the current release and 
devel do. As you've now seen, when log = TRUE stderr and stdout messages 
print to the console, if logdir is given they go to a file. One thing I 
have not done yet is capture stdout and stderr from the master to a 
'logdir' file (workers captured only).

>
> See below for the output under R 3.2.0 when log = TRUE
> https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.2.o6463201
> https://github.com/lcolladotor/SnowParam-memory/blob/gh-pages/logs/der-snow-3.2.e6463201
>
> Assuming that it's all working as intended, I'll switch to using log =
> TRUE in some of my code when users specify 'verbose = TRUE'.

Sounds good.

>
> Cheers,
> Leo
>
> PS In page 1 of
> http://www.bioconductor.org/packages/3.2/bioc/vignettes/BiocParallel/inst/doc/Errors_Logs_And_Debugging.pdf
> there's a typo. It reads "SnowParma" instead of "SnowParam".

Thanks, I'll fix that.

Valerie

>
> On Fri, May 22, 2015 at 5:33 PM, Valerie Obenchain
> <vobencha at fredhutch.org> wrote:
>> Hi,
>>
>> Thanks for reporting the bug. Now fixed in 1.13.4 (devel) and 1.2.2
>> (release).
>>
>>> bplapply(1:2, print, BPPARAM=SnowParam(outfile = NULL))
>> starting worker for localhost:11031
>> starting worker for localhost:11031
>> Type: EXEC
>> [1] 1
>> Type: EXEC
>> [1] 2
>> Type: DONE
>> Type: DONE
>> [[1]]
>> [1] 1
>>
>> [[2]]
>> [1] 2
>>
>>
>> Some new features have been added to BiocParallel - logging with
>> futile.logger, writing out log and result files and control over how a job
>> is divided into tasks.
>>
>> ## send log file to 'logdir':
>> bplapply(1:2, print, BPPARAM=SnowParam(log=TRUE, logdir=tempdir())
>>
>> ## write results to 'resdir':
>> bplapply(1:2, print, BPPARAM=SnowParam(resdir=tempdir())
>>
>> ## by default jobs are divided evenly over the workers
>> bplapply(1:100, print, BPPARAM=SnowParam(workers=4))
>>
>> ## force the job to be run in 2 tasks
>> ## (useful when you know runtime or memory requirements)
>> bplapply(1:100, print, BPPARAM=SnowParam(tasks=2))
>>
>>
>> It would be great if you had a chance to try any of these out - I'd be
>> interested in the feedback. Logging was intended to take the idea of the
>> 'outfile' from snow further with the ability to add messages that can be
>> filtered by threshold.
>>
>> The package also has a new errors/logging vignette:
>>
>> http://www.bioconductor.org/packages/3.2/bioc/vignettes/BiocParallel/inst/doc/Errors_Logs_And_Debugging.pdf
>>
>> Valerie
>>
>>
>>
>>
>> On 05/21/2015 07:58 AM, Leonardo Collado Torres wrote:
>>>
>>> Hi,
>>>
>>> This might be a BioC support website question, but maybe it's a bug.
>>>
>>> In previous versions of BiocParallel, you could specify where to
>>> direct the output from the workers by using the 'outfile' argument.
>>> For example, SnowParam(outfile = Sys.getenv('SGE_STDERR_PATH')). I'm
>>> not finding how to do so with the current version (1.3.12).
>>>
>>> I understand that SnowParam() has a ... argument that gets passed to
>>> snow::makeCluster(), according to the SnowParam() docs. I also see
>>> that when log = TRUE, a script is used instead of snow. But log =
>>> FALSE by default.
>>>
>>> If I use snow, the 'outfile' argument does work as shown below:
>>>
>>> ## Nothing gets printed by default
>>>>
>>>> cl <- makeCluster(2, type = 'SOCK')
>>>> y <- clusterApply(cl, 1:2, print)
>>>
>>>
>>>
>>> ## Use outfile now, print works
>>>>
>>>> cl <- makeCluster(2, type = 'SOCK', outfile = NULL)
>>>
>>> starting worker for localhost:11671
>>> starting worker for localhost:11671
>>>>
>>>> y <- clusterApply(cl, 1:2, print)
>>>
>>> Type: EXEC
>>> Type: EXEC
>>> [1] 1
>>> [1] 2
>>>
>>>> packageVersion('snow')
>>>
>>> [1] ‘0.3.13’
>>>
>>>
>>> However, I can't use 'outfile' with SnowParam:
>>>
>>> ## SerialParam works
>>>>
>>>> x <- bplapply(1:2, print, BPPARAM = SerialParam())
>>>
>>> [1] 1
>>> [1] 2
>>>
>>> ## No printing by default with SnowParam, as expected
>>>>
>>>> x <- bplapply(1:2, print, BPPARAM = SnowParam(workers = 1))
>>>
>>>
>>> ## Can't pass 'outfile' argument
>>>>
>>>> x <- bplapply(1:2, print, BPPARAM = SnowParam(workers = 1, outfile =
>>>> NULL))
>>>
>>> Error in bplapply(1:2, print, BPPARAM = SnowParam(workers = 1, outfile
>>> = NULL)) :
>>>     error in evaluating the argument 'BPPARAM' in selecting a method for
>>> function 'bplapply': Error in envRefSetField(.Object, field, classDef,
>>> selfEnv, elements[[field]]) :
>>>     ‘outfile’ is not a field in class “SnowParam”
>>>
>>>
>>> Digging at the code, I see that in SnowParam() the ... argument is
>>> saved in .clusterargs
>>>
>>> args <- c(list(spec = workers, type = type), list(...))
>>>       .clusterargs <- lapply(args, force)
>>>
>>> However, ... is still passed to .SnowParam(), and .SnowParam() fields are:
>>>
>>> fields=list(
>>>           cluster="cluster",
>>>           .clusterargs="list",
>>>           .controlled="logical",
>>>           log="logical",
>>>           threshold="ANY",
>>>           logdir="character",
>>>           resultdir="character")
>>>
>>>
>>> So, I'm wondering if ... should not be passed to .SnowParam().
>>>
>>>
>>>
>>> ## Trying to pass outfile to .clusterargs directly doesn't work
>>> ## Actually, I'm surprised it didn't crash .SnowParam()
>>>>
>>>> x <- bplapply(1:2, print, BPPARAM = SnowParam(workers = 1,
>>>> .clusterargs=list(outfile = NULL)))
>>>
>>>
>>>
>>> Full log: https://gist.github.com/9ac957c6cad1c07f4ea4
>>>
>>>
>>> Thanks,
>>> Leo
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, Seattle, WA 98109
>>
>> Email: vobencha at fredhutch.org
>> Phone: (206) 667-3158


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fredhutch.org
Phone: (206) 667-3158



More information about the Bioc-devel mailing list