[Bioc-devel] BiocParallel-devel error

Michel Lang michellang at gmail.com
Wed Sep 24 12:05:43 CEST 2014


2014-09-23 23:41 GMT+02:00 Valerie Obenchain <vobencha at fhcrc.org>:
> Michel Lang (cc'd) implemented BatchJobs in BiocParallel. I'd like to get
> his opinion on how he wants to handle this type of error.
> Michel, let me know if you need more details, I can send another example
> off-line.

If the cluster is misconfigured, there is simply no way to detect these
kinds of errors. In principle, though, this is very similar to expired
jobs (where the scheduler kills the process before the database
transaction is started or completed).
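
To illustrate the downstream failure Valerie describes in the quoted
message below, here is a minimal base-R sketch. It assumes that a
hand-built condition object with a NULL message can stand in for the
result of a job that never ran. The names null_err and safe_message are
hypothetical and exist only for this illustration; they are not
BiocParallel or BatchJobs functions, and this is not the planned patch.

```r
# A condition whose message is NULL, mimicking a job that never ran.
# (Hand-built for illustration; not taken from the BiocParallel sources.)
null_err <- structure(list(message = NULL, call = NULL),
                      class = c("simpleError", "error", "condition"))
results <- list(null_err, simpleError("worker failed"))

# Strict extraction fails because conditionMessage() returns NULL
# (length 0) for the first element; capture the error text instead of
# stopping.
strict <- tryCatch(
    vapply(results, conditionMessage, character(1L)),
    error = function(e) conditionMessage(e)
)

# Hypothetical helper: substitute a placeholder for a missing message so
# that every element yields exactly one string.
safe_message <- function(e, placeholder = "<no error message returned>") {
    msg <- conditionMessage(e)
    if (length(msg) == 1L && !is.na(msg) && nzchar(msg)) msg else placeholder
}
msgs <- vapply(results, safe_message, character(1L))
```

With a guard of this kind the reporting step would at least show a
placeholder for jobs that returned nothing, rather than failing itself.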

I'll try to provide a small patch for expired jobs by next week
and push it to BiocParallel. I am still unsure about the differences
between R-release and R-devel.

Best and thanks for reporting,
Michel

2014-09-24 3:59 GMT+02:00 Thomas Girke <thomas.girke at ucr.edu>:
> Hi Valerie,
>
> Thanks for looking into this.
>
> Yes, if I include the bogus 'MYR' in *.tmpl then I am getting the same
> error in R-release as well.
>
> To double-check whether it is related to some nodes on our cluster (ours
> has different node architectures and the IB interconnect can be flaky at
> times), I restricted the computation to two specific nodes for all
> comparisons using nodes="1:ppn=1+n02+n03". As you can see below, the same
> computation works in R-release with both BiocParallel and BatchJobs. However,
> if I run it in R-devel it only works with BatchJobs.
>
> Certainly, there could still be another problem with our specific
> environment on the cluster, not sure?
>
> For my specific application there is no rush to get things working in
> BiocParallel right away. BatchJobs works fine for now.
>
> Thomas
>
> ###############
> ## R-release ##
> ###############
> library(BiocParallel); library(BatchJobs)
> f <- function(i) system("hostname", intern=TRUE)
> funs <- makeClusterFunctionsTorque("~/tmp/torque.tmpl")
> param <- BatchJobsParam(4, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"), cluster.functions=funs)
> register(param)
> xx <- bplapply(1:4, f)
> xx
>> xx
> [[1]]
> [1] "n03"
>
> [[2]]
> [1] "n03"
>
> [[3]]
> [1] "n03"
>
> [[4]]
> [1] "n02"
>
> library(BatchJobs)
> loadConfig(conffile = "~/tmp/.BatchJobs.R")
> reg <- makeRegistry(id="BatchJobTest", work.dir="results")
> ids <- batchMap(reg, fun=f, 1:4)
> done <- submitJobs(reg, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"))
> sapply(1:4, function(x) loadResult(reg, x))
> [1] "n03" "n03" "n03" "n02"
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  utils     datasets  grDevices methods   base
>
> other attached packages:
> [1] BatchJobs_1.2      BBmisc_1.7         BiocParallel_0.6.1
>
> loaded via a namespace (and not attached):
>  [1] BiocGenerics_0.10.0 DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.2         brew_1.0-6          checkmate_1.0       codetools_0.2-8     digest_0.6.4        fail_1.2            foreach_1.4.2
> [11] iterators_1.0.7     parallel_3.1.0      plyr_1.8.1          sendmailR_1.1-2     stringr_0.6.2       tools_3.1.0
>
> #############
> ## R-devel ##
> #############
>
> library(BiocParallel); library(BatchJobs)
> f <- function(i) system("hostname", intern=TRUE)
> funs <- makeClusterFunctionsTorque("~/tmp/torque.tmpl")
> param <- BatchJobsParam(4, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"), cluster.functions=funs)
> register(param)
> xx <- bplapply(1:4, f)
>
> Error: 10 errors; first error:
> For more information, use bplasterror(). To resume calculation, re-call the
> function and set the argument 'BPRESUME' to TRUE or wrap the previous call in
> bpresume().
>
> bplasterror()
> Error in vapply(head(which(is.error), n.print), f, character(1L)) :
> values must be length 1, but FUN(X[[1]]) result is length 0
>
> library(BatchJobs)
> loadConfig(conffile = "~/tmp/.BatchJobs.R")
> reg <- makeRegistry(id="BatchJobTest", work.dir="results")
> ids <- batchMap(reg, fun=f, 1:4)
> done <- submitJobs(reg, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"))
> sapply(1:4, function(x) loadResult(reg, x))
> [1] "n03" "n03" "n03" "n02"
>
>> sessionInfo()
> R Under development (unstable) (2014-05-05 r65530)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  utils     datasets  grDevices methods   base
>
> other attached packages:
> [1] BatchJobs_1.3        BBmisc_1.7           BiocParallel_0.99.19
>
> loaded via a namespace (and not attached):
>  [1] BiocGenerics_0.11.4 DBI_0.3.0           RSQLite_0.11.4      brew_1.0-6          checkmate_1.4       codetools_0.2-9     digest_0.6.4        fail_1.2            foreach_1.4.2       iterators_1.0.7
> [11] parallel_3.2.0      sendmailR_1.1-2     stringr_0.6.2       tools_3.2.0
>
>
> On Tue, Sep 23, 2014 at 09:41:44PM +0000, Valerie Obenchain wrote:
>> Hi,
>>
>> Martin and I looked into this a bit. It looks like a problem with
>> handling an 'undefined error' returned from a worker (i.e., job did not
>> run). When there is a problem executing the tmpl script no error message
>> is sent back. The NULL is coerced to simpleError and becomes a problem
>> downstream when the error processing is expecting messages of length > 0.
>>
>> You can reproduce the error by putting a typo in the script. For example
>> replace R with something bogus such as MYR in this line:
>>
>> MYR CMD BATCH --no-save --no-restore "<%= rscript %>" /dev/stdout
>>
>> You said the script worked with release but not devel. Is it possible
>> there's a problem with how R devel is being called on the cluster?
>>
>> Michel Lang (cc'd) implemented BatchJobs in BiocParallel. I'd like to
>> get his opinion on how he wants to handle this type of error.
>> Michel, let me know if you need more details, I can send another example
>> off-line.
>>
>> Valerie
>>
>>
>>
>> On 09/22/2014 02:58 PM, Valerie Obenchain wrote:
>> > Hi Thomas,
>> >
>> > Just wanted to let you know I saw this and am looking into it.
>> >
>> > Valerie
>> >
>> > On 09/20/2014 02:54 PM, Thomas Girke wrote:
>> >> Hi Martin, Micheal and Vincent,
>> >>
>> >> If I run the following code, with the release version of BiocParallel
>> >> then it
>> >> works (took me some time to actually realize that), but with the
>> >> development
>> >> version I am getting an error shown after the test code below. If I
>> >> run the
>> >> same test with BatchJobs from the devel branch alone then there is no
>> >> problem.
>> >> Thus, it seems there is some change in the devel version of BiocParallel
>> >> causing this error? The torque.tmpl file I am using on our cluster is the
>> >> standard one from BatchJobs here:
>> >> https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl
>> >>
>> >>
>> >> For my application, I could stick with BatchJobs, but it would be
>> >> nicer if I
>> >> could get things to work with BiocParallel.
>> >>
>> >> Thanks,
>> >>
>> >> Thomas
>> >>
>> >> ###############
>> >> ## Test Code ##
>> >> ###############
>> >> FUN <- function(i) system("hostname", intern=TRUE)
>> >> library(BiocParallel); library(BatchJobs)
>> >> funs <- makeClusterFunctionsTorque("torque.tmpl")
>> >> param <- BatchJobsParam(4, resources=list(walltime="48:00:00",
>> >> nodes="1:ppn=4", memory="4gb"), cluster.functions=funs)
>> >> register(param)
>> >> xx <- bplapply(1:4, FUN)
>> >>
>> >> Error: 4 errors; first error:
>> >>
>> >> For more information, use bplasterror(). To resume calculation,
>> >> re-call the function and
>> >> set the argument 'BPRESUME' to TRUE or wrap the previous call in
>> >> bpresume()
>> >>
>> >>> bplasterror()
>> >> Error in vapply(head(which(is.error), n.print), f, character(1L)) :
>> >>    values must be length 1,
>> >>   but FUN(X[[1]]) result is length 0
>> >>
>> >>> sessionInfo()
>> >> R Under development (unstable) (2014-05-05 r65530)
>> >> Platform: x86_64-unknown-linux-gnu (64-bit)
>> >>
>> >> locale:
>> >> [1] C
>> >>
>> >> attached base packages:
>> >> [1] stats     graphics  utils     datasets  grDevices methods   base
>> >>
>> >> other attached packages:
>> >> [1] BatchJobs_1.3        BBmisc_1.7           BiocParallel_0.99.19
>> >>
>> >> loaded via a namespace (and not attached):
>> >>   [1] BiocGenerics_0.11.4 DBI_0.3.0           RSQLite_0.11.4
>> >> brew_1.0-6          checkmate_1.4       codetools_0.2-9
>> >> digest_0.6.4        fail_1.2            foreach_1.4.2
>> >> iterators_1.0.7
>> >> [11] parallel_3.2.0      sendmailR_1.1-2     stringr_0.6.2
>> >> tools_3.2.0
>> >>
>> >> _______________________________________________
>> >> Bioc-devel at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>
>> >
>>


