[Bioc-devel] BiocParallel-devel error

Thomas Girke thomas.girke at ucr.edu
Wed Sep 24 03:59:11 CEST 2014


Hi Valerie,

Thanks for looking into this. 

Yes, if I include the bogus 'MYR' in the *.tmpl file, I get the same
error in R-release as well.
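The 'MYR' typo reproduces the error in both versions. One way to rule out the template on the real cluster is to check which program each "... CMD" line in it would actually start. The helper below, check_tmpl_r_calls(), is hypothetical (it is not part of BatchJobs or BiocParallel). It assumes the template calls R as "<program> CMD ...", as in the template line quoted later in this thread. Sys.which() only searches the PATH of the machine it runs on, so the check is meaningful only when run on a compute node (e.g. from an interactive job), not on the login node.

```r
## Hypothetical helper (not part of BatchJobs or BiocParallel): for every
## "<program> CMD ..." line in a Torque template, report where <program>
## is found on the PATH of the machine running this code ("" = not found).
check_tmpl_r_calls <- function(tmpl) {
  lines <- readLines(tmpl, warn = FALSE)
  ## Keep non-comment lines of the form "<program> CMD ..."; this skips
  ## the "#PBS" directive lines
  calls <- grep("^[[:space:]]*[^#[:space:]]+[[:space:]]+CMD[[:space:]]",
                lines, value = TRUE)
  ## The program is the first whitespace-delimited token, e.g. "R" or "MYR"
  progs <- sub("^[[:space:]]*([^[:space:]]+).*$", "\\1", calls)
  Sys.which(progs)
}

## Usage with a throwaway template containing the bogus "MYR" line
tmpl <- tempfile(fileext = ".tmpl")
writeLines(c("#PBS -N test",
             'MYR CMD --no-save --no-restore "<%= rscript %>" /dev/stdout'),
           tmpl)
check_tmpl_r_calls(tmpl)  # expected "" for MYR on a machine without such a program
```

An empty string means the program was not found. A path shows which R installation (release or devel) the template would actually start on that node.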

To double-check whether the problem is related to particular nodes on our
cluster (ours has different node architectures, and the InfiniBand (IB)
interconnect can be flaky at times), I restricted the computation to two
specific nodes for all comparisons using nodes="1:ppn=1+n02+n03". As you
can see below, the same computation works in R-release with both
BiocParallel and BatchJobs. However, in R-devel it only works with
BatchJobs.

Of course, there could still be some other problem specific to our
cluster environment; I am not sure.

For my specific application there is no rush to get things working in 
BiocParallel right away. BatchJobs works fine for now. 

Thomas

###############
## R-release ##
###############
library(BiocParallel); library(BatchJobs)
f <- function(i) system("hostname", intern=TRUE)
funs <- makeClusterFunctionsTorque("~/tmp/torque.tmpl")
param <- BatchJobsParam(4, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"), cluster.functions=funs)
register(param)
xx <- bplapply(1:4, f)
> xx
[[1]]
[1] "n03"

[[2]]
[1] "n03"

[[3]]
[1] "n03"

[[4]]
[1] "n02"

library(BatchJobs)
loadConfig(conffile = "~/tmp/.BatchJobs.R")
reg <- makeRegistry(id="BatchJobTest", work.dir="results") 
ids <- batchMap(reg, fun=f, 1:4)
done <- submitJobs(reg, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"))
sapply(1:4, function(x) loadResult(reg, x))
[1] "n03" "n03" "n03" "n02"

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base

other attached packages:
[1] BatchJobs_1.2      BBmisc_1.7         BiocParallel_0.6.1

loaded via a namespace (and not attached):
 [1] BiocGenerics_0.10.0 DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.2         brew_1.0-6          checkmate_1.0       codetools_0.2-8     digest_0.6.4        fail_1.2            foreach_1.4.2
[11] iterators_1.0.7     parallel_3.1.0      plyr_1.8.1          sendmailR_1.1-2     stringr_0.6.2       tools_3.1.0

#############
## R-devel ##
#############

library(BiocParallel); library(BatchJobs)
f <- function(i) system("hostname", intern=TRUE)
funs <- makeClusterFunctionsTorque("~/tmp/torque.tmpl")
param <- BatchJobsParam(4, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"), cluster.functions=funs)
register(param)
xx <- bplapply(1:4, f)

Error: 10 errors; first error:
For more information, use bplasterror(). To resume calculation, re-call the
function and set the argument 'BPRESUME' to TRUE or wrap the previous call in
bpresume().

bplasterror()
Error in vapply(head(which(is.error), n.print), f, character(1L)) : 
values must be length 1, but FUN(X[[1]]) result is length 0

library(BatchJobs)
loadConfig(conffile = "~/tmp/.BatchJobs.R")
reg <- makeRegistry(id="BatchJobTest", work.dir="results") 
ids <- batchMap(reg, fun=f, 1:4)
done <- submitJobs(reg, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"))
sapply(1:4, function(x) loadResult(reg, x))
[1] "n03" "n03" "n03" "n02"

> sessionInfo()
R Under development (unstable) (2014-05-05 r65530)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base

other attached packages:
[1] BatchJobs_1.3        BBmisc_1.7           BiocParallel_0.99.19

loaded via a namespace (and not attached):
 [1] BiocGenerics_0.11.4 DBI_0.3.0           RSQLite_0.11.4      brew_1.0-6          checkmate_1.4       codetools_0.2-9     digest_0.6.4        fail_1.2            foreach_1.4.2       iterators_1.0.7
[11] parallel_3.2.0      sendmailR_1.1-2     stringr_0.6.2       tools_3.2.0


On Tue, Sep 23, 2014 at 09:41:44PM +0000, Valerie Obenchain wrote:
> Hi,
> 
> Martin and I looked into this a bit. It looks like a problem with 
> handling an 'undefined error' returned from a worker (i.e., the job did 
> not run). When there is a problem executing the tmpl script, no error 
> message is sent back. The NULL is coerced to a simpleError and becomes a 
> problem downstream, where the error processing expects messages of 
> length > 0.
> 
> You can reproduce the error by putting a typo in the script. For example, 
> replace R with something bogus, such as MYR, in this line:
> 
> MYR CMD --no-save --no-restore "<%= rscript %>" /dev/stdout
> 
> You said the script worked with release but not devel. Is it possible 
> there's a problem with how R devel is being called on the cluster?
> 
> Michel Lang (cc'd) implemented the BatchJobs support in BiocParallel. I'd 
> like to get his opinion on how he wants to handle this type of error.
> Michel, let me know if you need more details; I can send another example 
> off-line.
> 
> Valerie
> 
> 
> 
> On 09/22/2014 02:58 PM, Valerie Obenchain wrote:
> > Hi Thomas,
> >
> > Just wanted to let you know I saw this and am looking into it.
> >
> > Valerie
> >
> > On 09/20/2014 02:54 PM, Thomas Girke wrote:
> >> Hi Martin, Michael and Vincent,
> >>
> >> If I run the following code with the release version of BiocParallel,
> >> it works (it took me some time to actually realize that), but with the
> >> development version I get the error shown after the test code below.
> >> If I run the same test with BatchJobs from the devel branch alone,
> >> there is no problem. Could some change in the devel version of
> >> BiocParallel be causing this error? The torque.tmpl file I am using on
> >> our cluster is the standard one from BatchJobs here:
> >> https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl
> >>
> >>
> >> For my application I could stick with BatchJobs, but it would be nicer
> >> if I could get things to work with BiocParallel.
> >>
> >> Thanks,
> >>
> >> Thomas
> >>
> >> ###############
> >> ## Test Code ##
> >> ###############
> >> FUN <- function(i) system("hostname", intern=TRUE)
> >> library(BiocParallel); library(BatchJobs)
> >> funs <- makeClusterFunctionsTorque("torque.tmpl")
> >> param <- BatchJobsParam(4, resources=list(walltime="48:00:00",
> >> nodes="1:ppn=4", memory="4gb"), cluster.functions=funs)
> >> register(param)
> >> xx <- bplapply(1:4, FUN)
> >>
> >> Error: 4 errors; first error:
> >>
> >> For more information, use bplasterror(). To resume calculation,
> >> re-call the function and
> >> set the argument 'BPRESUME' to TRUE or wrap the previous call in
> >> bpresume()
> >>
> >>> bplasterror()
> >> Error in vapply(head(which(is.error), n.print), f, character(1L)) :
> >>    values must be length 1,
> >>   but FUN(X[[1]]) result is length 0
> >>
> >>> sessionInfo()
> >> R Under development (unstable) (2014-05-05 r65530)
> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >>
> >> locale:
> >> [1] C
> >>
> >> attached base packages:
> >> [1] stats     graphics  utils     datasets  grDevices methods   base
> >>
> >> other attached packages:
> >> [1] BatchJobs_1.3        BBmisc_1.7           BiocParallel_0.99.19
> >>
> >> loaded via a namespace (and not attached):
> >>  [1] BiocGenerics_0.11.4 DBI_0.3.0           RSQLite_0.11.4      brew_1.0-6          checkmate_1.4       codetools_0.2-9     digest_0.6.4        fail_1.2            foreach_1.4.2       iterators_1.0.7
> >> [11] parallel_3.2.0      sendmailR_1.1-2     stringr_0.6.2       tools_3.2.0
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
>
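Valerie's explanation of the bplasterror() failure can be reproduced in base R. This is a sketch that assumes the error summary formats each message with vapply(..., character(1L)), as the traceback suggests. The error list is made up for illustration, and safe_message() is a hypothetical helper, not BiocParallel's actual fix.

```r
## Errors as they might be collected from workers: one ordinary error and
## one for a job that never ran, so no message came back (NULL)
errs <- list(simpleError("object 'x' not found"),
             simpleError(NULL))

## The second condition ends up with a zero-length message
length(conditionMessage(errs[[2L]]))   # 0

## Formatting all messages with vapply(..., character(1L)) therefore fails
## with a "values must be length 1" error, as seen in bplasterror()
res <- try(vapply(errs, conditionMessage, character(1L)), silent = TRUE)
inherits(res, "try-error")             # TRUE

## Hypothetical defensive formatter: substitute a placeholder when a worker
## returned no message, so the error summary can still be printed
safe_message <- function(cond) {
  msg <- conditionMessage(cond)
  if (length(msg) == 0L) {
    "<no error message returned by the worker>"
  } else {
    paste(msg, collapse = "\n")
  }
}

vapply(errs, safe_message, character(1L))
## returns c("object 'x' not found", "<no error message returned by the worker>")
```

The placeholder only keeps the summary printable. How BiocParallel should actually report such jobs was left open in the thread for Michel to decide.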


