[Bioc-devel] BiocParallel-devel error

Thomas Girke thomas.girke at ucr.edu
Thu Nov 20 04:48:59 CET 2014


Hi Valerie, Michel and others, 

I finally freed up some time to revisit this problem. As it turns out,
it is related to the use of a module system on our cluster. If I add an
explicit module load line to the Torque template file (torque.tmpl) for
the specific R version I am using on the master/head node, like this

module load R/3.1.2-dev

then everything runs as expected, without errors, in any of the more
recent R release and development versions. Without this line, the
compute nodes use the cluster's default R version, which may cause an R
version mismatch when jobs are submitted from a different R version
(e.g. R-dev). The reason that things worked in my specific case with
BatchJobs but not with BiocParallel may simply be that BatchJobs
enforces matching R versions less strictly. Sorry that I didn't try this
simple solution earlier.
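
A minimal sketch of the template change described above, written as a small
R helper that inserts the module load line ahead of the line that starts R.
The helper name add_module_load and the three-line stand-in template are
hypothetical; in particular, the "R CMD BATCH" line is an assumption about
what the standard template contains, since the thread only quotes it in
altered form. The module name is site-specific.

```r
# Hypothetical helper: insert a 'module load' line ahead of the line that
# starts R in a BatchJobs Torque template (given as a character vector).
add_module_load <- function(tmpl_lines, module) {
  # Locate the first line that invokes R (assumed to start with 'R CMD').
  r_line <- grep("^R CMD", tmpl_lines)[1]
  if (is.na(r_line)) stop("no 'R CMD' line found in template")
  load_line <- paste("module load", module)
  # Leave the template untouched if the line is already present.
  if (load_line %in% tmpl_lines) return(tmpl_lines)
  append(tmpl_lines, load_line, after = r_line - 1)
}

# Abbreviated stand-in for torque.tmpl (an assumption, not the real file);
# a real template would be read with readLines("torque.tmpl").
tmpl <- c(
  "#PBS -N <%= job.name %>",
  "#PBS -o <%= log.file %>",
  "R CMD BATCH --no-save --no-restore \"<%= rscript %>\" /dev/stdout"
)

# The module name must match the R version used on the head node.
patched <- add_module_load(tmpl, "R/3.1.2-dev")
writeLines(patched)
```

The patched lines could then be written back with writeLines(patched,
"torque.tmpl") and passed to makeClusterFunctionsTorque() as before.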

I guess what would help to isolate these kinds of problems in the future
is a log file containing the STDOUT of the processes submitted to the
nodes. BatchJobs captures this information in a jobs subdirectory, which
is useful and pointed me to the source of the above error. Is something
similar available through BiocParallel?
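
As a rough illustration of how such logs can be inspected, the sketch below
collects the last lines of every log file found below a registry's file
directory, using base R only. It assumes that the logs end in ".out" and sit
somewhere below a "jobs" subdirectory; both are assumptions about the layout
and may differ between versions. The helper name tail_job_logs and the demo
directory with its made-up log content are hypothetical.

```r
# Hypothetical helper: return the last 'n' lines of every log file found
# below 'file_dir', as a list named by log file.
tail_job_logs <- function(file_dir, n = 5, pattern = "\\.out$") {
  logs <- list.files(file_dir, pattern = pattern,
                     recursive = TRUE, full.names = TRUE)
  out <- lapply(logs, function(path) tail(readLines(path, warn = FALSE), n))
  # Name each entry by file so the failing job is easy to spot.
  names(out) <- basename(logs)
  out
}

# Stand-in directory with a single made-up log, for demonstration only.
demo_root <- file.path(tempdir(), "BatchJobTest-files")
demo_dir <- file.path(demo_root, "jobs", "01")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(c("line 1", "line 2", "line 3"), file.path(demo_dir, "1.out"))

logs <- tail_job_logs(demo_root, n = 2)
print(logs)
```

For a single job, BatchJobs also provides showLog(reg, id), which opens that
job's log directly.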

Again sorry for the unnecessary noise.

Thomas


On Tue, Sep 23, 2014 at 06:59:11PM -0700, Thomas Girke wrote:
> Hi Valerie,
> 
> Thanks for looking into this. 
> 
> Yes, if I include the bogus 'MYR' in *.tmpl then I am getting the same
> error in R-release as well.
> 
> To double-check whether it is related to some nodes on our cluster (ours
> has different node architectures and the IB interconnect can be flaky at
> times), I restricted the computation to two specific nodes for all
> comparisons using nodes="1:ppn=1+n02+n03". As you can see below, the same 
> computation works in R-release with both BiocParallel and BatchJobs. However, 
> if I run it in R-devel it only works with BatchJobs. 
> 
> Certainly, there could still be another problem with our specific
> environment on the cluster; I am not sure.
> 
> For my specific application there is no rush to get things working in 
> BiocParallel right away. BatchJobs works fine for now. 
> 
> Thomas
> 
> ###############
> ## R-release ##
> ###############
> library(BiocParallel); library(BatchJobs)
> f <- function(i) system("hostname", intern=TRUE)
> funs <- makeClusterFunctionsTorque("~/tmp/torque.tmpl")
> param <- BatchJobsParam(4, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"), cluster.functions=funs)
> register(param)
> xx <- bplapply(1:4, f)
> xx
> > xx
> [[1]]
> [1] "n03"
> 
> [[2]]
> [1] "n03"
> 
> [[3]]
> [1] "n03"
> 
> [[4]]
> [1] "n02"
> 
> library(BatchJobs)
> loadConfig(conffile = "~/tmp/.BatchJobs.R")
> reg <- makeRegistry(id="BatchJobTest", work.dir="results") 
> ids <- batchMap(reg, fun=f, 1:4)
> done <- submitJobs(reg, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"))
> sapply(1:4, function(x) loadResult(reg, x))
> [1] "n03" "n03" "n03" "n02"
> 
> > sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  utils     datasets  grDevices methods   base
> 
> other attached packages:
> [1] BatchJobs_1.2      BBmisc_1.7         BiocParallel_0.6.1
> 
> loaded via a namespace (and not attached):
>  [1] BiocGenerics_0.10.0 DBI_0.2-7           RSQLite_0.11.4      Rcpp_0.11.2         brew_1.0-6          checkmate_1.0       codetools_0.2-8     digest_0.6.4        fail_1.2            foreach_1.4.2
> [11] iterators_1.0.7     parallel_3.1.0      plyr_1.8.1          sendmailR_1.1-2     stringr_0.6.2       tools_3.1.0
> 
> #############
> ## R-devel ##
> #############
> 
> library(BiocParallel); library(BatchJobs)
> f <- function(i) system("hostname", intern=TRUE)
> funs <- makeClusterFunctionsTorque("~/tmp/torque.tmpl")
> param <- BatchJobsParam(4, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"), cluster.functions=funs)
> register(param)
> xx <- bplapply(1:4, f)
> 
> Error: 10 errors; first error:
> For more information, use bplasterror(). To resume calculation, re-call the
> function and set the argument 'BPRESUME' to TRUE or wrap the previous call in
> bpresume().
> 
> bplasterror()
> Error in vapply(head(which(is.error), n.print), f, character(1L)) : 
> values must be length 1, but FUN(X[[1]]) result is length 0
> 
> library(BatchJobs)
> loadConfig(conffile = "~/tmp/.BatchJobs.R")
> reg <- makeRegistry(id="BatchJobTest", work.dir="results") 
> ids <- batchMap(reg, fun=f, 1:4)
> done <- submitJobs(reg, resources=list(walltime="00:05:00", nodes="1:ppn=1+n02+n03", memory="1gb"))
> sapply(1:4, function(x) loadResult(reg, x))
> [1] "n03" "n03" "n03" "n02"
> 
> > sessionInfo()
> R Under development (unstable) (2014-05-05 r65530)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  utils     datasets  grDevices methods   base
> 
> other attached packages:
> [1] BatchJobs_1.3        BBmisc_1.7           BiocParallel_0.99.19
> 
> loaded via a namespace (and not attached):
>  [1] BiocGenerics_0.11.4 DBI_0.3.0           RSQLite_0.11.4      brew_1.0-6          checkmate_1.4       codetools_0.2-9     digest_0.6.4        fail_1.2            foreach_1.4.2       iterators_1.0.7
> [11] parallel_3.2.0      sendmailR_1.1-2     stringr_0.6.2       tools_3.2.0
> 
> 
> On Tue, Sep 23, 2014 at 09:41:44PM +0000, Valerie Obenchain wrote:
> > Hi,
> > 
> > Martin and I looked into this a bit. It looks like a problem with 
> > handling an 'undefined error' returned from a worker (i.e., job did not 
> > run). When there is a problem executing the tmpl script no error message 
> > is sent back. The NULL is coerced to simpleError and becomes a problem 
> > downstream when the error processing is expecting messages of length > 0.
> > 
> > You can reproduce the error by putting a typo in the script. For example 
> > replace R with something bogus such as MYR in this line:
> > 
> > MYR CMD --no-save --no-restore "<%= rscript %>" /dev/stdout
> > 
> > You said the script worked with release but not devel. Is it possible 
> > there's a problem with how R devel is being called on the cluster?
> > 
> > Michel Lang (cc'd) implemented BatchJobs in BiocParallel. I'd like to 
> > get his opinion on how he wants to handle this type of error.
> > Michel, let me know if you need more details, I can send another example 
> > off-line.
> > 
> > Valerie
> > 
> > 
> > 
> > On 09/22/2014 02:58 PM, Valerie Obenchain wrote:
> > > Hi Thomas,
> > >
> > > Just wanted to let you know I saw this and am looking into it.
> > >
> > > Valerie
> > >
> > > On 09/20/2014 02:54 PM, Thomas Girke wrote:
> > >> Hi Martin, Micheal and Vincent,
> > >>
> > >> If I run the following code, with the release version of BiocParallel
> > >> then it
> > >> works (took me some time to actually realize that), but with the
> > >> development
> > >> version I am getting an error shown after the test code below. If I
> > >> run the
> > >> same test with BatchJobs from the devel branch alone then there is no
> > >> problem.
> > >> Thus, it seems there is some change in the devel version of BiocParallel
> > >> causing this error? The torque.tmpl file I am using on our cluster is the
> > >> standard one from BatchJobs here:
> > >> https://github.com/tudo-r/BatchJobs/blob/master/examples/cfTorque/simple.tmpl
> > >>
> > >>
> > >> For my application, I could stick with BatchJobs, but it would be
> > >> nicer if I
> > >> could get things to work with BiocParallel.
> > >>
> > >> Thanks,
> > >>
> > >> Thomas
> > >>
> > >> ###############
> > >> ## Test Code ##
> > >> ###############
> > >> FUN <- function(i) system("hostname", intern=TRUE)
> > >> library(BiocParallel); library(BatchJobs)
> > >> funs <- makeClusterFunctionsTorque("torque.tmpl")
> > >> param <- BatchJobsParam(4, resources=list(walltime="48:00:00",
> > >> nodes="1:ppn=4", memory="4gb"), cluster.functions=funs)
> > >> register(param)
> > >> xx <- bplapply(1:4, FUN)
> > >>
> > >> Error: 4 errors; first error:
> > >>
> > >> For more information, use bplasterror(). To resume calculation,
> > >> re-call the function and
> > >> set the argument 'BPRESUME' to TRUE or wrap the previous call in
> > >> bpresume()
> > >>
> > >>> bplasterror()
> > >> Error in vapply(head(which(is.error), n.print), f, character(1L)) :
> > >>    values must be length 1,
> > >>   but FUN(X[[1]]) result is length 0
> > >>
> > >>> sessionInfo()
> > >> R Under development (unstable) (2014-05-05 r65530)
> > >> Platform: x86_64-unknown-linux-gnu (64-bit)
> > >>
> > >> locale:
> > >> [1] C
> > >>
> > >> attached base packages:
> > >> [1] stats     graphics  utils     datasets  grDevices methods   base
> > >>
> > >> other attached packages:
> > >> [1] BatchJobs_1.3        BBmisc_1.7           BiocParallel_0.99.19
> > >>
> > >> loaded via a namespace (and not attached):
> > >>   [1] BiocGenerics_0.11.4 DBI_0.3.0           RSQLite_0.11.4
> > >> brew_1.0-6          checkmate_1.4       codetools_0.2-9
> > >> digest_0.6.4        fail_1.2            foreach_1.4.2
> > >> iterators_1.0.7
> > >> [11] parallel_3.2.0      sendmailR_1.1-2     stringr_0.6.2
> > >> tools_3.2.0
> > >>
> > >> _______________________________________________
> > >> Bioc-devel at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >>
> > >
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >


