[Bioc-devel] readGAlignmentPairs Fails if Used Inside mclapply Loop
Martin Morgan
martin.morgan at roswellpark.org
Mon Dec 12 13:00:58 CET 2016
On 12/12/2016 06:00 AM, Dario Strbenac wrote:
> Good day,
>
> I found that readGAlignmentPairs fails when used inside an mclapply loop but not an sapply loop. I haven't had such problems with other functions when using mclapply.
>
>> class(mappedToGenomeFiles)
> [1] "character"
>> length(mappedToGenomeFiles)
> [1] 13
>
>> mappedReadsGenome <- sapply(mappedToGenomeFiles, function(bamFile)
> {
> readGAlignmentPairs(bamFile, strandMode = 2)
> })
> # No error. Each item is of GAlignmentPairs class.
>
> But, with mclapply:
>
>> mappedReadsGenome <- mclapply(mappedToGenomeFiles, function(bamFile)
> {
> readGAlignmentPairs(bamFile, strandMode = 2)
> }, mc.cores = 7)
> Warning message:
> In mclapply(mappedToGenomeFiles, function(bamFile) { :
> scheduled cores 6, 5, 3, 1, 4, 2 encountered errors in user code, all values of the jobs will be affected
>> mappedReadsGenome
> [[1]]
> [1] "fatal error in wrapper code"
> attr(,"class")
> [1] "try-error"
> [[2]]
> [1] "fatal error in wrapper code"
> attr(,"class")
> [1] "try-error"
If the return value is large, R has to serialize it to send it from the
worker back to the manager, and it may be that the size of the serialized
vector is too large to be represented in R. You could try
length(serialize(readGAlignmentPairs(bamFile, strandMode = 2), NULL))
to test whether this causes the error.
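A minimal base-R sketch of the size check suggested above. It assumes only that the result is an ordinary R object; `serializedBytes` is a hypothetical helper name, and `runif(1e6)` merely stands in for the GAlignmentPairs object returned in the thread. It reports how many bytes a worker would have to serialize to ship the value back to the manager.

```r
# Number of bytes 'x' occupies once serialized: roughly what a forked
# mclapply() worker has to send back to the manager process.
# 'serializedBytes' is a hypothetical helper, not part of any package.
serializedBytes <- function(x) length(serialize(x, connection = NULL))

# Stand-in for a large worker result: one million doubles (8 bytes each).
x <- runif(1e6)
bytes <- serializedBytes(x)

# Report the serialized size in megabytes.
cat(sprintf("serialized size: %.1f MB\n", bytes / 2^20))
```

In the thread's setting, `x` would be `readGAlignmentPairs(bamFile, strandMode = 2)` evaluated for one of the files that failed.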
With parallel evaluation you generally want to minimize the amount of
data communicated (in both directions) between manager and workers. And
since workers contend for memory on the same machine, you
generally want to adopt strategies like
bf <- BamFile(bamFile, yieldSize = 1000000)
GenomicFiles::reduceByYield(bf, ...)
that iterate through the large object in moderate-sized chunks.
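The advice above can be illustrated without BAM files by a base-R analog of the chunked loop that reduceByYield() runs: each file is read a fixed number of lines at a time, each chunk is reduced immediately, and each worker returns a single number instead of the data it read. This is a swapped-in sketch, not GenomicFiles code; `makeInput`, `sumByChunk`, and the temporary text files are hypothetical stand-ins for BAM files and readGAlignmentPairs().

```r
library(parallel)

# Hypothetical stand-in for a large input file: one integer per line.
makeInput <- function(n) {
  path <- tempfile(fileext = ".txt")
  writeLines(as.character(seq_len(n)), path)
  path
}

# Read 'path' 'chunkSize' lines at a time, folding each chunk into a
# running total so the whole file is never held in memory at once.
sumByChunk <- function(path, chunkSize = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  repeat {
    lines <- readLines(con, n = chunkSize)   # yield the next chunk
    if (length(lines) == 0L) break           # done: end of file reached
    total <- total + sum(as.numeric(lines))  # map the chunk, reduce into total
  }
  total
}

files <- vapply(c(25000L, 50000L), makeInput, character(1))

# Each forked worker sends back one number, not the lines it read.
totals <- mclapply(files, sumByChunk, mc.cores = 2)
```

With real BAM input, the chunk size would come from the BamFile yieldSize and the per-chunk work would be whatever summary is actually needed downstream.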
Martin
> .
> .
> .
> [[7]]
> GAlignmentPairs object with 41860576 pairs, strandMode=2, and 0 metadata columns:
> seqnames strand : ranges -- ranges
> <Rle> <Rle> : <IRanges> -- <IRanges>
> [1] chr14 + : [19010525, 19010623] -- [19010414, 19010513]
> [2] chr14 + : [19010543, 19010612] -- [19010505, 19010604]
> [3] chr14 + : [19010608, 19010707] -- [19010577, 19010676]
> [4] chr14 + : [19011187, 19011286] -- [19011142, 19011241]
> [5] chr14 + : [19011318, 19011415] -- [19011187, 19011286]
> ... ... ... ... ... ... ...
> [41860572] chr4 + : [190972787, 190972886] -- [190972685, 190972784]
> [41860573] chr4 - : [190974302, 190974385] -- [190974302, 190974385]
> [41860574] chr4 - : [190978480, 190978579] -- [190978542, 190978641]
> [41860575] chr4 - : [190982116, 190982215] -- [190982125, 190982224]
> [41860576] chr4 + : [191031678, 191031776] -- [191031630, 191031729]
> -------
> seqinfo: 25 sequences from an unspecified genome
> .
> .
> .
> [[13]]
> [1] "fatal error in wrapper code"
> attr(,"class")
> [1] "try-error"
>
> Interestingly, reading in from one of the thirteen file paths worked.
>
> In contrast, a simple test case of the same length works:
>
> X=1:13
> mclapply(X, function(x) x + 1, mc.cores = 7) # Returns a list holding 2 to 14.
>
> The BAM file import also works with bplapply and BPPARAM = MulticoreParam(workers = 7).
>
>> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 8 (jessie)
>
> locale:
> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicAlignments_1.10.0 SummarizedExperiment_1.4.0 GenomicFeatures_1.26.0 AnnotationDbi_1.36.0 Biobase_2.34.0
> [6] Rsamtools_1.26.1 Biostrings_2.42.0 XVector_0.14.0 GenomicRanges_1.26.1 GenomeInfoDb_1.10.1
> [11] IRanges_2.8.1 S4Vectors_0.12.0 BiocGenerics_0.20.0
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.20.0 BiocParallel_1.8.1 lattice_0.20-34 tools_3.3.2 grid_3.3.2 DBI_0.5-1 Matrix_1.2-7.1
> [8] rtracklayer_1.34.1 bitops_1.0-6 RCurl_1.95-4.8 biomaRt_2.30.0 RSQLite_1.0.0 XML_3.98-1.5
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>