[Bioc-devel] readGAlignmentPairs Fails if Used Inside mclapply Loop

Martin Morgan martin.morgan at roswellpark.org
Mon Dec 12 13:00:58 CET 2016


On 12/12/2016 06:00 AM, Dario Strbenac wrote:
> Good day,
>
> I found that readGAlignmentPairs fails when used inside an mclapply loop but not an sapply loop. I haven't had such problems with other functions when using mclapply.
>
>> class(mappedToGenomeFiles)
> [1] "character"
>> length(mappedToGenomeFiles)
> [1] 13
>
>> mappedReadsGenome <- sapply(mappedToGenomeFiles, function(bamFile)
>   {
>       readGAlignmentPairs(bamFile, strandMode = 2)
>   })
> # No error. Each item is of GAlignmentPairs class.
>
> But, with mclapply:
>
>> mappedReadsGenome <- mclapply(mappedToGenomeFiles, function(bamFile)
>   {
>       readGAlignmentPairs(bamFile, strandMode = 2)
>   }, mc.cores = 7)
> Warning message:
> In mclapply(mappedToGenomeFiles, function(bamFile) { :
>   scheduled cores 6, 5, 3, 1, 4, 2 encountered errors in user code, all values of the jobs will be affected
>> mappedReadsGenome
> [[1]]
> [1] "fatal error in wrapper code"
> attr(,"class")
> [1] "try-error"
> [[2]]
> [1] "fatal error in wrapper code"
> attr(,"class")
> [1] "try-error"

If the return value is large and R tries to serialize it, the 
serialized vector may be too large to be represented in R. You could 
try

   length(serialize(readGAlignmentPairs(bamFile, strandMode=2), NULL))

to test whether this causes the error.
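
As a minimal sketch of that check using only base R: the helper name 
serialized_size and the toy objects are hypothetical, and comparing 
against .Machine$integer.max is only an assumed rule of thumb, not a 
documented limit of mclapply. For the real case, pass the result of 
readGAlignmentPairs() to the helper instead.

```r
# Hypothetical helper: number of bytes in an object's serialized form.
# serialize(x, NULL) returns a raw vector rather than writing to a connection.
serialized_size <- function(x) {
    length(serialize(x, connection = NULL))
}

small <- 1:10        # tiny object
big <- runif(1e6)    # one million doubles, roughly 8 MB when serialized

serialized_size(small)
serialized_size(big)

# Assumed rule of thumb: be suspicious of results whose serialized size
# approaches the largest value an R integer can hold.
serialized_size(big) < .Machine$integer.max
```

If the size for a single BAM file's result is in the gigabyte range, 
returning it whole from each worker is a likely source of trouble.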

With parallel evaluation you generally want to minimize the amount of 
data communicated (in both directions) between manager and worker. And 
since workers are contending for memory on the same machine, you 
generally want to adopt strategies like

     bf = BamFile(bamFile, yieldSize=1000000)
     GenomicFiles::reduceByYield(bf, ...)

that iterate through the large object in moderate-sized chunks.
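
A fuller sketch of the chunked approach, under stated assumptions: it 
uses the small ex1.bam example file shipped with Rsamtools, a yieldSize 
of 500 chosen only to suit that file, and readGAlignments() in place of 
readGAlignmentPairs() to keep the example simple. The YIELD, MAP and 
REDUCE functions here just count records; they stand in for whatever 
per-chunk summary is actually needed.

```r
# Assumes the Bioconductor packages GenomicAlignments and GenomicFiles
# are installed; the block does nothing if they are not.
have_pkgs <- requireNamespace("GenomicAlignments", quietly = TRUE) &&
    requireNamespace("GenomicFiles", quietly = TRUE)

if (have_pkgs) {
    library(GenomicAlignments)  # readGAlignments(); attaches Rsamtools for BamFile()
    library(GenomicFiles)       # reduceByYield()

    # Example BAM file from Rsamtools; substitute a real file path.
    fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")
    bf <- BamFile(fl, yieldSize = 500)  # read at most 500 records per chunk

    YIELD <- function(x, ...) readGAlignments(x)  # next chunk from the open file
    MAP <- function(value, ...) length(value)     # reduce each chunk to a small summary
    REDUCE <- `+`                                 # combine summaries across chunks

    n_records <- reduceByYield(bf, YIELD, MAP, REDUCE)
    print(n_records)
}
```

Only the small per-chunk summary is kept, so memory use is bounded by 
yieldSize rather than by the size of the whole BAM file, and a worker 
has very little to send back to the manager.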

Martin

>            .
>            .
>            .
> [[7]]
> GAlignmentPairs object with 41860576 pairs, strandMode=2, and 0 metadata columns:
>              seqnames strand   :                 ranges  --                 ranges
>                 <Rle>  <Rle>   :              <IRanges>  --              <IRanges>
>          [1]    chr14      +   :   [19010525, 19010623]  --   [19010414, 19010513]
>          [2]    chr14      +   :   [19010543, 19010612]  --   [19010505, 19010604]
>          [3]    chr14      +   :   [19010608, 19010707]  --   [19010577, 19010676]
>          [4]    chr14      +   :   [19011187, 19011286]  --   [19011142, 19011241]
>          [5]    chr14      +   :   [19011318, 19011415]  --   [19011187, 19011286]
>          ...      ...    ... ...                    ... ...                    ...
>   [41860572]     chr4      +   : [190972787, 190972886]  -- [190972685, 190972784]
>   [41860573]     chr4      -   : [190974302, 190974385]  -- [190974302, 190974385]
>   [41860574]     chr4      -   : [190978480, 190978579]  -- [190978542, 190978641]
>   [41860575]     chr4      -   : [190982116, 190982215]  -- [190982125, 190982224]
>   [41860576]     chr4      +   : [191031678, 191031776]  -- [191031630, 191031729]
>   -------
>   seqinfo: 25 sequences from an unspecified genome
>            .
>            .
>            .
> [[13]]
> [1] "fatal error in wrapper code"
> attr(,"class")
> [1] "try-error"
>
> Interestingly, reading in from one of the thirteen file paths worked.
>
> In contrast, a simple test case of the same length works:
>
> X=1:13
> mclapply(X, function(x) x + 1, mc.cores = 7) # Prints 2:14.
>
> The BAM file import also works with bplapply and BPPARAM = MulticoreParam(workers = 7)
>
>> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 8 (jessie)
>
> locale:
>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] GenomicAlignments_1.10.0   SummarizedExperiment_1.4.0 GenomicFeatures_1.26.0     AnnotationDbi_1.36.0       Biobase_2.34.0
>  [6] Rsamtools_1.26.1           Biostrings_2.42.0          XVector_0.14.0             GenomicRanges_1.26.1       GenomeInfoDb_1.10.1
> [11] IRanges_2.8.1              S4Vectors_0.12.0           BiocGenerics_0.20.0
>
> loaded via a namespace (and not attached):
>  [1] zlibbioc_1.20.0    BiocParallel_1.8.1 lattice_0.20-34    tools_3.3.2        grid_3.3.2         DBI_0.5-1          Matrix_1.2-7.1
>  [8] rtracklayer_1.34.1 bitops_1.0-6       RCurl_1.95-4.8     biomaRt_2.30.0     RSQLite_1.0.0      XML_3.98-1.5
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

