[Bioc-devel] readGAlignmentPairs function broken?

Hervé Pagès hpages at fhcrc.org
Thu Dec 19 19:55:42 CET 2013


Hi Leonard,

Thanks for the details. This is under investigation. I'll let
you know.

Cheers,
H.

On 12/19/2013 10:41 AM, Leonard Goldstein wrote:
> Hi Hervé
>
> You probably spotted this already but it looks like the problem is
> introduced between GenomicAlignments revisions r84052 (0.99.8) and
> r84106 (0.99.9)
>
> Best wishes
>
> Leonard
>
>
> On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein <goldstel at gene.com> wrote:
>> Dear list,
>>
>> There seems to be a problem with the readGAlignmentPairs function:
>>
>> Querying genomic regions without any alignments using the which
>> argument results in an error -- see (1) below. Reading in a whole
>> chromosome takes indefinitely (or at least much longer than in
>> previous versions) -- see (2) below. I suspect these issues are not
>> specific to the BAM files I am working with but can provide test data
>> if required.
>>
>> Many thanks for your help.
>>
>> Leonard
>>
>>
>> --
>> (1) Attempts to read an empty region results in an error.
>>
>>> gr <- GRanges("22", IRanges(1000000, 2000000))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> readGAlignmentPairs(file, param = param)
>> Error in `elementMetadata<-`(x, ..., value = value) :
>>    replacement 'elementMetadata' value must be a DataTable object or NULL
>>>
>>> sessionInfo()
>> R Under development (unstable) (2013-12-03 r64376)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
>> [4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
>> [7] BiocGenerics_0.9.2       BiocInstaller_1.13.3
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.1.0   tools_3.1.0    zlibbioc_1.9.0
>>>
>>
>> ... but works fine with previous version
>>
>>> gr <- GRanges("22", IRanges(1000000, 2000000))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> readGAlignmentPairs(file, param = param)
>> GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
>>   seqnames strand :    ranges --    ranges
>>      <Rle>  <Rle> : <IRanges> -- <IRanges>
>> ---
>> seqlengths:
>>            1          2          3 ... GL000247.1 GL000248.1 GL000249.1
>>    249250621  243199373  198022430 ...      36422      39786      38502
>>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
>> [4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>
>>
>> (2) Use of the which argument covering chromosome 22 takes under one
>> minute with an earlier version of readGAlignmentPairs
>>
>>> gr <- GRanges("22", IRanges(1, 51304566))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> system.time(gap <- readGAlignmentPairs(file, param = param))
>>     user  system elapsed
>>   45.887   0.256  46.168
>>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
>> [4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>>
>>
>> ... but at least twenty times as long with the current version.
>>
>>> gr <- GRanges("22", IRanges(1, 51304566))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> system.time(gap <- readGAlignmentPairs(file, param = param))
>>
>> ^C
>> Timing stopped at: 1108.041 35.998 1144.006
>>>
>>> sessionInfo()
>> R Under development (unstable) (2013-12-03 r64376)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
>> [4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
>> [7] BiocGenerics_0.9.2
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list