[Bioc-devel] readGAlignmentPairs function broken?

Hervé Pagès hpages at fhcrc.org
Thu Dec 19 20:52:28 CET 2013


Hi Leonard,

This should be fixed in GenomicAlignments 0.99.10, which will become
available thru biocLite() in the next 24 hours or so. In the mean time
you can grab it directly from svn:

 
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicAlignments

Please let me know if you still run into problems with this.
Thanks!

H.

On 12/19/2013 10:41 AM, Leonard Goldstein wrote:
> Hi Hervé
>
> You probably spotted this already but it looks like the problem is
> introduced between GenomicAlignments revisions r84052 (0.99.8) and
> r84106 (0.99.9)
>
> Best wishes
>
> Leonard
>
>
> On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein <goldstel at gene.com> wrote:
>> Dear list,
>>
>> There seems to be a problem with the readGAlignmentPairs function:
>>
>> Querying genomic regions without any alignments using the which
>> argument results in an error -- see (1) below. Reading in a whole
>> chromosome takes indefinitely (or at least much longer than in
>> previous versions) -- see (2) below. I suspect these issues are not
>> specific to the BAM files I am working with but can provide test data
>> if required.
>>
>> Many thanks for your help.
>>
>> Leonard
>>
>>
>> --
>> (1) Attempts to read an empty region results in an error.
>>
>>> gr <- GRanges("22", IRanges(1000000, 2000000))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> readGAlignmentPairs(file, param = param)
>> Error in `elementMetadata<-`(x, ..., value = value) :
>>    replacement 'elementMetadata' value must be a DataTable object or NULL
>>>
>>> sessionInfo()
>> R Under development (unstable) (2013-12-03 r64376)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
>> [4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
>> [7] BiocGenerics_0.9.2       BiocInstaller_1.13.3
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.1.0   tools_3.1.0    zlibbioc_1.9.0
>>>
>>
>> ... but works fine with previous version
>>
>>> gr <- GRanges("22", IRanges(1000000, 2000000))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> readGAlignmentPairs(file, param = param)
>> GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
>>   seqnames strand :    ranges --    ranges
>>      <Rle>  <Rle> : <IRanges> -- <IRanges>
>> ---
>> seqlengths:
>>            1          2          3 ... GL000247.1 GL000248.1 GL000249.1
>>    249250621  243199373  198022430 ...      36422      39786      38502
>>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
>> [4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>
>>
>> (2) Use of the which argument covering chromosome 22 takes under one
>> minute with an earlier version of readGAlignmentPairs
>>
>>> gr <- GRanges("22", IRanges(1, 51304566))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> system.time(gap <- readGAlignmentPairs(file, param = param))
>>     user  system elapsed
>>   45.887   0.256  46.168
>>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
>> [4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>>
>>
>> ... but at least twenty times as long with the current version.
>>
>>> gr <- GRanges("22", IRanges(1, 51304566))
>>>
>>> param <- ScanBamParam(which = gr)
>>>
>>> system.time(gap <- readGAlignmentPairs(file, param = param))
>>
>> ^C
>> Timing stopped at: 1108.041 35.998 1144.006
>>>
>>> sessionInfo()
>> R Under development (unstable) (2013-12-03 r64376)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
>> [4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
>> [7] BiocGenerics_0.9.2
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list