[Bioc-devel] readGAlignmentPairs function broken?

Leonard Goldstein goldstein.leonard at gene.com
Thu Dec 19 21:58:13 CET 2013


Hi Hervé

Thanks for fixing this problem. It works fine now, however I did
notice a drop in performance. For example reading in chromosome 1 can
take about twice as long as previously (see below). Is this something
that can be avoided?

Many thanks again

Leonard

--
GenomicAlignments 0.99.8

> param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
>
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
161.282   7.812 169.442
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
 91.614   7.256  99.065
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
 89.285   7.461  96.940
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.8 Rsamtools_1.15.15        Biostrings_2.31.5
[4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.17
[7] BiocGenerics_0.9.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>

GenomicAlignments 0.99.10

> param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621)))
>
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
265.624   8.812 274.990
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
249.724   7.177 257.399
> system.time(readGAlignmentPairs(file, param = param))
   user  system elapsed
247.476   6.648 254.621
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.10 Rsamtools_1.15.15
[3] Biostrings_2.31.5         GenomicRanges_1.15.17
[5] XVector_0.3.5             IRanges_1.21.17
[7] BiocGenerics_0.9.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>

On Thu, Dec 19, 2013 at 11:52 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Leonard,
>
> This should be fixed in GenomicAlignments 0.99.10, which will become
> available thru biocLite() in the next 24 hours or so. In the mean time
> you can grab it directly from svn:
>
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicAlignments
>
> Please let me know if you still run into problems with this.
> Thanks!
>
>
> H.
>
> On 12/19/2013 10:41 AM, Leonard Goldstein wrote:
>>
>> Hi Hervé
>>
>> You probably spotted this already but it looks like the problem is
>> introduced between GenomicAlignments revisions r84052 (0.99.8) and
>> r84106 (0.99.9)
>>
>> Best wishes
>>
>> Leonard
>>
>>
>> On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein <goldstel at gene.com>
>> wrote:
>>>
>>> Dear list,
>>>
>>> There seems to be a problem with the readGAlignmentPairs function:
>>>
>>> Querying genomic regions without any alignments using the which
>>> argument results in an error -- see (1) below. Reading in a whole
>>> chromosome takes indefinitely (or at least much longer than in
>>> previous versions) -- see (2) below. I suspect these issues are not
>>> specific to the BAM files I am working with but can provide test data
>>> if required.
>>>
>>> Many thanks for your help.
>>>
>>> Leonard
>>>
>>>
>>> --
>>> (1) Attempts to read an empty region results in an error.
>>>
>>>> gr <- GRanges("22", IRanges(1000000, 2000000))
>>>>
>>>> param <- ScanBamParam(which = gr)
>>>>
>>>> readGAlignmentPairs(file, param = param)
>>>
>>> Error in `elementMetadata<-`(x, ..., value = value) :
>>>    replacement 'elementMetadata' value must be a DataTable object or NULL
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R Under development (unstable) (2013-12-03 r64376)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
>>> [4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
>>> [7] BiocGenerics_0.9.2       BiocInstaller_1.13.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-6   stats4_3.1.0   tools_3.1.0    zlibbioc_1.9.0
>>>>
>>>>
>>>
>>> ... but works fine with previous version
>>>
>>>> gr <- GRanges("22", IRanges(1000000, 2000000))
>>>>
>>>> param <- ScanBamParam(which = gr)
>>>>
>>>> readGAlignmentPairs(file, param = param)
>>>
>>> GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
>>>   seqnames strand :    ranges --    ranges
>>>      <Rle>  <Rle> : <IRanges> -- <IRanges>
>>> ---
>>> seqlengths:
>>>            1          2          3 ... GL000247.1 GL000248.1 GL000249.1
>>>    249250621  243199373  198022430 ...      36422      39786      38502
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R version 3.0.0 (2013-04-03)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=C                 LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
>>> [4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>>
>>>
>>> (2) Use of the which argument covering chromosome 22 takes under one
>>> minute with an earlier version of readGAlignmentPairs
>>>
>>>> gr <- GRanges("22", IRanges(1, 51304566))
>>>>
>>>> param <- ScanBamParam(which = gr)
>>>>
>>>> system.time(gap <- readGAlignmentPairs(file, param = param))
>>>
>>>     user  system elapsed
>>>   45.887   0.256  46.168
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R version 3.0.0 (2013-04-03)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=C                 LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
>>> [4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>>>>
>>>>
>>>
>>> ... but at least twenty times as long with the current version.
>>>
>>>> gr <- GRanges("22", IRanges(1, 51304566))
>>>>
>>>> param <- ScanBamParam(which = gr)
>>>>
>>>> system.time(gap <- readGAlignmentPairs(file, param = param))
>>>
>>>
>>> ^C
>>> Timing stopped at: 1108.041 35.998 1144.006
>>>>
>>>>
>>>> sessionInfo()
>>>
>>> R Under development (unstable) (2013-12-03 r64376)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
>>> [4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
>>> [7] BiocGenerics_0.9.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>>>>
>>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319



More information about the Bioc-devel mailing list