[Bioc-devel] readGAlignmentPairs function broken?

Leonard Goldstein goldstein.leonard at gene.com
Thu Dec 19 02:25:04 CET 2013


Dear list,

There seems to be a problem with the readGAlignmentPairs function:

Querying genomic regions without any alignments using the which
argument results in an error -- see (1) below. Reading in a whole
chromosome takes indefinitely (or at least much longer than in
previous versions) -- see (2) below. I suspect these issues are not
specific to the BAM files I am working with but can provide test data
if required.

Many thanks for your help.

Leonard


--
(1) Attempts to read an empty region results in an error.

> gr <- GRanges("22", IRanges(1000000, 2000000))
>
> param <- ScanBamParam(which = gr)
>
> readGAlignmentPairs(file, param = param)
Error in `elementMetadata<-`(x, ..., value = value) :
  replacement 'elementMetadata' value must be a DataTable object or NULL
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
[4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
[7] BiocGenerics_0.9.2       BiocInstaller_1.13.3

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   tools_3.1.0    zlibbioc_1.9.0
>

... but works fine with previous version

> gr <- GRanges("22", IRanges(1000000, 2000000))
>
> param <- ScanBamParam(which = gr)
>
> readGAlignmentPairs(file, param = param)
GAlignmentPairs with 0 alignment pairs and 0 metadata columns:
 seqnames strand :    ranges --    ranges
    <Rle>  <Rle> : <IRanges> -- <IRanges>
---
seqlengths:
          1          2          3 ... GL000247.1 GL000248.1 GL000249.1
  249250621  243199373  198022430 ...      36422      39786      38502
>
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
[4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0


(2) Use of the which argument covering chromosome 22 takes under one
minute with an earlier version of readGAlignmentPairs

> gr <- GRanges("22", IRanges(1, 51304566))
>
> param <- ScanBamParam(which = gr)
>
> system.time(gap <- readGAlignmentPairs(file, param = param))
   user  system elapsed
 45.887   0.256  46.168
>
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Rsamtools_1.14.2     Biostrings_2.30.1    GenomicRanges_1.14.4
[4] XVector_0.2.0        IRanges_1.20.6       BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.0.0   zlibbioc_1.8.0
>

... but at least twenty times as long with the current version.

> gr <- GRanges("22", IRanges(1, 51304566))
>
> param <- ScanBamParam(which = gr)
>
> system.time(gap <- readGAlignmentPairs(file, param = param))

^C
Timing stopped at: 1108.041 35.998 1144.006
>
> sessionInfo()
R Under development (unstable) (2013-12-03 r64376)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.9 Rsamtools_1.15.15        Biostrings_2.31.5
[4] GenomicRanges_1.15.17    XVector_0.3.5            IRanges_1.21.16
[7] BiocGenerics_0.9.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.1.0   zlibbioc_1.9.0
>



More information about the Bioc-devel mailing list