[Bioc-devel] Iterating over BSgenomeViews returns DNAString instead of BSgenomeViews
Pariksheet Nanda
pariksheet.nanda at uconn.edu
Thu Apr 6 05:13:39 CEST 2017
Hi Bioconductor devs,
The BSgenomeViews class has been very useful for efficiently propagating
metadata when running Biostrings operations. I noticed something unexpected
when iterating over views: each iteration returns the underlying Biostrings
object (a DNAString) instead of a length-one BSgenomeViews object, and thus
loses the associated view metadata. Is this intentional? Below is some
example code, its output and sessionInfo(). I also confirmed that this
happens with the development version of R / Bioconductor 3.5.
On a side note, for unit testing it has been difficult to mock a BSgenome
object because of its link to physical files; as a workaround I use a
small, arbitrary BSgenome package. Can one construct a BSgenome object from
the extdata bundled in its package? The man page examples use packaged
genomes.
Code to reproduce the issue:
----------------------------------------------------------------------
library(BSgenome)
genome <- getBSgenome("BSgenome.Hsapiens.UCSC.hg19")
gr <- GRanges(c("chr1:25001-28000", "chr2:30001-31000"))
views <- Views(genome, gr)
views
lapply(views, class)
----------------------------------------------------------------------
Result:
----------------------------------------------------------------------
> views
BSgenomeViews object with 2 views and 0 metadata columns:
      seqnames         ranges strand                       dna
         <Rle>      <IRanges>  <Rle>            <DNAStringSet>
  [1]     chr1 [25001, 28000]      * [GCTTCAGCCT...TTATTTATTG]
  [2]     chr2 [30001, 31000]      * [GACCCTCCTG...AGCAGGTGGT]
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
> lapply(views, class)
[[1]]
[1] "DNAString"
attr(,"package")
[1] "Biostrings"
[[2]]
[1] "DNAString"
attr(,"package")
[1] "Biostrings"
>
----------------------------------------------------------------------
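A possible workaround, sketched under the assumption that single-bracket
subsetting (`views[i]`) on a BSgenomeViews object keeps the class and any
metadata columns, is to iterate over indices rather than over the object
itself. The name `single_views` is only illustrative:

```r
library(BSgenome)

# Same setup as in the reproduction code above.
genome <- getBSgenome("BSgenome.Hsapiens.UCSC.hg19")
gr <- GRanges(c("chr1:25001-28000", "chr2:30001-31000"))
views <- Views(genome, gr)

# Iterate over positions and subset with `[` so that each element is
# expected to stay a BSgenomeViews of length 1 (assumption: `[` preserves
# the class), instead of being extracted as a bare DNAString.
single_views <- lapply(seq_along(views), function(i) views[i])

# Inspect the class of each element.
sapply(single_views, class)
```

This only sidesteps the behaviour in my own code; it does not answer
whether element-wise extraction returning a DNAString is intended.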
Tested against these configurations:
1) R 3.3.2 + BSgenome 1.42.0 (stable bioconductor 3.4)
2) R 2017-04-05 installed via llnl/spack + BSgenome 1.43.7 (devel
bioconductor 3.5)
sessionInfo for configuration #2 above:
----------------------------------------------------------------------
> sessionInfo()
R Under development (unstable) (2017-04-05 r72488)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRblas.so
LAPACK: /share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.43.7
[3] rtracklayer_1.35.10 Biostrings_2.43.7
[5] XVector_0.15.2 GenomicRanges_1.27.23
[7] GenomeInfoDb_1.11.10 IRanges_2.9.19
[9] S4Vectors_0.13.15 BiocGenerics_0.21.3
loaded via a namespace (and not attached):
[1] zlibbioc_1.21.0 GenomicAlignments_1.11.12
[3] BiocParallel_1.9.5 lattice_0.20-35
[5] tools_3.5.0 SummarizedExperiment_1.5.7
[7] grid_3.5.0 Biobase_2.35.1
[9] matrixStats_0.52.1 Matrix_1.2-9
[11] GenomeInfoDbData_0.99.0 bitops_1.0-6
[13] RCurl_1.95-4.8 DelayedArray_0.1.7
[15] compiler_3.5.0 Rsamtools_1.27.15
[17] XML_3.98-1.6
> BiocInstaller::biocValid()
[1] TRUE
>
---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut