[Bioc-devel] Iterating over BSgenomeViews returns DNAString instead of BSgenomeViews
Hervé Pagès
hpages at fredhutch.org
Fri Apr 7 03:13:10 CEST 2017
Hi Pariksheet,
This is the expected behavior.
Some background: BSgenomeViews are list-like objects where the *list
elements* (i.e. the elements one extracts with [[) are the DNA
sequences from the views:
> views
BSgenomeViews object with 2 views and 0 metadata columns:
seqnames ranges strand dna
<Rle> <IRanges> <Rle> <DNAStringSet>
[1] chr1 [25001, 28000] * [GCTTCAGCCT...TTATTTATTG]
[2] chr2 [30001, 31000] * [GACCCTCCTG...AGCAGGTGGT]
-------
seqinfo: 93 sequences (1 circular) from hg19 genome
> views[[1]]
3000-letter "DNAString" instance
seq:
GCTTCAGCCTGCACAGATAGGGGAGTAGGGGACAGA...CTGTCGTCTGAATTCCAAGCTTTTTGTTATTTATTG
> views[[2]]
1000-letter "DNAString" instance
seq:
GACCCTCCTGTTGGCTTTTTTACAAGACTGCTATGT...TGCTGGGACAGTCATGGGCAAACCAGAGCAGGTGGT
When subsetting with [ I get:
> views[1]
BSgenomeViews object with 1 view and 0 metadata columns:
seqnames ranges strand dna
<Rle> <IRanges> <Rle> <DNAStringSet>
[1] chr1 [25001, 28000] * [GCTTCAGCCT...TTATTTATTG]
-------
seqinfo: 93 sequences (1 circular) from hg19 genome
> views[2]
BSgenomeViews object with 1 view and 0 metadata columns:
seqnames ranges strand dna
<Rle> <IRanges> <Rle> <DNAStringSet>
[1] chr2 [30001, 31000] * [GACCCTCCTG...AGCAGGTGGT]
-------
seqinfo: 93 sequences (1 circular) from hg19 genome
The important difference is that with [[ I get a DNAString object
(the content of the view) and with [ I get a BSgenomeViews object
of length 1.
The semantic of lapply() is to always use [[ internally to extract
elements. This is why 'lapply(views, class)' returns "DNAString".
If you want to operate on the length-one Views objects, you need to
do:
lapply(seq_along(views), function(i) { do something on views[i] })
Hope this helps,
H.
On 04/05/2017 08:13 PM, Pariksheet Nanda wrote:
> Hi bioconductor devs,
>
> The BSgenomeViews class has been very useful in efficiently propagating
> metadata for running Biostring operations. I noticed something unexpected
> when iterating over views - it seems to return the Biostrings object
> instead of a single length Views object, and thus loses the associated view
> metadata. Is this intentional? Below is some example code, the output and
> sessionInfo(). Yes, I also confirmed this happens in the development
> version of R / bioconductor 3.5.
>
> On a side note, for unit testing it's been difficult to mock a BSgenome
> object due to the link to physical files, and as a workaround I use a
> small, arbitrary BSgenome package. Can one construct a BSgenome from its
> package bundled extdata? The man page examples use packaged genomes.
>
> Code to reproduce the issue:
>
> ----------------------------------------------------------------------
> library(BSgenome)
> genome <- getBSgenome("BSgenome.Hsapiens.UCSC.hg19")
> gr <- GRanges(c("chr1:25001-28000", "chr2:30001-31000"))
> views <- Views(genome, gr)
> views
> lapply(views, class)
> ----------------------------------------------------------------------
>
> Result:
>
> ----------------------------------------------------------------------
>> views
> BSgenomeViews object with 2 views and 0 metadata columns:
> seqnames ranges strand dna
> <Rle> <IRanges> <Rle> <DNAStringSet>
> [1] chr1 [25001, 28000] * [GCTTCAGCCT...TTATTTATTG]
> [2] chr2 [30001, 31000] * [GACCCTCCTG...AGCAGGTGGT]
> -------
> seqinfo: 93 sequences (1 circular) from hg19 genome
>> lapply(views, class)
> [[1]]
> [1] "DNAString"
> attr(,"package")
> [1] "Biostrings"
>
> [[2]]
> [1] "DNAString"
> attr(,"package")
> [1] "Biostrings"
>
>>
> ----------------------------------------------------------------------
>
> Tested against these configurations:
> 1) R 3.3.2 + BSgenome 1.42.0 (stable bioconductor 3.4)
> 2) R 2017-04-05 installed via llnl/spack + BSgenome 1.43.7 (devel
> bioconductor 3.5)
>
> sessionInfo for configuration #2 above:
> ----------------------------------------------------------------------
>> sessionInfo()
> R Under development (unstable) (2017-04-05 r72488)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS
>
> Matrix products: default
> BLAS:
> /share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRblas.so
> LAPACK:
> /share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRlapack.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.43.7
> [3] rtracklayer_1.35.10 Biostrings_2.43.7
> [5] XVector_0.15.2 GenomicRanges_1.27.23
> [7] GenomeInfoDb_1.11.10 IRanges_2.9.19
> [9] S4Vectors_0.13.15 BiocGenerics_0.21.3
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.21.0 GenomicAlignments_1.11.12
> [3] BiocParallel_1.9.5 lattice_0.20-35
> [5] tools_3.5.0 SummarizedExperiment_1.5.7
> [7] grid_3.5.0 Biobase_2.35.1
> [9] matrixStats_0.52.1 Matrix_1.2-9
> [11] GenomeInfoDbData_0.99.0 bitops_1.0-6
> [13] RCurl_1.95-4.8 DelayedArray_0.1.7
> [15] compiler_3.5.0 Rsamtools_1.27.15
> [17] XML_3.98-1.6
>> BiocInstaller::biocValid()
> [1] TRUE
>>
>
> ---
> Pariksheet Nanda
> PhD Candidate in Genetics and Genomics
> System Administrator, Storrs HPC Cluster
> University of Connecticut
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=4f5sS-9Z9WKeGLCHw3KM82mea8ioHJa_FrqleAw5Otc&s=urNRGmIvz9AzopqsBNZqNtWkPlNZnm5w3mZ_QML6aes&e=
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioc-devel
mailing list