[Bioc-devel] Iterating over BSgenomeViews returns DNAString instead of BSgenomeViews

Pariksheet Nanda pariksheet.nanda at uconn.edu
Thu Apr 13 05:24:15 CEST 2017


On Fri, Apr 7, 2017 at 1:13 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
> This is the expected behavior.
>
> Some background: BSgenomeViews are list-like objects where the *list
> elements* (i.e. the elements one extracts with [[) are the DNA
> sequences from the views
--snip--
> The important difference is that with [[ I get a DNAString object
> (the content of the view) and with [ I get a BSgenomeViews object
> of length 1.

Thank you, Hervé!

I was failing to make the connection with the `[[` accessor.
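
For the archive, here is a minimal sketch of that distinction. A plain base R
list stands in for the BSgenomeViews object; that substitution is my
assumption, made so the snippet runs without Bioconductor installed, and the
variable names are illustrative only.

```r
# A plain list stands in for a BSgenomeViews object (assumption for
# illustration); each element plays the role of one view's sequence.
views <- list(view_a = "ACGT", view_b = "GGCC", view_c = "TTAA")

# `[[` extracts the element itself; `[` returns a length-1 container
# of the same type as the original object.
elem <- views[[1]]
sub  <- views[1]

# Iterating over the object directly hands each `[[` element to the function.
by_element <- lapply(views, class)

# Iterating over seq_along() and subsetting with `[` keeps the container type.
by_index <- lapply(seq_along(views), function(i) class(views[i]))
```

With the real objects the same pattern would presumably give a DNAString in
the first loop and a length-1 BSgenomeViews in the second, as Hervé describes.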


On Fri, Apr 7, 2017 at 1:16 AM, Michael Lawrence <lawrence.michael at gene.com>
wrote:
>
> I'm curious as to why you are looping over the views in the first
> place. Maybe we could arrive at a vectorized solution, which is often
> but not always simpler and faster.

Hi Michael!

Broad background is I'm acculturating an undergraduate student to writing a
Bioconductor package and applying software engineering practices of version
control, unit testing, documenting, dependency setup and validation in a
different environment on our university HPC cluster, etc.  The student also
came along to LibrePlanet to better understand the culture of software
freedom :o)  The package goal is to use Biostrings to look for repeating
DNA sequences of a fixed k-mer size and subset to portions of the genome
without repeats (an aligner can do this of course, but the goal is to teach
R and engineering practices).

I appreciate your thoughtfulness for vectorizing the code to best use
BSgenomeViews, but please don't spend more than 10 minutes as I have to
balance changes to the code with the student's learning and coding "voice"
and may not do proper justice to more of your effort.  My slowness to
reply was due to getting the project further along to be more understandable.
Here is the line which I've updated, as Hervé suggested, to use seq_along():
https://github.com/coregenomics/kmap/blob/4adaed6b8007e8ea39f39ff57a42a821445d3d46/R/BiostringsProjectNEW.R#L185
(I'm having a hard time thinking of how to summarize a small example out
of context).
Although in that line ranges_hits() is only operating on single indices,
ranges_hits() was written to process groups of indices to reduce
multi-processor communication.  Generating such sets of indices would
involve applying width() to the views inside mappable() to break them into
chunks of, say, a million bases for matchPDict().  Again, I'm linking to
the code for anything that stands out at you, but I will feel bad if you
spend a lot of time on it.
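
To make the chunking idea concrete, here is a rough sketch in base R of
grouping view indices by cumulative width. The function name chunk_indices()
and the example widths are hypothetical and not taken from the kmap code; in
the package the widths would come from width() on the views.

```r
# Hypothetical helper: group view indices so that each group starts a new
# chunk roughly every `chunk_size` bases.  A single chunk can exceed
# `chunk_size` by up to the width of its last view.
chunk_indices <- function(widths, chunk_size = 1e6) {
  widths <- as.numeric(widths)          # avoid integer overflow in cumsum()
  offsets <- cumsum(widths) - widths    # bases preceding each view
  group <- floor(offsets / chunk_size)  # chunk number for each view
  unname(split(seq_along(widths), group))
}

# Made-up widths standing in for width(views).
widths <- c(400000L, 400000L, 400000L, 900000L, 100000L)
chunks <- chunk_indices(widths)
```

Each element of chunks is then a set of indices that could be handed to
ranges_hits() in one call.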


> H.

> Michael

Pariksheet