[BioC] How to plot NGS data?

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Feb 17 17:50:08 CET 2012


Hi Florian,

On Fri, Feb 17, 2012 at 11:00 AM, Hahne, Florian
<florian.hahne at novartis.com> wrote:
> Just to chime in here:
> High up on my list of future developments is some sort of file-based track
> class, where all the genomic regions reside on disc in an indexed file,
> like BAM, bigWig or tabix. The actual ranges are only realized within R in
> the plotting method, so no need to fill the memory with unnecessary
> clutter. With the available infrastructure in Rsamtools this should be an
> easy extension, I just need to find some time to hack in the code. I guess
> some sort of NGS-specific visualization would be the next thing on the
> list. There is an experimental AlignedReadsTrack class, but right now
> that's really just a huge collection of bugs :-(

I hacked a few "track-like" objects to use with GenomeGraphs some time
ago in order to plot data from an RNA-seq protocol we've been
developing. The pictures look like this:

http://cbio.mskcc.org/~lianos/files/bioconductor/DEPDC1.png

"That's some bizarre RNA-seq data," you might say, but we only capture
the 3' ends of mRNAs in order to study alternative cleavage and
polyadenylation.

To do that, the lanes above the genome axis are probably something
like the AlignedReadsTrack class you mention: they are built either by
working over a specified range of a BAM file or from Rle(coverage)
vectors.
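
To make the "coverage over a specified range" idea concrete, here is a
minimal sketch in plain Python (not R, and not the code behind the
picture). It assumes the reads have already been fetched from an
indexed file as half-open (start, end) intervals; the function names
are made up for illustration.

```python
from itertools import groupby


def window_coverage(reads, start, end):
    """Per-base read depth over the half-open window [start, end).

    `reads` is an iterable of (read_start, read_end) half-open intervals.
    Assumption: in practice these would come from an indexed BAM query.
    """
    depth = [0] * (end - start)
    for r_start, r_end in reads:
        # Clip each read to the window so out-of-range reads add nothing.
        lo = max(r_start, start)
        hi = min(r_end, end)
        for pos in range(lo, hi):
            depth[pos - start] += 1
    return depth


def run_length_encode(values):
    """Collapse a vector into (value, run_length) pairs, similar to an Rle."""
    return [(value, len(list(group))) for value, group in groupby(values)]


reads = [(100, 105), (102, 108), (120, 125)]
depth = window_coverage(reads, start=100, end=110)
rle = run_length_encode(depth)
```

Only the requested window is ever held in memory, which is the point
of keeping the reads on disk until plotting time.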

These coverage vectors are also smoothed using another package I'm
whipping up, which does (probably not very efficiently) convolutions
directly over Rle(coverage) vectors and might be useful:

https://github.com/lianos/biosignals/blob/master/R/convolve1d.R
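
For readers who want the gist without opening the link, here is a
rough sketch of smoothing a run-length-encoded coverage vector, in
plain Python. It is not the biosignals code: it takes the simple route
of expanding the runs, convolving, and re-encoding, rather than working
on the runs directly. All names are hypothetical, and the zero padding
at the edges is an assumption.

```python
from itertools import groupby


def rle_decode(runs):
    """Expand (value, run_length) pairs into a flat list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def rle_encode(values):
    """Collapse a flat list into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def convolve_same(signal, kernel):
    """Convolve with zero padding, returning len(signal) values.

    The kernel length must be odd so that it has a well-defined centre.
    """
    if len(kernel) % 2 != 1:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k  # convolution flips the kernel
            if 0 <= j < n:
                acc += weight * signal[j]
        out.append(acc)
    return out


def smooth_rle(runs, kernel):
    """Smooth an RLE vector by decoding, convolving, and re-encoding."""
    return rle_encode(convolve_same(rle_decode(runs), kernel))


runs = [(0, 2), (3, 2), (0, 2)]
smoothed = smooth_rle(runs, [0.25, 0.5, 0.25])
```

Expanding the runs is memory-hungry for whole chromosomes, so this
only makes sense over a plotting window.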

I'd also like to add "stranded" visualization ability, i.e. plot (+)
coverage north of 0 and (-) coverage south, like you've already
implemented by the looks of the vignette.
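
What I have in mind for the stranded view is roughly the following, as
a plain Python sketch. It is not Gviz's implementation, and the
function name and the (start, end, strand) read format are assumptions
for illustration. Minus-strand depth is stored as negative numbers so
that the two vectors can be drawn on either side of y = 0.

```python
def stranded_coverage(reads, start, end):
    """Return (plus, minus) depth vectors over the window [start, end).

    `reads` holds (read_start, read_end, strand) tuples with half-open
    coordinates. Minus-strand depth is negated; reads with any other
    strand value are skipped.
    """
    width = end - start
    plus = [0] * width
    minus = [0] * width
    for r_start, r_end, strand in reads:
        if strand == "+":
            target, step = plus, 1
        elif strand == "-":
            target, step = minus, -1
        else:
            continue  # unstranded reads are ignored in this sketch
        # Clip to the window, then count each covered base.
        for pos in range(max(r_start, start), min(r_end, end)):
            target[pos - start] += step
    return plus, minus


reads = [(0, 3, "+"), (2, 5, "-"), (1, 4, "+")]
plus, minus = stranded_coverage(reads, start=0, end=5)
```

The two vectors can then go to any plotting routine as two filled
series sharing a baseline at 0.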

Lastly, the tails of the gene models you see below are pulled out of a
local cache that was built from local TranscriptDb objects ... this
needs to be redone to use something like tabix- or BAM-backed storage
for the gene models (mine currently isn't particularly efficient at
all).

All this is to say that I have some things whipped together that might
be useful in this realm and would be happy to help w/ this, too ...
I'd totally love to switch to Gviz eventually when there is time to do
so, and would be happy to help if you haven't already done it.

In that regard, if you actually *want* help with that, maybe you could
ping us (maybe on bioc-devel?) when you think you might be ready to
start tackling this part of Gviz?

Anyway -- although I haven't used it, Gviz looks incredibly
impressive, nice work!

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


