[BioC] Gviz - show plus and minus strand from a bam

Hahne, Florian florian.hahne at novartis.com
Fri Nov 15 15:35:00 CET 2013


Hi Antonio,
the default BAM import function used in the Gviz package is not
particularly smart. It ignores things like strand, etc. However the
package was designed with a certain amount of flexibility in mind, and you
can make a lot of things happening by employing user defined functions. In
this case, you want to change the BAM import function in order to produce
a DataTrack with two "samples", the forward and the reverse strand. The
following code should get you to this, or at least fairly close, and you
may want to improve on that:

library(Gviz)
library(Rsamtools)
strandedBamImport <- function (file, selection) {
    if (!file.exists(paste(file, "bai", sep = ".")))
        stop("Unable to find index for BAM file '", file, "'. You can
build an index using the following command:\n\t",
             "library(Rsamtools)\n\tindexBam(\"", file, "\")")
    sinfo <- scanBamHeader(file)[[1]]
    res <- if (!as.character(seqnames(selection)[1]) %in%
               names(sinfo$targets)) {
        mcols(selection) <- DataFrame(score = 0)
        selection
    }else {
        param <- ScanBamParam(what = c("pos", "qwidth", "strand"),
                              which = selection, flag =
scanBamFlag(isUnmappedQuery = FALSE))
        x <- scanBam(file, param = param)[[1]]
        gr <- GRanges(strand=x[["strand"]], ranges=IRanges(x[["pos"]],
width = x[["qwidth"]]), seqnames=seqnames(selection)[1])
        grs <- split(gr, strand(gr))
        cov <- lapply(grs[c("+", "-")], function(y) coverage(ranges(y),
width=end(selection)))
        pos <- sort(unique(unlist(lapply(cov, function(y) c(start(y),
end(y))))))
        if(length(pos)==0){
            mcols(selection) <- DataFrame(plus=0, minus=0)
            selection
        }else{
            GRanges(seqnames = seqnames(selection)[1],
ranges=IRanges(start=head(pos, -1), end=tail(pos, -1)),
                    plus=as.numeric(cov[["+"]][head(pos, -1)]),
                    minus=-as.numeric(cov[["-"]][head(pos, -1)]))
        }
    }
    return(res)
}


bamFile <- system.file("extdata/test.bam", package = "Gviz")
dTrack <- DataTrack(range=bamFile, genome="hg19", name="Coverage",
window=-1, chromosome="chr1", importFunction=strandedBamImport,
stream=TRUE)
plotTracks(dTrack, from = 189990000, to = 190000000, col=c("red", "blue"),
groups=c("+", "-"), type="hist", col.histogram=NA)



Please note that the coverage for the reverse strand reads is somewhat
arbitrarily set to negative values in order to mirror the coverage around
0. Not perfect, but still usable I hope.
Florian





On 11/14/13 10:02 PM, "António domingues" <amjdomingues at gmail.com> wrote:

>Dear Florian and bioconductor list,
>
>I have been using Gviz with quite some success but now I need to
>generate some plots in which I can represent (by color or direction) the
>strandness of the read. So far I am generating coverage plots from
>RNA-seq and ChiP-seq data straight from the bam file using the following
>methods:
>
>## generates coverage plot
>bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome =
>chr,  name = "RNA-seq", type = "histogram", col.histogram= "#377EB8",
>fill="#377EB8")
>
>## plot with individual reads
>    bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome =
>chr,  name = "RNA-seq")
>
>Unfortunately I could not find in either of those an option to
>differentiate reads in sense and anti-sense strand. In an ideal world I
>would represent in the 1st plot the coverage for the plus strand above a
>line and for the minus strand below a line. For the second option
>coloring the reads in the sense and antisense with different colors.
>Either option would be fine.
>
>I read the manual and searched the confines of the internet, and for the
>life of me could not find a solution - except separating the sense and
>anti-sense reads in different files. Is there a more elegant way of
>doing this?
>
>Cheers,
>António
>
> > sessionInfo()
>R version 3.0.2 (2013-09-25)
>Platform: x86_64-pc-linux-gnu (64-bit)
>
>locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C LC_TIME=en_US.UTF-8
>LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>attached base packages:
>[1] parallel  grid      stats     graphics  grDevices utils datasets
>methods   base
>
>other attached packages:
>  [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0
>GenomicRanges_1.14.3   XVector_0.2.0
>  [6] IRanges_1.20.4         BiocGenerics_0.8.0 Gviz_1.6.0
>RSvgDevice_0.6.4.3     RColorBrewer_1.0-5
>[11] data.table_1.8.10      scales_0.2.3 ggplot2_0.9.3.1
>reshape_0.8.4          plyr_1.8
>
>loaded via a namespace (and not attached):
>  [1] biomaRt_2.18.0      Biostrings_2.30.0   biovizBase_1.10.0
>bitops_1.0-6        BSgenome_1.30.0     cluster_1.14.4
>  [7] colorspace_1.2-4    DBI_0.2-7           dichromat_2.0-0
>digest_0.6.3        gtable_0.1.2        Hmisc_3.12-2
>[13] labeling_0.2        lattice_0.20-24     latticeExtra_0.6-26
>MASS_7.3-29         munsell_0.4.2       proto_0.3-10
>[19] RCurl_1.95-4.1      reshape2_1.2.2      rpart_4.1-3
>Rsamtools_1.14.1    RSQLite_0.11.4      rtracklayer_1.22.0
>[25] stats4_3.0.2        stringr_0.6.2       tools_3.0.2
>XML_3.98-1.1        zlibbioc_1.8.0
>
>-- 
>António Miguel de Jesus Domingues, PhD
>Postdoctoral researcher
>Deep Sequencing Group - SFB655
>Biotechnology Center (Biotec)
>Technische Universität Dresden
>Fetscherstraße 105
>01307 Dresden
>
>Phone: +49 (351) 458 82362
>Email: antonio.domingues(at)biotec.tu-dresden.de
>--
>The Unbearable Lightness of Molecular Biology
>



More information about the Bioconductor mailing list