[Bioc-devel] applyPileup for introns with junction read

Martin Morgan mtmorgan at fhcrc.org
Mon Sep 12 18:53:50 CEST 2011


Hi Tengfei --

On 09/11/2011 01:29 PM, Tengfei Yin wrote:
> Dear all,
>
> I apologize If this question goes to the wrong mailing list.

The Bioconductor mailing is probably more appropriate; your question is 
about using a package, rather than developing a package. In terms of 
your bug report...

> I am trying to finishing a function in my package that compute the mismatch
> summary, when it's exon regions, it looks like it matches well, but I don't
> know how to get the right count in the intron region, if there are some
> junction read over it, I guess I should get something for N and 0 for ACTG.
> But I get some weird result for that region
>
> Browse[1]>        seq
>    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
> A    4    4    4    4    4    4    4    4    4     4     4
> C    0    0    0    0    0    0    0    0    0     0     0
> G    0    0    0    0    0    0    0    0    0     0     0
> T    2    2    2    2    2    2    2    2    2     2     2
> N    0    0    0    0    0    0    0    0    0     0     0
>
> Part of the  code I use  looks like
> pileupAsGRanges<- function(bams, regions,
>                              DNABases = c("A", "C", "G", "T", "N"),
>                              ...) {
>    pileupFiles<- PileupFiles(bams)
>    pileupParams<- PileupParam(which = regions, ...)
>   ..........
>    pileupFun<- function(x) {
>           seq<- x$seq[DNABases,1,]
> ...
> }
> ........
> grlst<- applyPileups(pileupFiles, pileupFun, param = pileupParams)
> ...
> }
>
> I cannot give a reproducible example since it's based on particular Bam file

By way of reproducible example, I made a trivial SAM file and converted 
it to bam with Rsamtools::asBam

Rsamtools::applyPileups was handling indels and reference skips 
incorrectly; this likely influenced your other regions, too, and users 
should re-do their analysis.

The bug is fixed in Rsamtools 1.5.59

Thanks for the report!

Martin

> on my laptop. But I test with IGV, it doesn't show any coverage over that
> region, then I see if I use scanBam to get seqs for that region I have 6
> reads, which has CIGAR like "35M10939N40M", but that shouldn't give all
> A or T like the result above, by the way, there is "simpleCigar" in
> ScanBamParam which can filter these reads, but there is no such argument in
> PileupParam.
>
> I don't know if this is caused by my mistake or inappropriate use of
> arguments, or what's the potential reason for that result.
>
> Thanks in advance.
>
> Tengfei
>
>   sessionInfo()
> R Under development (unstable) (2011-08-08 r56671)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Rsamtools_1.5.57     Biostrings_2.21.9    GenomicRanges_1.5.36
> [4] IRanges_1.11.26      rtracklayer_1.13.13  RCurl_1.6-10
> [7] bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.21.3 tools_2.14.0    XML_3.4-2       zlibbioc_0.1.7
>
>
>
>
>
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioc-devel mailing list