[BioC] readGappedAlignments and param argument
Martin Morgan
mtmorgan at fhcrc.org
Mon Sep 9 22:53:52 CEST 2013
Hi Elena --
On 09/09/2013 03:57 AM, Elena Grassi wrote:
> Hello,
>
> I've stumbled across a strange behaviour when indexing bams and then
> reading them with ReadGappedAlignments with a param
> holding a which section.
> Given this bam:
> data at tungsteno:/rogue_bis/data/bioinfotree/prj/roar/dataset/0.2/roar_test_2$
> samtools view test.bam
> D1JP1ACXX130107:2:2201:7003:63880 16 chr1 243675625
> 255 92M * 0 0 * *
>
>> library(GenomicRanges)
>> bam <- "/scarlet/data/bioinfotree/prj/roar/dataset/0.2/roar_test_2/test.bam"
>> ga <- readGappedAlignments(bam)
>> ga
> GappedAlignments with 1 alignment and 0 metadata columns:
> seqnames strand cigar qwidth start end
> width ngap
> <Rle> <Rle> <character> <integer> <integer> <integer>
> <integer> <integer>
> [1] chr1 - 92M 92 243675625 243675716
> 92 0
> ---
> seqlengths:
> chr1 chr2 ...
> chrUn_gl000226 chr18_gl000207_random
> 249250621 243199373 ...
> 15008 4262
>
> Everything is fine here. But if index it and load it with a param:
>> gene_id <- c("A", "B")
>> features <- GRanges(
> + seqnames = Rle(c("chr1", "chr1")),
> + strand = strand(c("-", "-")),
> + ranges = IRanges(
> + start=c(243675625, 243675627),
> + width=c(2, 100)),
> + DataFrame(gene_id)
> + )
>> indexBam(bam)
> /tmp/RtmpVbmBLu/filead32d37188.bam
> "/tmp/RtmpVbmBLu/filead32d37188.bam.bai"
>> ga_s_p <- readGappedAlignments(bam, param=ScanBamParam(which=features))
>> ga_s_p
> GappedAlignments with 2 alignments and 0 metadata columns:
> seqnames strand cigar qwidth start end
> width ngap
> <Rle> <Rle> <character> <integer> <integer> <integer>
> <integer> <integer>
> [1] chr1 - 92M 92 243675625 243675716
> 92 0
> [2] chr1 - 92M 92 243675625 243675716
> 92 0
> ---
> seqlengths:
> chr1 chr2 ...
> chrUn_gl000226 chr18_gl000207_random
> 249250621 243199373 ...
> 15008 4262
>>
>
> The single read is duplicated...I'm not an expert with these libraries
Your GRanges query is analogous to asking for (something like, maybe I didn't
get the indexing correct)
samtools view <input.bam> chr1:243675625-243675622 chr1:243675627-243675727
which performs two independent queries -- a read overlapping both regions will
be returned twice. You could reduce(features) to make ranges non-overlapping,
although it's still possible to have a single read overlapping two distinct
regions hence returned twice. Or you could use findOverlaps() and manipulate the
returned object to implement your scheme for selecting reads, or identify the
appropriate mode in summarizeOverlaps for doing the counting. Or perhaps you can
provide more of an explanation for the problem that led you to this?
Hope that helps.
Martin
> so maybe this is my fault (if I remove
> the second entry in features everything works, but from the
> ScanBamParam man about which I was not expecting
> this behaviour).
> Sorry for the convoluted example but I've stumbled on this problem
> while comparing counts obtained
> with summarizeOverlaps in a script of mine.
>
> Here's my session info:
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] rtracklayer_1.20.2 Rsamtools_1.12.3 Biostrings_2.28.0
> [4] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5 BSgenome_1.28.0 RCurl_1.95-4.1 stats4_3.0.1
> [5] tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0
>
>
> Thanks,
> E.G.
>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list