[Bioc-sig-seq] coverage vector longer than covered area (ScanBam and IRanges)?

Francesco Lescai f.lescai at ucl.ac.uk
Tue Sep 27 21:53:45 CEST 2011


Hi there,
may be I am missing something obvious in my code.
I extracted information from a bam file on 11 regions.
when I calculate the coverage on one of these regions, I get a vector which is much longer than the number of bases of the region itself.

I tried several times to replicate this problem on a small IRanges object but - of course - everything was ok.
Anyone has an idea?

that's what I've done:

> regions
   chr     start       end
1    1 195915909 197867662
2    2   1199920   2234892 [...]

scanned my bam within those regions
which=GRanges(regions$chr,IRanges(regions$start,regions$end))
ggf<-scanBam("s_7_1_sequence.txt.novo.rmdup.bam_sorted.bam", 
             param=ScanBamParam(which=which,
                                what = c("pos", "qwidth"),
                                flag =scanBamFlag(isUnmappedQuery =FALSE)))

created a summary function, as suggested in the IRanges vignette
summaryFunction<-function(seqname,bamfile,...){
  x<-bamfile[[seqname]]
  coverage(IRanges(x[["pos"]],width=x[["qwidth"]]))
}

applied to get a list of coverage for each region
seqnames<-names(ggf)
cvg<-lapply(seqnames,summaryFunction,ggf)
names(cvg)=seqnames

then I extracted one of those regions, and computed the vector coverage
areacov<-cvg[["14:20404091-35890861"]]
areacov.vect<-as.vector(areacov)

however
positions=seq(from=20404091,to=35890861,by=1)

> areacov
'integer' Rle of length 35890857 with 445958 runs
  Lengths: 20404015        1        1        1 ...       25      121       76
  Values :        0        2        5        6 ...        1        0        1
> length(positions)
[1] 15486771

as you can see from here, the Rle has a length of 35890857 
even considering that my area is a bit larger, 'cause it gets the reads positions
> min(ggf[["14:20404091-35890861"]][["pos"]])
[1] 20404016
> max(ggf[["14:20404091-35890861"]][["pos"]])
[1] 35890782

I still have a coverage vector much longer then expected.
I am surely doing something wrong, but the procedure seemed straigthforward.

Any clue to help?

cheers,
Francesco


-------------------------------------- 
Francesco Lescai, PhD 
Research Associate in Genome Analysis 
Faculty of Biomedical Sciences, Division of Research Strategy 
c/o UCL Cancer Institute, Paul O'Gorman Building 
University College London 
Gower Street WC1E 6BT London, UK 

visiting address: 72, Huntley Street 
WC1E 6DE London 

email: f.lescai at ucl.ac.uk 
phone: +44.(0)20.7679.6812 
[ext: 46812] 
-------------------------------------- 


More information about the Bioc-sig-sequencing mailing list