[BioC] Reading paired-end data into GRangesList

Paul Leo p.leo at uq.edu.au
Thu Oct 27 05:14:44 CEST 2011


Hi Hubert,
I think I would just use readBamGappedAlignments, although with this you
will lose the paired-end info. Yes, you will count the Ns as coverage, but
the difference here is very small in most cases. Note also that you
probably want to use tag counts rather than bg coverage anyway...

There is another issue (I think): let's say, for example, that some Ns were
due to an "insertion" in this sample's genome compared to the
reference. When you go to normalise the signal (counts per exon, counts
per bp, whatever...), will you use the exon length / genome length
for that individual or the reference exon length? You will use the
reference, obviously, so it's a bit grey what the true answer is...
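
To put rough numbers on that, here is a small sketch in base R. All values
are made up for illustration (they do not come from any real data set), and
the variable names are only placeholders. It compares counts per bp when
dividing by the reference exon length against dividing by a sample exon
length that differs by a few bases.

```r
# Made-up numbers, for illustration only.
reads_in_exon <- 200    # reads counted in one exon
ref_exon_len  <- 1000   # exon length in the reference (bp)
length_diff   <- 50     # extra bases in this sample's exon (bp), assumed

# Counts per bp using the reference length vs. the sample's own length.
per_bp_ref    <- reads_in_exon / ref_exon_len
per_bp_sample <- reads_in_exon / (ref_exon_len + length_diff)

# Relative difference between the two normalisations.
rel_diff <- (per_bp_ref - per_bp_sample) / per_bp_ref
print(c(per_bp_ref = per_bp_ref, per_bp_sample = per_bp_sample,
        rel_diff = rel_diff))
```

With a length difference this small relative to the exon, the two answers
differ by only a few percent.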

However, I believe you can get the exact coverage from the CIGAR if you
wish; see...

http://stuff.mit.edu/afs/athena/software/r/current/lib/R/library/GenomicRanges/html/cigar-utils.htm 


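library(Rsamtools)      # provides scanBam()
library(GenomicRanges)  # provides the CIGAR utilities
# "reads.bam" is a placeholder file name
bam <- scanBam("reads.bam",
               param = ScanBamParam(what = c("rname", "pos", "cigar")))[[1]]
irl <- cigarToIRangesListByRName(bam$cigar, bam$rname, bam$pos)
irl <- irl[elementLengths(irl) != 0]  # drop sequences with no aligned reads
reads <- as(irl,"GRanges")
reads1 <- as(irl,"RangedData")
gl <- coverage(reads1)                # coverage per reference sequence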

This is probably a bit slower... and the code is probably a bit old now.
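
To show what "exact coverage from the CIGAR" means, here is a minimal sketch
in base R with no Bioconductor dependency. The function cigar_to_blocks is a
hypothetical helper written for this example, not part of any package. It
walks one CIGAR string and returns only the reference blocks covered by
aligned bases, so the stretch skipped by an N adds no coverage. In this
sketch deletions (D) are also treated as uncovered; that is a choice made
here, and a package may handle D differently.

```r
# Hypothetical helper (not part of any package): convert one CIGAR string
# and its 1-based leftmost position into the reference blocks it covers.
cigar_to_blocks <- function(cigar, pos) {
  # Split the CIGAR into its operations, e.g. "20M", "1000N", "30M".
  m   <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  len <- as.integer(sub("[MIDNSHP=X]$", "", m))
  op  <- sub("^[0-9]+", "", m)
  start <- integer(0)
  end   <- integer(0)
  ref   <- pos
  for (i in seq_along(op)) {
    if (op[i] %in% c("M", "=", "X")) {
      # Aligned bases cover the reference.
      start <- c(start, ref)
      end   <- c(end, ref + len[i] - 1L)
      ref   <- ref + len[i]
    } else if (op[i] %in% c("D", "N")) {
      # Deletions and skipped regions advance along the reference
      # without adding coverage (an assumption of this sketch).
      ref <- ref + len[i]
    }
    # I, S, H and P do not consume reference positions.
  }
  data.frame(start = start, end = end)
}

# A read starting at position 101 that spans a 1000 bp gap.
blocks <- cigar_to_blocks("20M1000N30M", 101L)
print(blocks)
```

For this read the two blocks are 101-120 and 1121-1150, so the 1000 skipped
bases between them are not counted.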

The GenomicRanges package has some new documentation on
"countGenomicOverlaps", which may sort this out for you, as it's designed
to make input for edgeR, DESeq, etc...

Cheers
Paul



-----Original Message-----
From: Hubert Rehrauer <Hubert.Rehrauer at fgcz.ethz.ch>
To: bioconductor at r-project.org <bioconductor at r-project.org>
Subject: [BioC] Reading paired-end data into GRangesList
Date: Wed, 26 Oct 2011 23:51:47 +1000

Hi

I want to load paired-end data from BAM files into R in order to do 
expression counting. The complicating thing is that the first read was 
aligned using a gapped alignment (i.e. the CIGAR string contains Ns).

How can this be done? I thought this would be a quite common task but I 
did not find any function that would do this. Neither scanBam nor 
readBamGappedAlignments are directly helpful with this.

For me the most obvious thing would be to hold the alignment of such a 
read as a GRangesList. In order to get there I would use scanBam to load 
the first read, parse the CIGAR string to identify the gaps, build a 
GRangesList, and then add the alignment of the second read to the list. 
Do you have any better ideas?


best regards,
hubert



