[BioC] Reading paired-end data into GRangesList

Fri Oct 28 17:56:48 CEST 2011

Hi Paul

thanks. Yes, the readBamGappedAlignmenbts will not allow me to pair the 
reads after loading because the read-names will be discarded.

I will try, the cigarToIRangesListByRName. I had overlooked this option, 
but I guess it's gonna be slow.

The issue about length normalization is not really problem, since the 
reason why I have gaps on the genome is that the reads have been mapped 
to the transcriptome, without gaps, and then projected to the genome. So 
all gaps are due to introns.

regards,
hubert

Paul Leo wrote:
> Hi Huber,
> Think I would just use readBamGappedAlignments from this you will lose
> the paired end info. Yes you will add the Ns as coverage but  the
> difference here is very small in most cases. Note also you probably want
> to use tag counts rather that bg coverage anyway...
>
> There is another issue ( I think) ; let say for example some NNs were
> due to an "insertion" in this "samples genome" compared to the
> reference . When you go to normalise the signal counts per exon or
> counts per bp whatever ... will you use the exon length/ genome length
> for that individual or the reference exon length?  You will use the
> reference obviously so its a bit grey what the true answer is...
>
> However I believe you can get the exact coverage from the CIGAR if you
> wish see... 
>
> http://stuff.mit.edu/afs/athena/software/r/current/lib/R/library/GenomicRanges/html/cigar-utils.htm 
>
>
> irl <- cigarToIRangesListByRName(bam$cigar, bam$rname, bam$pos)
> irl <- irl[elementLengths(irl) != 0]
> reads <- as(irl,"GRanges")
> reads1 <- as(irl,"RangedData")
> gl <- coverage(reads1)
>
> probably a bit slower... however probably  a bit old now.........
>
> The GenomicRanges package has some new documentation on
> "countGenomicOverlaps" which may sort this out for you as it's designed
> to make input for edgeR Deseq etc...
>
> Cheers
> Paul
>
>
>
> -----Original Message-----
> From: Hubert Rehrauer <Hubert.Rehrauer at fgcz.ethz.ch>
> To: bioconductor at r-project.org <bioconductor at r-project.org>
> Subject: [BioC] Reading paired-end data into GRangesList
> Date: Wed, 26 Oct 2011 23:51:47 +1000
>
> Hi
>
> I want to load paired-end data from Bam-Files into R in order to do 
> expression counting. The complicating thing is, that the first read was 
> aligned using a gapped alignment (i.e. the cigar string contains Ns).
>
> How can this be done? I thought this would be a quite common task but I 
> did not find any function that would do this. Neither scanBam nor 
> readBamGappedAlignments are directly helpful with this.
>
> For me the most obvious thing would be to hold the alignment of such a 
> read as a GRangesList. In order to get there I would use scanBam to load 
> the first read. Parse the  cigar string to identify the gaps, build a 
> GRangesList and then add the alignment of the second read to the List. 
> Do you have any better ideas?
>
>
> best regards,
> hubert
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>