[BioC] Reading paired-end data into GRangesList
barr.cory at gene.com
Fri Oct 28 18:16:58 CEST 2011
readBamGappedAlignments will retain read names if the use.names arg is
set to TRUE. In general, I find grglist more high-level and
convenient to use than cigarToIRangesListByRName.
On Fri, Oct 28, 2011 at 8:56 AM, Hubert Rehrauer
<Hubert.Rehrauer at fgcz.ethz.ch> wrote:
> Hi Paul
> thanks. Yes, the readBamGappedAlignmenbts will not allow me to pair the
> reads after loading because the read-names will be discarded.
> I will try, the cigarToIRangesListByRName. I had overlooked this option, but
> I guess it's gonna be slow.
> The issue about length normalization is not really problem, since the reason
> why I have gaps on the genome is that the reads have been mapped to the
> transcriptome, without gaps, and then projected to the genome. So all gaps
> are due to introns.
> Paul Leo wrote:
>> Hi Huber,
>> Think I would just use readBamGappedAlignments from this you will lose
>> the paired end info. Yes you will add the Ns as coverage but the
>> difference here is very small in most cases. Note also you probably want
>> to use tag counts rather that bg coverage anyway...
>> There is another issue ( I think) ; let say for example some NNs were
>> due to an "insertion" in this "samples genome" compared to the
>> reference . When you go to normalise the signal counts per exon or
>> counts per bp whatever ... will you use the exon length/ genome length
>> for that individual or the reference exon length? You will use the
>> reference obviously so its a bit grey what the true answer is...
>> However I believe you can get the exact coverage from the CIGAR if you
>> wish see...
>> irl <- cigarToIRangesListByRName(bam$cigar, bam$rname, bam$pos)
>> irl <- irl[elementLengths(irl) != 0]
>> reads <- as(irl,"GRanges")
>> reads1 <- as(irl,"RangedData")
>> gl <- coverage(reads1)
>> probably a bit slower... however probably a bit old now.........
>> The GenomicRanges package has some new documentation on
>> "countGenomicOverlaps" which may sort this out for you as it's designed
>> to make input for edgeR Deseq etc...
>> -----Original Message-----
>> From: Hubert Rehrauer <Hubert.Rehrauer at fgcz.ethz.ch>
>> To: bioconductor at r-project.org <bioconductor at r-project.org>
>> Subject: [BioC] Reading paired-end data into GRangesList
>> Date: Wed, 26 Oct 2011 23:51:47 +1000
>> I want to load paired-end data from Bam-Files into R in order to do
>> expression counting. The complicating thing is, that the first read was
>> aligned using a gapped alignment (i.e. the cigar string contains Ns).
>> How can this be done? I thought this would be a quite common task but I
>> did not find any function that would do this. Neither scanBam nor
>> readBamGappedAlignments are directly helpful with this.
>> For me the most obvious thing would be to hold the alignment of such a
>> read as a GRangesList. In order to get there I would use scanBam to load the
>> first read. Parse the cigar string to identify the gaps, build a
>> GRangesList and then add the alignment of the second read to the List. Do
>> you have any better ideas?
>> best regards,
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> Search the archives:
> Bioconductor mailing list
> Bioconductor at r-project.org
> Search the archives:
More information about the Bioconductor