[Bioc-sig-seq] Calculating RPKMs from Tophat results : Your advice requested

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Apr 28 21:47:40 CEST 2010


Hi,

On Wed, Apr 28, 2010 at 3:12 PM, Pratap, Abhishek
<APratap at som.umaryland.edu> wrote:
> Hi Guys
>
> I posted the same question on seqanswers a couple of days ago but didn't get a response. Maybe you guys can educate me on this.
>
> I am trying to calculate RPKM from my TopHat data but have come across an issue that I believe could skew my results.
>
> My number of input reads to TopHat is ~49 million, but TopHat reports ~55 million mapped reads. I assume I am getting more mapped reads than total input because of the "--max-multihits 15" option I set. From the manual: "Instructs TopHat to allow up to this many alignments to the reference for a given read, and suppresses all alignments for reads with more than this many alignments."
>
> Now, for the RPKM calculation, I am not sure which number I should use for total mapped reads:
>
> 1. Total reads mapped by Tophat including multireads
> 2. Total uniquely mapped reads
>
> If I go with #2, then I think I should also remove all multireads when counting the reads that map to my genes, which could eliminate the RPKM counts for paralogous genes.
>
>
> What do you think is my best bet for getting #total_mapped_reads?
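
On the numbers in the question: with --max-multihits, a read that aligns to several loci produces several alignment records, so a total that counts alignments rather than distinct reads can exceed the number of input reads. The minimal sketch below assumes the alignments have already been parsed into hypothetical (read_name, nh) records, where nh stands in for the SAM NH tag (number of alignments reported for that read). It separates the three possible totals: alignment records, distinct mapped reads (option #1) and uniquely mapped reads (option #2).

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """One alignment record; a hypothetical stand-in for a parsed SAM/BAM line."""
    read_name: str
    nh: int  # number of alignments reported for this read (like the SAM NH tag)


def mapped_totals(alignments: Iterable[Alignment]) -> dict:
    """Count alignment records, distinct mapped reads (#1) and uniquely mapped reads (#2)."""
    n_alignments = 0
    all_reads = set()
    unique_reads = set()
    for aln in alignments:
        n_alignments += 1
        all_reads.add(aln.read_name)      # a multiread is still only one read
        if aln.nh == 1:
            unique_reads.add(aln.read_name)
    return {
        "alignments": n_alignments,
        "mapped_reads": len(all_reads),
        "unique_reads": len(unique_reads),
    }


# Toy data: r1 maps uniquely, r2 maps to 3 loci, r3 maps to 2 loci.
toy = [
    Alignment("r1", 1),
    Alignment("r2", 3), Alignment("r2", 3), Alignment("r2", 3),
    Alignment("r3", 2), Alignment("r3", 2),
]
print(mapped_totals(toy))  # {'alignments': 6, 'mapped_reads': 3, 'unique_reads': 1}
```

Here 6 alignment records come from only 3 reads, which is the same effect as ~55 million "mapped" out of ~49 million input. For paired-end data both mates share a read name, so this would count fragments rather than individual mates.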

It sounds like what you propose is reasonable either way, and yes,
if you go with #2, I would remove multireads when counting reads for RPKM.
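
The point about keeping the numerator and the denominator consistent can be sketched with the usual formula RPKM = reads_in_gene * 10^9 / (total_mapped_reads * gene_length_bp). In this sketch the input records are hypothetical (read_name, gene_id, nh) tuples, with gene_id set to None for alignments that hit no gene. Each read is counted at most once per gene; that is one choice among several (fractional 1/nh weighting is another) and not necessarily what any particular tool does.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical record: (read_name, gene_id or None if the alignment hits no gene, nh),
# where nh is the number of alignments reported for that read (like the SAM NH tag).
Record = Tuple[str, Optional[str], int]


def rpkm_table(records: Iterable[Record], gene_lengths: Dict[str, int],
               unique_only: bool) -> Dict[str, float]:
    """RPKM = reads_in_gene * 1e9 / (total_mapped_reads * gene_length_bp).

    unique_only=False is option #1 (multireads kept); unique_only=True is
    option #2 (multireads dropped from both the gene counts and the total).
    """
    reads_per_gene: Dict[str, Set[str]] = defaultdict(set)
    mapped: Set[str] = set()
    for read, gene, nh in records:
        if unique_only and nh > 1:
            continue  # drop multireads everywhere, not just in the denominator
        mapped.add(read)
        if gene is not None:
            reads_per_gene[gene].add(read)  # a read counts at most once per gene
    total = len(mapped)
    if total == 0:
        raise ValueError("no mapped reads left to normalize by")
    return {
        gene: len(reads_per_gene[gene]) * 1e9 / (total * length)
        for gene, length in gene_lengths.items()
    }


records = [
    ("r1", "gA", 1),
    ("r2", "gA", 2), ("r2", "gB", 2),  # r2 is a multiread hitting two paralogs
    ("r3", "gB", 1),
    ("r4", None, 1),                   # unique but intergenic
]
lengths = {"gA": 1000, "gB": 2000}
print(rpkm_table(records, lengths, unique_only=False))  # {'gA': 500000.0, 'gB': 250000.0}
print(rpkm_table(records, lengths, unique_only=True))
```

With unique_only=True, r2 disappears from both gA and gB and from the total, which is exactly the paralog concern raised in the question.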

Also, if you go with #2, you might want to ensure that your K (the
gene length in kilobases) is calculated from the number of uniquely
mappable positions in your gene model, just so you compare like with like.
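
A minimal sketch of that "effective length" idea follows. It assumes a hypothetical per-base boolean mask saying whether each position is uniquely mappable; building such a mask (for example from a mappability track for your read length) is outside the sketch. Exons are half-open (start, end) intervals and are merged first so that overlapping exons are not counted twice.

```python
from typing import List, Sequence, Tuple


def merge_intervals(exons: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping half-open [start, end) exon intervals so no base is counted twice."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def effective_length(exons: Sequence[Tuple[int, int]],
                     uniquely_mappable: Sequence[bool]) -> int:
    """Count gene-model positions flagged as uniquely mappable (hypothetical mask)."""
    return sum(
        1
        for start, end in merge_intervals(exons)
        for pos in range(start, end)
        if uniquely_mappable[pos]
    )


def rpkm(reads_in_gene: int, total_mapped: int, length_bp: int) -> float:
    """Reads per kilobase of length per million mapped reads."""
    return reads_in_gene * 1e9 / (total_mapped * length_bp)


# Toy chromosome of 100 bp where positions 20-29 are not uniquely mappable.
mask = [not (20 <= i < 30) for i in range(100)]
exons = [(0, 40), (30, 60)]          # overlapping exons of one gene model
eff = effective_length(exons, mask)  # 60 bp in the model, 50 uniquely mappable
print(eff, rpkm(10, 1_000_000, eff))  # 50 200.0
```

Using the full 60 bp instead would give a lower RPKM (about 166.7) for the same 10 unique reads, which is the mismatch the unique-positions length is meant to avoid.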

Why don't you try calculating RPKM using both #1 and #2, then plot the
expression of gene x from #1 vs. its expression from #2. I suspect the
plot you get will be pretty close to the diagonal, but you never know
unless you try.

Let us know :-)

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


