[BioC] Applying DESeq on RSEM output

Mon Mar 25 12:41:59 CET 2013

Hi Simon,

Thanks again for your help.

The data files I'm using are the GENE-LEVEL, UNNORMALIZED ones, named
"...expression_rsem_gene.txt". Exon-level and normalized versions of the
RSEM output also exist, but I do not use them.
These files contain a 'raw_count' column, which I thought should be okay as
input for DESeq and edgeR.
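
For illustration, here is a minimal Python sketch of pulling the 'raw_count'
column out of such a tab-separated file and rounding it to integers. Only the
'raw_count' column name comes from the files described above; the 'gene_id'
column name, the example values, and the function name are assumptions, and
rounding is just one common convention, not a recommendation from this thread.

```python
import csv
import io


def read_raw_counts(tsv_text, id_col="gene_id", count_col="raw_count"):
    """Return {gene id: rounded count} from tab-separated, RSEM-style text.

    id_col is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {}
    for row in reader:
        # RSEM reports expected counts, which may be fractional;
        # count-based tools expect integers, so round here (an assumption).
        counts[row[id_col]] = int(round(float(row[count_col])))
    return counts


# Made-up example rows, not real TCGA data.
example = "gene_id\traw_count\nGENE_A\t10.49\nGENE_B\t0.00\nGENE_C\t1532.51\n"
counts = read_raw_counts(example)
```

The resulting dictionary could then be written out as one column of a count
table, one column per sample.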

Do you still think it's calculated on the exon level?

Many thanks,
Dvir

-----Original Message-----
From: Simon Anders [mailto:anders at embl.de] 
Sent: Thursday, March 21, 2013 3:20 PM
To: bioconductor at r-project.org
Subject: Re: [BioC] Applying DESeq on RSEM output

Hi Dvir

On 20/03/13 14:15, dvir.tau at gmail.com wrote:
> I'm running DESeq and EdgeR on RNA-Seq data that was already processed 
> with RSEM (downloaded from TCGA web site).
>
> Since these methods require the raw read counts I'm using the 
> raw_count column of the RSEM output but I'm not sure this is the right 
> thing to do (is it the actual raw count required ?)

The real issue is not that your counts are not integer, but that RSEM gives
you counts per isoform rather than per gene. Now, if you have two very
similar isoforms, RSEM will be unable to decide which isoform to assign a
read to and just spread them proportionally over both. Hence, even if only
one of the two isoforms is differentially expressed, you will incorrectly
see differential expression for both isoforms.
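
A toy Python sketch of this effect, under strong simplifying assumptions: every
read is compatible with both isoforms, and the ambiguous reads are split
evenly. This is not RSEM's actual estimation procedure, and the function name
and numbers are made up for illustration.

```python
def split_ambiguous(true_a, true_b):
    """Toy model: all reads fit both isoforms, so the total is split evenly."""
    total = true_a + true_b
    return total / 2, total / 2


# Condition 1: isoforms A and B both produce 100 reads.
# Condition 2: only isoform A changes (100 -> 300); B stays at 100.
a1, b1 = split_ambiguous(100, 100)
a2, b2 = split_ambiguous(300, 100)

# Apparent fold changes computed from the assigned counts.
fold_a = a2 / a1
fold_b = b2 / b1
print(fold_a, fold_b)
```

In this toy setting isoform B appears to change by the same factor as isoform
A, although its true abundance is identical in both conditions.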

This is why the output of isoform quantification methods such as RSEM or
MMSeq is not suitable as input for differential expression tests.

At the very minimum, you also need the information about the uncertainty of
the assignments of reads to isoforms. In fact, RSEM provides this
information if you run it in its Bayesian mode, but this seems to be hardly
ever done in practice.

If you really need to perform differential expression analysis on a level
finer than whole gene expression, you should either use a tool for
differential exon usage testing, such as our DEXSeq package, or one that
combines isoform abundance estimation and testing for differences in a
unified framework, such as BitSeq. In both cases, you will need the SAM
files.

If you are fine with staying on the gene level for your analysis, you need
to get counts per gene, not per isoform. I am not familiar enough with RSEM,
though, to tell you whether adding up the counts from all the isoforms per
gene would be a good idea.
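
Purely to show the mechanics of such a summation (not to settle whether it is
a good idea), a minimal Python sketch follows. The isoform-to-gene mapping,
the identifiers, and the function name are hypothetical.

```python
from collections import defaultdict


def sum_isoforms_per_gene(isoform_counts, isoform_to_gene):
    """Add up per-isoform counts into one count per gene."""
    gene_counts = defaultdict(float)
    for isoform, count in isoform_counts.items():
        # Each isoform contributes its count to its parent gene.
        gene_counts[isoform_to_gene[isoform]] += count
    return dict(gene_counts)


# Made-up example: two isoforms of GENE_A, one isoform of GENE_B.
isoform_counts = {"A.1": 12.5, "A.2": 7.5, "B.1": 3.0}
isoform_to_gene = {"A.1": "GENE_A", "A.2": "GENE_A", "B.1": "GENE_B"}
gene_counts = sum_isoforms_per_gene(isoform_counts, isoform_to_gene)
```

Reads that are ambiguous only between isoforms of the same gene end up in the
same gene total either way, which is the intuition behind summing.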

   Simon
