[BioC] Applying DESeq on RSEM output
anders at embl.de
Thu Mar 21 14:19:43 CET 2013
On 20/03/13 14:15, dvir.tau at gmail.com wrote:
> I'm running DESeq and EdgeR on RNA-Seq data that was already processed with
> RSEM (downloaded from TCGA web site).
> Since these methods require the raw read counts I'm using the raw_count
> column of the RSEM output but I'm not sure this is the right thing to do (is
> it the actual raw count required ?)
The real issue is not that your counts are not integer, but that RSEM
gives you counts per isoform rather than per gene. Now, if you have two
very similar isoforms, RSEM will be unable to decide which isoform to
assign a read to and just spread them proportionally over both. Hence,
even if only one of the two isoforms is differentially expressed, you
will incorrectly see differential expression for both isoforms.
This is why the output of isoform quantification methods such as RSEM or
MMSeq is not suitable as input for differential expression tests.
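
A toy calculation may make the effect concrete. The sketch below is not RSEM's actual algorithm. It assumes the extreme case of two isoforms so similar that no read can be told apart, so the total is simply split half-and-half. The read counts and the helper name split_evenly are made up for illustration.

```python
def split_evenly(true_a, true_b):
    """Assign reads from two indistinguishable isoforms half-and-half."""
    total = true_a + true_b
    return total / 2, total / 2

# Hypothetical true read counts: only isoform A changes between conditions.
cond1 = {"A": 100, "B": 100}
cond2 = {"A": 300, "B": 100}

# What a quantifier would report if it cannot tell A and B apart.
est1 = split_evenly(cond1["A"], cond1["B"])
est2 = split_evenly(cond2["A"], cond2["B"])

# Fold change between conditions, per isoform (A first, then B).
true_fc = (cond2["A"] / cond1["A"], cond2["B"] / cond1["B"])
est_fc = (est2[0] / est1[0], est2[1] / est1[1])

print("true fold changes (A, B):", true_fc)
print("estimated fold changes (A, B):", est_fc)
```

In this toy case only isoform A truly changes, yet the estimated counts change by the same factor for both isoforms.
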
At the very minimum, you also need the information about the uncertainty
of the assignments of reads to isoforms. In fact, RSEM provides this
information if you run it in its Bayesian mode, but this seems to be
hardly ever done in practice.
If you really need to perform differential expression analysis on a
level finer than whole gene expression, you should either use a tool for
differential exon usage testing, such as our DEXSeq package, or one that
combines isoform abundance estimation and testing for differences in a
unified framework, such as BitSeq. In both cases, you will need the SAM
If you are fine with staying on the gene level for your analysis, you
need to get counts per gene, not per isoform. I am not familiar enough
with RSEM, though, to tell you whether adding up the counts from all the
isoforms per gene would be a good idea.
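
For reference, the mechanics of adding up isoform counts per gene would look roughly like the sketch below. Whether this is statistically sound for RSEM output is exactly the open question above, so take it only as an illustration of the bookkeeping. The transcript and gene identifiers, the counts, and the function name sum_to_gene are all hypothetical, and rounding to integers is an assumption made here because RSEM's expected counts can be fractional.

```python
from collections import defaultdict

def sum_to_gene(isoform_counts, isoform_to_gene):
    """Add up per-isoform counts for each gene and round to integers."""
    gene_counts = defaultdict(float)
    for isoform, count in isoform_counts.items():
        gene_counts[isoform_to_gene[isoform]] += count
    # Rounding is an assumption of this sketch, not a recommendation.
    return {gene: int(round(total)) for gene, total in gene_counts.items()}

# Hypothetical per-isoform expected counts for one sample.
isoform_counts = {"tx1": 120.4, "tx2": 30.3, "tx3": 55.0}
# Hypothetical isoform-to-gene annotation.
isoform_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = sum_to_gene(isoform_counts, isoform_to_gene)
print(gene_counts)
```
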