[BioC] DESeq versus CuffDiff2 for RNA-seq expression quantification in parasite-infected blood
Simon Anders
anders at embl.de
Fri Apr 26 19:40:18 CEST 2013
Hi Kevin
On 26/04/13 19:32, Kevin Lee wrote:
> I appreciate your assistance. A follow-up question: what is the
> appropriate method to handle a read that splits an exon junction and is
> therefore mapped to two exons when using a short read mapping software?
> Counting it as being present in both exons seems to give undue weight
> to the read when using DESeq; conversely, it seems important to "double
> count" it when using DEXSeq. Any advice? And any software to readily
> generate these kinds of files, the matrix files required for DE(X)Seq?
> I have just been using an overlapper script that I wrote using the bam
> files and ucsc gene annotations.
I use Python scripts for counting. For DESeq, you can use the
htseq-count script (available from
http://www-huber.embl.de/users/anders/HTSeq/ ), and for DEXSeq, use the
dexseq-count.py script that comes with the DEXSeq Bioconductor package.
The reason that we offer two scripts, and suggest producing separate
count tables for DESeq and DEXSeq, is precisely the issue you point
out with reads that map to two exons.
While this works well, I do admit that this state of things is not
terribly elegant.
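To make the difference between the two count tables concrete, here is
a toy Python sketch. It is not the actual logic of htseq-count or
dexseq-count.py; the annotation, the coordinates and the function names
are hypothetical and only illustrate the idea: a read spanning an exon
junction adds one count to its gene in the per-gene table, but one count
to each exon it overlaps in the per-exon table.

```python
from collections import Counter

# Hypothetical toy annotation: (gene, exon_id, start, end), half-open coordinates.
EXONS = [
    ("geneA", "E001", 100, 200),
    ("geneA", "E002", 300, 400),
    ("geneB", "E001", 1000, 1100),
]

def overlapped_exons(blocks):
    """Return the set of (gene, exon_id) pairs overlapped by any aligned block."""
    hits = set()
    for b_start, b_end in blocks:
        for gene, exon_id, e_start, e_end in EXONS:
            if b_start < e_end and e_start < b_end:
                hits.add((gene, exon_id))
    return hits

def count_reads(reads):
    """Build a per-gene table (DESeq-style) and a per-exon table (DEXSeq-style)."""
    gene_counts = Counter()
    exon_counts = Counter()
    for blocks in reads:
        hits = overlapped_exons(blocks)
        genes = {gene for gene, _ in hits}
        # Per gene: the read counts once, and only if it hits exactly one gene.
        if len(genes) == 1:
            gene_counts[next(iter(genes))] += 1
        # Per exon: the read counts once for every exon it overlaps.
        for hit in hits:
            exon_counts[hit] += 1
    return gene_counts, exon_counts

# Each read is a list of aligned blocks; a spliced read has more than one block.
reads = [
    [(150, 200), (300, 350)],  # spans the E001/E002 junction of geneA
    [(310, 360)],              # lies entirely within E002 of geneA
    [(1010, 1060)],            # lies entirely within E001 of geneB
]
gene_counts, exon_counts = count_reads(reads)
```

In this sketch the junction read raises the geneA count by one only,
while both geneA exons receive a count from it.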
BTW: If you use our scripts with UCSC annotation, make sure to fix the
gene IDs. (The UCSC table browser puts transcript IDs where it should
put gene IDs; you need to remove the ".nn" suffixes. You will see what I
mean once you have a look at the GFF files.)
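A minimal Python sketch of such a fix-up follows. It rests on
assumptions that should be checked against your own file: that the
attribute column uses the GTF form gene_id "..."; and that the ".nn"
suffix is a dot followed by digits at the end of the gene_id value.
The function name and the example line are made up for illustration.

```python
import re

# Assumption: the attribute looks like gene_id "uc001aaa.3"; and the
# suffix to remove is a trailing dot followed by digits.
GENE_ID_SUFFIX = re.compile(r'(gene_id "[^"]+?)\.\d+(")')

def strip_gene_id_suffix(line):
    """Remove a trailing .<digits> suffix from the gene_id value only."""
    return GENE_ID_SUFFIX.sub(r"\1\2", line)

# Hypothetical example line in GTF layout (tab-separated, 9 columns).
example = (
    "chr1\tknownGene\texon\t11874\t12227\t0.000000\t+\t.\t"
    'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";'
)
fixed = strip_gene_id_suffix(example)
```

The transcript_id attribute is left untouched, so only the value used
for grouping exons into genes changes. To process a whole file, apply
the function to every line and write the results to a new file.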
Simon