[BioC] DEXSeq - too many exons in gene
Steve Lianoglou
lianoglou.steve at gene.com
Thu Feb 6 18:54:38 CET 2014
Hi,
A few comments in line:
On Thu, Feb 6, 2014 at 9:01 AM, António domingues
<amjdomingues at gmail.com> wrote:
> Hi Bioconductors,
>
> I happened upon a funny thing in DEXSeq: a gene which appears to have more
> exons in the final DEXSeq output than the annotation suggests. The gene
> ENSMUSG00000027854 (screen-shot from UCSC in attachment) appears to have 3
> exons in a flattened gene model. However, the DEXSeq results list 13 exons
> (here showing the output of htseq-count):
Not sure why you say the *gene* only has 3 exons ... you have
highlighted one isoform of the gene which has very few exons, but you
can see from both your picture and the exon definitions you pasted below
for ENSMUSG00000027854 (presumably that's Csde1 :-) that if you
consider all of the isoforms of the gene together, it has many more
than just three exons.
Know what I mean?
> Exon 1 is only 1 base long (?) and exons 1 to 4 are contiguous. As far
> as I am aware, the DEXSeq model should have flattened all of these into one
> single "exon". Is this correct? Is the error coming from the gtf? (at the
> end of the message there is also the gene annotation in the gtf).
I'm trying to parse the various exon annotations from your email, but
I don't see where the 1-width exon is.
Figure 1 from the DEXSeq paper shows pretty clearly how the "breakdown" of
exons is calculated across isoforms to create *counting bins* -- just
keep in mind that these things are not necessarily "exons" anymore.
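To make the counting-bin idea concrete, here is a minimal Python sketch of the general flattening step. It is not the script DEXSeq actually ships; the function name `counting_bins`, the transcript ids and the coordinates are all made up for illustration, and coordinates are assumed to be 1-based and inclusive.

```python
def counting_bins(transcripts):
    """Split the exons of all isoforms of one gene into disjoint bins.

    transcripts: dict mapping a transcript id to a list of (start, end)
    exons (1-based, inclusive). Hypothetical input format, not a GTF parser.
    Returns a list of (start, end, sorted transcript ids) tuples.
    """
    # Every exon start, and the position just past every exon end,
    # is a point where the set of overlapping exons can change.
    breakpoints = sorted({p
                          for exons in transcripts.values()
                          for start, end in exons
                          for p in (start, end + 1)})
    bins = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        seg_start, seg_end = left, right - 1
        # Transcripts with an exon that fully covers this segment.
        covering = sorted(tx for tx, exons in transcripts.items()
                          if any(s <= seg_start and seg_end <= e
                                 for s, e in exons))
        if covering:  # segments covered by no exon are intronic, skip them
            bins.append((seg_start, seg_end, covering))
    return bins


# Toy gene: two isoforms, three distinct annotated exons in total.
toy_gene = {
    "tx_a": [(100, 200), (300, 400)],
    "tx_b": [(150, 250), (300, 400)],
}
for i, (start, end, txs) in enumerate(counting_bins(toy_gene), 1):
    print(f"E{i:03d}\t{start}-{end}\t{'+'.join(txs)}")
```

In this toy case the three annotated exons turn into four bins, because the two overlapping first exons are cut at each other's boundaries. In the same sketch, two exons whose ends differ by a single base would yield a 1-base bin.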
> This is especially concerning for me because I am interested in selecting the
> first and last exon of genes, using the exon ranking from DEXSeq, to analyze
> further.
I'm not sure if what I posted was at all helpful, but if someone else
doesn't do a better job of providing you with the answer you were
looking for, you might try to draw a figure of a gene model (with a
few splicing isoforms) and point out what it is, exactly, that you
hope to extract from it.
While it's clear what the "first and last" exon of a *single transcript
isoform* of a gene might be, it might get hairy when you start
summarizing the "counting bins" across multiple isoforms of the same
gene.
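A small Python sketch of why this gets hairy, assuming the same kind of made-up input as a plain dict of exon coordinates per transcript (the function name `first_and_last_exons`, the ids and the coordinates are hypothetical, and this is not part of DEXSeq):

```python
def first_and_last_exons(transcripts, strand="+"):
    """Return {transcript id: (first exon, last exon)} in transcription order.

    transcripts: dict mapping a transcript id to a list of (start, end)
    exons (1-based, inclusive). On the minus strand the exon with the
    highest coordinates is transcribed first.
    """
    result = {}
    for tx, exons in transcripts.items():
        ordered = sorted(exons, reverse=(strand == "-"))
        result[tx] = (ordered[0], ordered[-1])
    return result


# Toy gene with two isoforms that start and end in different places.
toy_gene = {
    "tx_a": [(100, 200), (300, 400), (900, 1000)],
    "tx_b": [(150, 250), (300, 400)],
}
ends = first_and_last_exons(toy_gene)
for tx, (first, last) in ends.items():
    print(tx, "first:", first, "last:", last)

# More than one distinct answer means "the first exon of the gene"
# is not well defined without choosing an isoform or a rule.
print("distinct first exons:", len({first for first, _ in ends.values()}))
print("distinct last exons:", len({last for _, last in ends.values()}))
```

Here each isoform has a different first and a different last exon, so any gene-level "first" or "last" bin needs an explicit rule for which isoform, or which combination of isoforms, it refers to.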
HTH,
-steve
--
Steve Lianoglou
Computational Biologist
Genentech