[BioC] is database RefSeq achievable from any Bioconductor package
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Tue Jun 1 16:56:45 CEST 2010
On Tue, Jun 1, 2010 at 10:37 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Mon, May 31, 2010 at 8:40 AM, <mauede at alice.it> wrote:
>> The Biologist we work with has brought my attention to some misalignment between
>> Ensembl and RefSeq with regard to the length and position of 3UTR sequences.
>
> I'd like to comment on this, but I'm not sure I'd provide any useful
> information w/o more details from you.
>
> But just one point: the RefSeq gene annotations and the ensembl gene
> annotations are not necessarily the same, so what you say here isn't
> all that surprising.
>
> A quick example: the number (and "character") of isoforms per "gene"
> often differ between the two sources.
An additional comment: the definition of UTR and coding region
requires that you know which part of the transcript is actually
translated. This is well known for the canonical transcript of most
genes in well-annotated organisms. But it is much less well known for
alternative transcripts from the same gene, even for a well-annotated
organism such as Drosophila (this is based on a not-quite-current
version of FlyBase). Note that defining the coding region and UTRs is
actually surprisingly hard to do computationally (it involves a lot of
guesswork). For more detail on this, for Drosophila, you can read
parts of
Hansen KD, Lareau LF, Blanchette M, Green RE, Meng Q, et al. 2009
Genome-Wide Identification of Alternative Splice Forms Down-Regulated
by Nonsense-Mediated mRNA Decay in Drosophila. PLoS Genet 5(6):
e1000525. doi:10.1371/journal.pgen.1000525
http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000525
Especially the "Reannotating coding regions reveals distinct features
of NMD–target isoforms" subsection of the results.
This proved to be essential for this particular paper. Fixing up the
mistakes in FlyBase made our results interpretable instead of just
looking like noise.
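
To make the point about UTR definitions concrete, here is a minimal
sketch in Python (not the method used in the paper, and not tied to any
Bioconductor package). It assumes 1-based inclusive genomic coordinates
and a coding span that runs from the first base of the start codon to
the last base of the stop codon. The function names and all coordinates
are made up for illustration. It shows that two annotations with
identical exons but a different assumed coding span report different
3' UTR lengths for the same transcript.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (assumption)


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def utr_lengths(exons: List[Interval], cds: Interval, strand: str) -> Tuple[int, int, int]:
    """Return (5' UTR, coding, 3' UTR) lengths in transcript bases.

    exons  : exon intervals of a single transcript
    cds    : genomic span of the translated region (hypothetical convention:
             start codon through stop codon, both included)
    strand : "+" or "-"
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    left = coding = right = 0
    for start, end in exons:
        # Exonic bases lying before the coding span on the genome.
        left += max(0, min(end, cds[0] - 1) - start + 1)
        # Exonic bases lying inside the coding span.
        coding += overlap_length((start, end), cds)
        # Exonic bases lying after the coding span on the genome.
        right += max(0, end - max(start, cds[1] + 1) + 1)
    # On the minus strand the genomic "left" side is the 3' end.
    return (left, coding, right) if strand == "+" else (right, coding, left)


# Invented transcript: three exons, 400 bases in total.
exons = [(100, 199), (300, 399), (500, 699)]

# Two hypothetical annotations that disagree only on where translation stops.
annotation_a = utr_lengths(exons, cds=(150, 549), strand="+")
annotation_b = utr_lengths(exons, cds=(150, 349), strand="+")

print(annotation_a)  # (50, 200, 150)
print(annotation_b)  # (50, 100, 250)
```

The exon structure is the same in both cases; only the assumed
translated region changes, and the 3' UTR length changes with it. That
is one way two annotation sources can disagree on UTRs even when they
agree on the transcript itself.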
Kasper