[BioC] Is the RefSeq database accessible from any Bioconductor package?
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Jun 1 16:37:19 CEST 2010
Hi,
On Mon, May 31, 2010 at 8:40 AM, <mauede at alice.it> wrote:
> The biologist we work with has brought to my attention some misalignment between
> Ensembl and RefSeq with regard to the length and position of 3'UTR sequences.
I'd like to comment on this, but I'm not sure I'd provide any useful
information w/o more details from you.
But just one point: the RefSeq gene annotations and the Ensembl gene
annotations are not necessarily the same, so what you describe here isn't
all that surprising.
A quick example: the number (and "character") of isoforms per "gene"
often differ between the two sources.
If you really want to turn your world view upside down, check out the
AceView annotations some day ...
Just thought I'd mention ...
> I have queried Ensembl many times through biomaRt.
> I wonder whether I can reach RefSeq data through biomaRt or any other Bioconductor package.
> Unfortunately, I cannot find RefSeq in the list of databases returned by the function listMarts().
You can download the gene annotation tracks from the UCSC table
browser and parse them out to get your 3'UTRs.
I know the GenomicFeatures package has code to download and parse
these (what used to be called) 'knownGene' tables automagically and
dump them into an SQLite db, so you can:
(i) look at that code to find inspiration
(ii) just let the GF package do its thing and work w/ the resulting database
(iii) d/l the table manually and parse out the relevant coordinate
info by yourself.
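If you go with option (iii), the core of the parsing is working out which exonic stretches of each transcript fall downstream of the coding region. Below is a minimal Python sketch of that step. It assumes the table is in UCSC refGene format: tab-separated, with the first column being `bin` as in the refGene.txt dump (drop that offset if your export lacks it), and UCSC's 0-based, half-open coordinates. Skip any "#" header line before calling it. The function names are hypothetical helpers, not part of GenomicFeatures or any other package, and non-coding transcripts (cdsStart == cdsEnd) are simply treated as having no 3'UTR.

```python
# Sketch: extract 3'UTR intervals from one row of a UCSC refGene-style table.
# Assumed column order: bin, name, chrom, strand, txStart, txEnd, cdsStart,
# cdsEnd, exonCount, exonStarts, exonEnds, ...  (0-based, half-open coords).
# Function names are hypothetical, for illustration only.
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def parse_coords(field: str) -> List[int]:
    """Turn a comma-separated list such as '100,600,' into [100, 600]."""
    return [int(x) for x in field.split(",") if x]


def three_prime_utrs(line: str) -> Tuple[str, str, List[Interval]]:
    """Return (transcript name, strand, 3'UTR intervals) for one table row."""
    f = line.rstrip("\n").split("\t")
    name, chrom, strand = f[1], f[2], f[3]
    cds_start, cds_end = int(f[6]), int(f[7])
    exon_starts, exon_ends = parse_coords(f[9]), parse_coords(f[10])

    utrs: List[Interval] = []
    if cds_start == cds_end:
        # No coding region, so no 3'UTR is defined for this transcript.
        return name, strand, utrs

    for start, end in zip(exon_starts, exon_ends):
        if strand == "+":
            # Plus strand: the 3'UTR is the exonic sequence after cdsEnd.
            s, e = max(start, cds_end), end
        else:
            # Minus strand: the 3'UTR is the exonic sequence before cdsStart
            # (in genome coordinates).
            s, e = start, min(end, cds_start)
        if s < e:  # keep only exons that actually overlap the UTR region
            utrs.append((chrom, s, e))
    return name, strand, utrs


if __name__ == "__main__":
    # Made-up toy row: two exons, CDS spanning [150, 700).
    row = "\t".join(["0", "NM_TOY1", "chr1", "+", "100", "1000", "150", "700",
                     "2", "100,600,", "300,1000,", "0", "TOY1",
                     "cmpl", "cmpl", "0,1,"])
    print(three_prime_utrs(row))
```

Running it over every line of the downloaded table gives you per-transcript 3'UTR coordinates that you can then compare against what you get back from Ensembl via biomaRt.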
I'm at a loss to offer you (i) any other packages to help you do this
automatically, or (ii) another source to find the info you need from.
Hope that helps,
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact