[BioC] Re: is there an identifier that uniquely identifies a gene across the many databases?
Steve Lianoglou
mailinglist.honeypot at gmail.com
Mon Jul 13 01:52:30 CEST 2009
Hi,
> My goal is to get the 3'UTR sequences associated with experimentally
> validated genes.
> When I enter "Human" as the species and a miRNA identifier
> "hsa-miR-yyy", the TarBase interface returns a list of all genes
> (ENSGxxxxxx) that have been experimentally tested.
> I pass each ENSGxxxxxx identifier to getSequence (a biomaRt function)
> to get the 3'UTR sequence.
> I was surprised to find multiple 3'UTR sequences associated with the
> same ENSGxxxxxx.
> Maybe each transcript is identified by a unique ENSTxxxx
> identifier... TRUE/FALSE?
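That's likely the case: one Ensembl gene (ENSG) often has several
annotated transcripts (ENST), each of which can have its own 3'UTR.
You can easily verify this yourself.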
Just add "ensembl_transcript_id" (in addition to the ensembl_gene_id
you already have) as one of the attributes you'd like returned in your
getBM query to see if that explains the multiple 3_utr_start/end
results you get.
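A minimal sketch of that check in R, assuming the biomaRt package is installed and `mart` is an Ensembl mart object (for example one created with `useMart("ensembl", dataset = "hsapiens_gene_ensembl")`). The helper names `fetch_utrs_by_transcript`, `transcripts_per_gene` and `fetch_utr_sequences` are hypothetical, and the availability of the `3_utr_start`/`3_utr_end` attributes may vary between Ensembl releases. If a gene maps to more than one transcript ID, that would account for the repeated 3'UTR rows.

```r
# Sketch only: helper names are hypothetical; biomaRt functions are called
# via biomaRt:: so the package is needed only when the helpers are run.

# Fetch 3'UTR coordinates together with the transcript each row belongs to.
fetch_utrs_by_transcript <- function(gene_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id",
                   "3_utr_start", "3_utr_end"),
    filters    = "ensembl_gene_id",
    values     = gene_ids,
    mart       = mart
  )
}

# Count the distinct transcript IDs behind the rows returned for each gene.
transcripts_per_gene <- function(res) {
  counts <- tapply(res$ensembl_transcript_id, res$ensembl_gene_id,
                   function(tx) length(unique(tx)))
  # tapply returns a 1-d array; convert to a plain named integer vector
  setNames(as.integer(counts), names(counts))
}

# Retrieve 3'UTR sequences keyed by transcript ID instead of gene ID.
fetch_utr_sequences <- function(transcript_ids, mart) {
  biomaRt::getSequence(id = transcript_ids, type = "ensembl_transcript_id",
                       seqType = "3utr", mart = mart)
}

# Example usage (requires network access to Ensembl BioMart):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# res  <- fetch_utrs_by_transcript(c("ENSGxxxxxx"), mart)
# transcripts_per_gene(res)
```

Keying the sequence query on `ensembl_transcript_id` rather than `ensembl_gene_id` keeps each 3'UTR attached to the transcript it came from, so the one-gene-many-UTRs result no longer looks ambiguous.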
-steve
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos