[BioC] Help with getBM using EST ids from zebrafish

Wolfgang Huber huber at ebi.ac.uk
Sat Dec 6 15:01:29 CET 2008


Dear Scott,

I would be surprised if the Ensembl BioMart provided this information - 
due to the nature and size of EST-to-gene mapping "data". But as always 
of course I would be happy to be surprised.

You can download the complete Unigene clusters from 
ftp://ftp.ncbi.nih.gov/repository/UniGene/Danio_rerio/

and it is easy with R (or indeed Perl, Python etc.) to parse e.g. the 
file Dr.data for your EST IDs and extract the corresponding LocusLink IDs. 
Both "CO349769" and "g49431086" are found in it and map to the fth1 gene.
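
A minimal sketch of such a parser in R follows. It assumes the usual UniGene flat-file layout: records terminated by a line starting with "//", header lines tagged ID, GENE and GENE_ID (or LOCUSLINK in older releases), and one SEQUENCE line per member sequence carrying "ACC=...; NID=...;" fields. Please check these tags against the actual file before relying on it. The function name parse_unigene and the sample record are made up for illustration; the cluster and gene ID values in the sample are placeholders, not real identifiers.

```r
# Map EST accessions (or NIDs) to the gene fields of their UniGene cluster.
# ASSUMPTION: tag names and record separator as described in the text above.
parse_unigene <- function(lines, est_ids) {
  # Return the value of the first line in record `x` that starts with `tag`.
  field <- function(tag, x) {
    pat <- sprintf("^%s\\s+", tag)
    hit <- grep(pat, x, value = TRUE)
    if (length(hit) == 0) NA_character_ else sub(pat, "", hit[1])
  }
  ends   <- which(grepl("^//", lines))   # last line of each record
  starts <- c(1, head(ends, -1) + 1)     # first line of each record
  out <- list()
  for (i in seq_along(ends)) {
    rec  <- lines[starts[i]:ends[i]]
    seqs <- grep("^SEQUENCE\\s", rec, value = TRUE)
    # Accession without version suffix, e.g. "CO349769" from "CO349769.1"
    acc  <- sub("^.*ACC=([^.;]+).*$", "\\1", seqs)
    nid  <- sub("^.*NID=([^;]+).*$", "\\1", seqs)
    keep <- acc %in% est_ids | nid %in% est_ids
    if (any(keep)) {
      gid <- field("GENE_ID", rec)
      if (is.na(gid)) gid <- field("LOCUSLINK", rec)
      out[[length(out) + 1]] <- data.frame(
        est = acc[keep], cluster = field("ID", rec),
        gene = field("GENE", rec), gene_id = gid,
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)   # NULL if none of the IDs were found
}

# Illustrative record only; "Dr.0" and "0" are placeholder values.
sample_lines <- c(
  "ID          Dr.0",
  "GENE        fth1",
  "GENE_ID     0",
  "SEQUENCE    ACC=CO349769.1; NID=g49431086; SEQTYPE=EST",
  "//",
  "ID          Dr.1",
  "GENE        other",
  "SEQUENCE    ACC=XX000000.1; NID=g0; SEQTYPE=EST",
  "//")
res <- parse_unigene(sample_lines, "CO349769")
print(res)
```

For the real file one would pass readLines("Dr.data") and the full vector of 6000+ EST IDs instead of the sample lines. The resulting gene IDs could then serve as filter values in getBM, provided the mart offers a matching filter.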

Best wishes
      Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber

Ochsner, Scott A wrote:
> Dear list,
> 
> I am trying to retrieve zfin annotation for each of my 6000+ zebrafish EST ids using biomaRt.  As an example of the EST ids I've included the output from NCBI's EST search for CO349769.
> 
> 1. DR_AOV_FL01_G08 adult ovary full-length (TLL) Danio rerio cDNA, mRNA sequence
> gi|49431086|gb|CO349769.1|[49431086] 
> 
> 
> As a further note, searching UniGene is successful with "CO349769", but not with "49431086".  I've set up the following:
> 
>> library(biomaRt)
>> ensembl=useMart("ensembl",dataset="drerio_gene_ensembl")
> 
> Question1:  If ESTs are mapped (based on previous posts they don't look like they are), what is the appropriate filter variable in the line of code below?
> 
>> map<-getBM(attributes=c("zfin_id","zfin_symbol"),filters="?",values="CO349769",mart=ensembl)
> 
> Question2:  If it turns out ESTs are not mapped, I'll probably have to go through UniGene to obtain an ID I can use with biomaRt.  Does anyone know of a way to batch search UniGene?  6000+ ids is a lot to search one by one.
> 
> Thanks for any help,
> 
>> sessionInfo()
> R version 2.8.0 (2008-10-20) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
>  [1] biomaRt_1.16.0      genefilter_1.22.0   survival_2.34-1     geneplotter_1.20.0  annotate_1.20.0     xtable_1.5-4        AnnotationDbi_1.4.0 lattice_0.17-15    
>  [9] limma_2.16.2        affy_1.20.0         Biobase_2.2.0      
> 
> loaded via a namespace (and not attached):
> [1] affyio_1.10.0        DBI_0.2-4            grid_2.8.0           KernSmooth_2.22-22   preprocessCore_1.4.0 RColorBrewer_1.0-2   RCurl_0.91-0        
> [8] RSQLite_0.7-1        XML_1.94-0.1  
> 
> Scott A. Ochsner, Ph.D.
> NURSA Bioinformatics
> Molecular and Cellular Biology
> Baylor College of Medicine
> Houston, TX. 77030
> phone: 713-798-6227 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
