[BioC] Help with getBM using EST ids from zebrafish
Wolfgang Huber
huber at ebi.ac.uk
Sat Dec 6 15:01:29 CET 2008
Dear Scott,
I would be surprised if the Ensembl BioMart provided this information -
due to the nature and size of EST-to-gene mapping "data". But as always
of course I would be happy to be surprised.
You can download the complete UniGene clusters from
ftp://ftp.ncbi.nih.gov/repository/UniGene/Danio_rerio/
and it is easy with R (or indeed Perl, Python, etc.) to parse, e.g., the
file Dr.data for your EST IDs and extract the corresponding LocusLink IDs.
Note also that both "CO349769" and "g49431086" appear in that file, and
both map to the fth1 gene.
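As a rough illustration of that parsing step, here is a minimal R sketch. It assumes the usual UniGene .data layout: each cluster record starts with an "ID" line, carries "GENE" and "GENE_ID" lines (older files may say "LOCUSLINK" instead), and lists its member sequences on "SEQUENCE" lines of the form "ACC=CO349769.1; NID=g49431086; ...". The function name mapEstsToGenes is hypothetical, and the field names should be checked against your copy of Dr.data.

```r
# Minimal sketch: map EST accessions and/or NIDs to UniGene clusters and
# gene IDs by parsing the lines of a UniGene .data file (e.g. Dr.data).
# ASSUMED layout: records start with "ID", single-valued fields such as
# "GENE" and "GENE_ID" (or older "LOCUSLINK"), and "SEQUENCE ACC=...; NID=...;" lines.
mapEstsToGenes <- function(lines, ests) {
  # Record number for each line: a new record begins at every "ID" line
  rec <- cumsum(grepl("^ID\\s", lines))
  nrec <- max(rec)

  # Value of a single-valued field for each record (NA if absent)
  field <- function(tag) {
    hit <- grepl(paste0("^", tag, "\\s"), lines) & rec > 0
    res <- rep(NA_character_, nrec)
    res[rec[hit]] <- sub(paste0("^", tag, "\\s+"), "", lines[hit])
    res
  }
  clusterId <- field("ID")
  symbol <- field("GENE")
  geneId <- field("GENE_ID")
  # Fall back to the older LOCUSLINK field name (assumption)
  ll <- field("LOCUSLINK")
  geneId[is.na(geneId)] <- ll[is.na(geneId)]

  # Member sequences: accession without version suffix, plus the NID
  seqHit <- grepl("^SEQUENCE\\s.*ACC=", lines) & rec > 0
  seqLines <- lines[seqHit]
  seqRec <- rec[seqHit]
  acc <- sub(".*ACC=([^.;]+).*", "\\1", seqLines)
  nid <- ifelse(grepl("NID=", seqLines),
                sub(".*NID=([^;]+).*", "\\1", seqLines), NA_character_)

  # Keep members whose accession or NID is among the query IDs
  keep <- acc %in% ests | nid %in% ests
  data.frame(query = ifelse(acc[keep] %in% ests, acc[keep], nid[keep]),
             cluster = clusterId[seqRec[keep]],
             gene = symbol[seqRec[keep]],
             gene_id = geneId[seqRec[keep]],
             stringsAsFactors = FALSE)
}
```

Usage would be along the lines of res <- mapEstsToGenes(readLines("Dr.data"), myEsts), where myEsts is a character vector of unversioned accessions and/or "g"-prefixed NIDs. The gene IDs in res could then be passed to getBM(); since filter names differ between Ensembl releases, listFilters(ensembl) is the safe way to find the one for Entrez Gene IDs.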
Best wishes
Wolfgang
----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
Ochsner, Scott A wrote:
> Dear list,
>
> I am trying to retrieve zfin annotation for each of my 6000+ zebrafish EST ids using biomaRt. As an example of the EST ids I've included the output from NCBI's EST search for CO349769.
>
> 1. DR_AOV_FL01_G08 adult ovary full-length (TLL) Danio rerio cDNA, mRNA sequence
> gi|49431086|gb|CO349769.1|[49431086]
>
>
> As a further note, searching UniGene is successful with "CO349769", but not with "49431086". I've set up the following:
>
>> library(biomaRt)
>> ensembl=useMart("ensembl",dataset="drerio_gene_ensembl")
>
> Question1: If ESTs are mapped (based on previous posts they don't look like they are), what is the appropriate filter variable in the line of code below?
>
>> map<-getBM(attributes=c("zfin_id","zfin_symbol"),filters="?",values="CO349769",mart=ensembl)
>
> Question2: If it turns out ESTs are not mapped, I'll probably have to go through UniGene to obtain an ID I can use with biomaRt. Does anyone know of a way to batch search UniGene? 6000+ ids is a lot to search one by one.
>
> Thanks for any help,
>
>> sessionInfo()
> R version 2.8.0 (2008-10-20)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_1.16.0 genefilter_1.22.0 survival_2.34-1 geneplotter_1.20.0 annotate_1.20.0 xtable_1.5-4 AnnotationDbi_1.4.0 lattice_0.17-15
> [9] limma_2.16.2 affy_1.20.0 Biobase_2.2.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.10.0 DBI_0.2-4 grid_2.8.0 KernSmooth_2.22-22 preprocessCore_1.4.0 RColorBrewer_1.0-2 RCurl_0.91-0
> [8] RSQLite_0.7-1 XML_1.94-0.1
>
> Scott A. Ochsner, Ph.D.
> NURSA Bioinformatics
> Molecular and Cellular Biology
> Baylor College of Medicine
> Houston, TX. 77030
> phone: 713-798-6227
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor