[BioC] Convert gene symbols to ensembl id

Martin Morgan mtmorgan at fhcrc.org
Sun May 6 17:30:16 CEST 2012


On 05/06/2012 07:23 AM, Fred Boehm wrote:
> Greetings, Michelle,
>
> I haven't worked with mouse data, but I think that the function getBM()
> in the bioconductor package biomaRt can help.

also

   library(mouse4302.db)
   unlist(mget(fit$genes$ID, mouse4302ENSEMBL, ifnotfound=NA))

or even better in R 2.14.0 or greater

   select(mouse4302.db, fit$genes$ID, "ENSEMBL")

(see ?select, ?keys, ?cols, ?keytype)
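
One thing to watch with the mget() approach: a probe can map to more than one Ensembl gene, so unlist() may return more values than there are probes. Below is a minimal sketch of that behaviour and one possible way to keep a single ID per probe. It uses a hand-made list in place of the real mget() result; the probe names and IDs are made up for illustration and are not real annotation data.

```r
# Toy stand-in for the list returned by
#   mget(fit$genes$ID, mouse4302ENSEMBL, ifnotfound = NA)
# One element per probe, each a character vector of Ensembl IDs.
# Probe names and IDs below are hypothetical.
hits <- list(
  probe_a = "ENSMUSG_1",
  probe_b = c("ENSMUSG_2", "ENSMUSG_3"),  # one probe, two genes
  probe_c = NA_character_                 # no mapping found
)

# unlist() flattens every mapping, so the result no longer lines up
# with the input probes.
flat <- unlist(hits)
length(flat)  # 4, not 3
names(flat)   # "probe_a" "probe_b1" "probe_b2" "probe_c"

# Keep exactly one ID per probe (here simply the first), which
# preserves the order and length of the input.
first_id <- vapply(hits, function(ids) ids[1], character(1))
first_id
```

Taking the first ID is an arbitrary choice; depending on the analysis it may be preferable to drop probes with multiple mappings instead.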

Martin

>
> For instance, one could use the code below (replacing mySymbols with the
> vector of symbols that interest you) to output a data.frame with both
> ensembl gene ID and mgi symbol.
>
> The creators of biomaRt have generated some nice tutorial materials and
> posted them at:
>
> http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html
>
> If I've misinterpreted your question, you may be able to find the answer
> by viewing the biomaRt materials.
>
> I hope that this helps.
>
> Cheers,
> Fred
> --------------
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("biomaRt")
>
> library(biomaRt)
> mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> listFilters(mouse)
> listAttributes(mouse)
> mySymbols <- "2310015A10Rik" # mySymbols is a vector of MGI symbols.
> getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
>       filters = "mgi_symbol", values = mySymbols, mart = mouse)
>
>
>
>
> On 5/6/12 5:41 AM, michelle_low wrote:
>> Hi all,
>>
>> I have a list of gene symbols generated from the differential expression analysis below. How do I convert these symbols to Ensembl IDs? Thanks
>>
>>
>> Regards,
>> Michelle
>>
>>
>>
>>
>> R version 2.14.1 (2011-12-22)
>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>
>>
>> > library(affy)
>> >  library(limma)
>> > pd=read.AnnotatedDataFrame("phenodata.txt",header=TRUE,sep="",row.names=1)
>> >  a=ReadAffy(filenames=rownames(pData(pd)),phenoData=pd,verbose=TRUE)
>> 1 reading Control-1.cel ...instantiating an AffyBatch (intensity a 1004004x6 matrix)...done.
>> Reading in : Control-1.cel
>> Reading in : Control-2.cel
>> Reading in : Dicer-1.cel
>> Reading in : Dicer-2.cel
>> Reading in : Drosha-1.cel
>> Reading in : Drosha-2.cel
>> > x=rma(a)
>> Loading required package: AnnotationDbi
>> Background correcting
>> Normalizing
>> Calculating Expression
>> Warning message:
>> package 'AnnotationDbi' was built under R version 2.14.2
>> > c=paste(pd$treatment,pd$n,sep="")
>> >  f=factor(c)
>> >  design=model.matrix(~0+f)
>> > colnames(design)=levels(f)
>> > fit=lmFit(x,design)
>> > library(mouse4302.db)
>> Loading required package: org.Mm.eg.db
>> Loading required package: DBI
>>
>> Warning messages:
>> 1: package 'RSQLite' was built under R version 2.14.2
>> 2: package 'DBI' was built under R version 2.14.2
>> > library(annotate)
>> Warning message:
>> package 'annotate' was built under R version 2.14.2
>> >  fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"mouse4302.db")
>> > contrast.matrix=makeContrasts(E1="present-absent.Dicer",E2="present-absent.Drosha",E3="absent.Drosha-absent.Dicer",levels=design)
>> >  fit2=contrasts.fit(fit,contrast.matrix)
>> >
>> > fit2=eBayes(fit2)
>> >
>> > results1<-topTable (fit2, coef=1, p.value=0.0001,number=nrow(fit2))
>> > write.table(results1, file="control-Dicer5.txt")
>> > results2<-topTable (fit2, coef=2, p.value=0.0001,number=nrow(fit2))
>> > write.table(results2, file="control-Drosha5.txt")
>> > results3<-topTable (fit2, coef=3, p.value=0.0001,number=nrow(fit2))
>> > results=decideTests(fit2)
>> > summary(results2)
>> > b=venncounts(results2)
>> > print(b)
>> > vennDiagram(results)
>> > a=vennDiagram(results,include=c("up","down"),counts.col=c("red","green"))
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


