[BioC] conversion of geneset species ID
Iain Gallagher
iaingallagher at btopenworld.com
Mon Sep 5 16:57:05 CEST 2011
Dear List
I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:
rm(list=ls())
library(biomaRt)
library(GSEABase)
setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
cowGenes <- read.table('cowGenesENID.csv', header=FALSE, sep='\t')
cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
orth2 <- orth[which(orth[,2] != ''), ]  # drop cow genes with no human ortholog
# keep one row per cow ID (the first human ortholog listed);
# !duplicated() is used because -which() would drop every row when there are no duplicates
orth3 <- orth2[!duplicated(orth2[,1]), ]
head(orth3)
This gets me a data frame of bovine ENSEMBL gene IDs and their human orthologs (also ENSEMBL IDs).
broadSets <- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))
broadSetsENS <- mapIdentifiers(broadSets, ENSEMBLIdentifier())
I now have the Broad c2 gene sets with gene IDs converted to human ENSEMBL IDs. I would like to find the position of each of these human ENSEMBL IDs in my data frame (orth3), substitute in the corresponding bovine ID, and then clean up any NAs.
I am rather at a loss as to how to do this and wondered if someone more familiar with GSEABase would be able to help (or perhaps suggest a different strategy)?
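To make the intended mapping concrete, here is a rough sketch on toy data, in base R only. The IDs, the column names (cow, human), and the helper toBovine are all made up for illustration; with the real objects, the list of human IDs per set would presumably come from geneIds(broadSetsENS) and the lookup table from orth3. It is not a tested GSEABase solution.

```r
# Toy stand-in for orth3: one row per cow gene with its human ortholog
# (all IDs here are invented for illustration)
orthToy <- data.frame(
  cow   = c("ENSBTAG01", "ENSBTAG02", "ENSBTAG03"),
  human = c("ENSG01", "ENSG02", "ENSG03"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the gene sets: a named list of human ID vectors
# (assumption: geneIds() on the real collection gives a list of this shape)
humanSets <- list(
  SET_A = c("ENSG01", "ENSG03", "ENSG99"),
  SET_B = c("ENSG02", "ENSG98")
)

# Hypothetical helper: look up each human ID in the ortholog table,
# substitute the bovine ID, and drop IDs with no bovine ortholog
toBovine <- function(ids, orth) {
  cowIds <- orth$cow[match(ids, orth$human)]  # NA where there is no match
  unique(cowIds[!is.na(cowIds)])
}

bovineSets <- lapply(humanSets, toBovine, orth = orthToy)
```

Note that match() only returns the first hit, so a human gene with several bovine orthologs would be mapped to just one of them in this sketch. The resulting list of bovine ID vectors would still need to be wrapped back into gene set objects.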
Thanks
Iain
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GSEABase_1.14.0 graph_1.30.0 annotate_1.30.0
[4] org.Hs.eg.db_2.5.0 org.Bt.eg.db_2.5.0 RSQLite_0.9-4
[7] DBI_0.2-5 AnnotationDbi_1.14.1 Biobase_2.12.2
[10] biomaRt_2.8.1
loaded via a namespace (and not attached):
[1] RCurl_1.6-9 tools_2.13.1 XML_3.4-2 xtable_1.5-6
>