[BioC] conversion of geneset species ID
Martin Morgan
mtmorgan at fhcrc.org
Mon Sep 5 23:31:37 CEST 2011
Hi Iain --
On 09/05/2011 07:57 AM, Iain Gallagher wrote:
> Dear List
>
> I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:
>
> rm(list=ls())
> library(biomaRt)
> library(GSEABase)
>
> setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
>
> cowGenes<- read.table('cowGenesENID.csv', header=F, sep='\t')
>
> cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
> orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
> orth2<- orth[which(orth[,2]!=''), ]#drop those with no human ortho
>
> orth3<- orth2[!duplicated(orth2[,1]), ]#keep one row per cow ID, i.e. one cow ID to one human ID (also safe when there are no duplicates)
>
> head(orth3)
>
>
> This gets me a data frame of bovine ENSEMBL gene IDs and their human orthologs (again as ENSEMBL IDs).
>
> broadSets<- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))
>
> broadSetsENS<- mapIdentifiers(broadSets, ENSEMBLIdentifier())
>
> I now have the c2 Broad genesets with gene IDs converted to human ENSEMBL IDs. I would like to find the position of each of these ENSEMBL IDs in my data frame (orth3), substitute in the bovine ID, and then clean up any NAs.
>
> I am rather at a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?
Not sure that I follow entirely, but along the lines of
lst = lapply(broadSetsENS, function(gs, map) {
    huids = geneIds(gs)
    ## map columns assumed from the getBM() call above:
    ## "ensembl_gene_id" (cow) and "human_ensembl_gene" (human)
    geneIds(gs) = map[map$human_ensembl_gene %in% huids, "ensembl_gene_id"]
    geneIdType(gs) = ENSEMBLIdentifier()
    gs
}, orth3)
GeneSetCollection(lst)
This is a bit of a guess, could be more specific if you provided a
reproducible example.
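In case it helps, here is a minimal sketch of the same look-up on toy data, using base R only so it runs without the Broad file or a biomaRt query. The IDs, `humanSets`, and `toCow` are made up for illustration; only the column names are taken from your getBM() call. Note that match() returns the first hit only, so a human gene with several cow orthologs in the table would contribute just one of them.

```r
# Toy stand-in for orth3: one row per cow Ensembl ID with its human ortholog.
# All IDs are invented for illustration.
orth3 <- data.frame(
    ensembl_gene_id    = c("ENSBTAG01", "ENSBTAG02", "ENSBTAG03"),
    human_ensembl_gene = c("ENSG01", "ENSG02", "ENSG03"),
    stringsAsFactors   = FALSE
)

# Toy stand-in for the gene sets: a named list of human ID vectors.
humanSets <- list(
    setA = c("ENSG01", "ENSG03", "ENSG99"),  # ENSG99 has no cow ortholog
    setB = c("ENSG02")
)

# Hypothetical helper: find the row of `map` for each human ID,
# substitute the cow ID, then drop the NAs left by unmatched IDs.
toCow <- function(huids, map) {
    idx <- match(huids, map$human_ensembl_gene)  # NA where there is no match
    cowids <- map$ensembl_gene_id[idx]
    unique(cowids[!is.na(cowids)])
}

cowSets <- lapply(humanSets, toCow, map = orth3)
print(cowSets)
```

With real data the same helper could be applied to geneIds(gs) inside the lapply() over the collection, assuming orth3 has the two columns shown.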
Hope that helps,
Martin
> Thanks
>
> Iain
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
> [7] LC_PAPER=en_GB.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GSEABase_1.14.0 graph_1.30.0 annotate_1.30.0
> [4] org.Hs.eg.db_2.5.0 org.Bt.eg.db_2.5.0 RSQLite_0.9-4
> [7] DBI_0.2-5 AnnotationDbi_1.14.1 Biobase_2.12.2
> [10] biomaRt_2.8.1
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.6-9 tools_2.13.1 XML_3.4-2 xtable_1.5-6
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793