[BioC] conversion of geneset species ID

Tue Sep 6 12:18:20 CEST 2011

Dear Martin

Thanks for your suggestion. I cannot get your function to work for me:

>    lst = lapply(broadSetsENS, function(gs, map) {
+       huids = geneIds(gs)
+       ## map, not sure what the columns are?
+       geneIds(gs) = map[map$humids %in% huids, "cowids"]
+       geneIdType(gs), ENSEMBLIdentifier()
Error: unexpected ',' in:
"      geneIds(gs) = map[map$humids %in% huids, "cowids"]
      geneIdType(gs),"
>       gs
Error: object 'gs' not found

Below is a toy example of what I want to achieve (apologies for not including this before):

#create a reproducible example for the GSEA problem
library(biomaRt)
library(GSEABase)

# cow genes!
cowGenes <- c('ENSBTAG00000003825', 'ENSBTAG00000015185', 'ENSBTAG00000001068', 'ENSBTAG00000017500', 'ENSBTAG00000012288', 'ENSBTAG00000031901', 'ENSBTAG00000006103', 'ENSBTAG00000003882', 'ENSBTAG00000026829', 'ENSBTAG00000037404')

#set up mart
cow = useMart("ensembl",dataset="btaurus_gene_ensembl")

# get ortho genes
orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id", values = cowGenes, mart = cow)

# drop those with no human ortho
orth <- orth[which(orth[,2]!=''), ]
colnames(orth) <- c('cowids', 'humids')

#create a couple of genesets from human genes
set1 <- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
set2 <- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
gsc <- GeneSetCollection(set1, set2)

#create a couple of genesets from the same cow genes to illustrate hopeful outcome
cowSet1 <- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
cowSet2 <- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
cowGsc <- GeneSetCollection(cowSet1, cowSet2)

Basically I'd like to go from gsc to cowGsc using the dataframe mapping orthologs.
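
One way this might be done is sketched below; it is a guess at a workable approach rather than a definitive answer. It follows the lapply idea from Martin's suggestion, but leaves out the geneIdType line that triggered the "unexpected ','" error (presumably an assignment was intended there, and the sets are already of ENSEMBL type anyway). To keep the sketch runnable without a biomaRt query, the ortholog table uses made-up placeholder IDs, and humanToCow is a hypothetical helper name, not a GSEABase function.

```r
library(GSEABase)

# Placeholder ortholog table (made-up IDs, for illustration only).
# In the toy example above this role is played by the 'orth' data frame
# with its 'cowids' and 'humids' columns.
orth <- data.frame(
    cowids = c("COW_A", "COW_B", "COW_C", "COW_D"),
    humids = c("HUM_A", "HUM_B", "HUM_C", "HUM_D"),
    stringsAsFactors = FALSE
)

# Two human gene sets; "HUM_X" has no ortholog in the table
set1 <- GeneSet(c("HUM_A", "HUM_B", "HUM_X"),
                geneIdType = ENSEMBLIdentifier(), setName = "set1")
set2 <- GeneSet(c("HUM_B", "HUM_C", "HUM_D"),
                geneIdType = ENSEMBLIdentifier(), setName = "set2")
gsc <- GeneSetCollection(set1, set2)

# Hypothetical helper: swap the human IDs in one gene set for the cow IDs
# found in 'map'. Human IDs without an ortholog are silently dropped, and
# unique() guards against several human genes mapping to the same cow gene.
humanToCow <- function(gs, map) {
    huids <- geneIds(gs)
    geneIds(gs) <- unique(map[map$humids %in% huids, "cowids"])
    gs
}

# Apply the helper to every set and rebuild the collection
cowGsc <- GeneSetCollection(lapply(gsc, humanToCow, map = orth))
lapply(cowGsc, geneIds)
```

The set names are kept as they were; setName(gs) <- ... inside the helper could rename them if that is wanted. The cow IDs come back in the row order of the mapping table, not in the order of the original human IDs.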

I'll continue playing with this but any further guidance would be appreciated.

Best

Iain

--- On Mon, 5/9/11, Martin Morgan <mtmorgan at fhcrc.org> wrote:

> From: Martin Morgan <mtmorgan at fhcrc.org>
> Subject: Re: [BioC] conversion of geneset species ID
> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
> Date: Monday, 5 September, 2011, 22:31
> Hi Iain --
> 
> On 09/05/2011 07:57 AM, Iain Gallagher wrote:
> > Dear List
> >
> > I wonder if someone could help me re-annotate the Broad c2 genesets
> > from human to bovine IDs. Here's what I have so far:
> >
> > rm(list=ls())
> > library(biomaRt)
> > library(GSEABase)
> >
> > setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
> >
> > cowGenes<- read.table('cowGenesENID.csv', header=F, sep='\t')
> >
> > cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
> > orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
> > orth2<- orth[which(orth[,2]!=''), ]#drop those with no human ortho
> >
> > orth3<- orth2[-which(duplicated(orth2[,1]) == TRUE),]#get only unique mappings i.e. one cow ID to one human ID
> >
> > head(orth3)
> >
> >
> > This gets me a data frame of bovine ENSEMBL gene Ids and the human
> > ortholog (again ENSEMBL id).
> >
> > broadSets<- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))
> >
> > broadSetsENS<- mapIdentifiers(broadSets, ENSEMBLIdentifier())
> >
> > I now have the c2 Broad geneset with gene IDs converted to human
> > ENSEMBL ids. I would like to map the position of each of the ENSEMBL
> > Ids in my dataframe (orth3) and then substitute in the bovine id and
> > then clean up any NA's.
> >
> > I am at rather a loss as to how to do this and wondered if someone
> > with more familiarity with GSEABase would be able to help (or
> > perhaps suggest a different strategy!)?
> 
> 
> Not sure that I follow entirely, but along the lines of
> 
>    lst = lapply(broadSetsENS, function(gs, map) {
>       huids = geneIds(gs)
>       ## map, not sure what the columns are?
>       geneIds(gs) = map[map$huids %in% huids, "cowids"]
>       geneIdType(gs), ENSEMBLIdentifier()
>       gs
>    }, ortho3)
>    GeneSetCollection(lst)
> 
> This is a bit of a guess, could be more specific if you
> provided a 
> reproducible example.
> 
> Hope that helps,
> 
> Martin
> 
> > Thanks
> >
> > Iain
> >
> >> sessionInfo()
> > R version 2.13.1 (2011-07-08)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >   [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
> >   [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
> >   [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
> >   [7] LC_PAPER=en_GB.utf8       LC_NAME=C
> >   [9] LC_ADDRESS=C              LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> >   [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0
> >   [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4
> >   [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
> > [10] biomaRt_2.8.1
> >
> > loaded via a namespace (and not attached):
> > [1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
> >>
> >
> >
> 
> 
> -- 
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> 
> Location: M1-B861
> Telephone: 206 667-2793
>