[BioC] conversion of geneset species ID

Martin Morgan mtmorgan at fhcrc.org
Tue Sep 6 15:03:18 CEST 2011


On 09/06/2011 03:18 AM, Iain Gallagher wrote:
> Dear Martin
>
> Thanks for your suggestion. I cannot get your function to work for me:
>
>>     lst = lapply(broadSetsENS, function(gs, map) {
> +       huids = geneIds(gs)
> +       ## map, not sure what the columns are?
> +       geneIds(gs) = map[map$humids %in% huids, "cowids"]
> +       geneIdType(gs), ENSEMBLIdentifier()

That line has a typo (the comma should have been an assignment), but for your example below, taking the same approach:

   lst = lapply(gsc, function(gs, map) {
      geneIds(gs) = with(map, cowids[humids %in% geneIds(gs)])
      gs
   }, orth)
   cowGsc = GeneSetCollection(lst)
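
To see what the lookup inside the function does without GSEABase or a biomaRt connection, here is a minimal base-R sketch. The IDs are made up, the character vectors stand in for geneIds(gs), and toCow is a hypothetical helper name, not part of any package. The unique() call is an addition to the code above, on the assumption that repeated cow IDs are unwanted when one cow gene is paired with several human genes in the same set.

```r
# Toy ortholog table; these IDs are invented for illustration only.
orth <- data.frame(
  cowids = c("cowA", "cowB", "cowC", "cowD", "cowE", "cowE"),
  humids = c("humA", "humB", "humC", "humC", "humE", "humF"),
  stringsAsFactors = FALSE
)

# Plain character vectors standing in for geneIds(gs) of two human gene sets.
humanSets <- list(
  set1 = c("humA", "humB", "humZ"),  # humZ has no row in the table
  set2 = c("humC", "humE", "humF")   # humC has two cow partners; cowE appears twice
)

# Hypothetical helper: keep every cow ID whose human partner is in the set,
# dropping repeats so each cow ID is listed once.
toCow <- function(ids, map) {
  unique(map$cowids[map$humids %in% ids])
}

cowSets <- lapply(humanSets, toCow, map = orth)
print(cowSets)
```

Human IDs with no row in the table simply drop out, so no NA clean-up is needed afterwards. With real data, the value of toCow(geneIds(gs), orth) would be assigned back with geneIds(gs) = ... as in the function above.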

Martin

> Error: unexpected ',' in:
> "      geneIds(gs) = map[map$humids %in% huids, "cowids"]
>        geneIdType(gs),"
>>        gs
> Error: object 'gs' not found
>
> Below is a toy example of what I want to achieve (apologies for not including this before):
>
> #create a reproducible example for the GSEA problem
> library(biomaRt)
> library(GSEABase)
>
> # cow genes!
> cowGenes<- c('ENSBTAG00000003825', 'ENSBTAG00000015185', 'ENSBTAG00000001068', 'ENSBTAG00000017500', 'ENSBTAG00000012288', 'ENSBTAG00000031901', 'ENSBTAG00000006103', 'ENSBTAG00000003882', 'ENSBTAG00000026829', 'ENSBTAG00000037404')
>
> #set up mart
> cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
>
> # get ortho genes
> orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id", values = cowGenes, mart = cow)
>
> # drop those with no human ortho
> orth<- orth[which(orth[,2]!=''), ]
> colnames(orth)<- c('cowids', 'humids')
>
> #create a couple of genesets from human genes
> set1<- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
> set2<- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
> gsc<- GeneSetCollection(set1, set2)
>
> #create a couple of genesets from the same cow genes to illustrate hopeful outcome
> cowSet1<- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
> cowSet2<- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
> cowGsc<- GeneSetCollection(cowSet1, cowSet2)
>
>
> Basically I'd like to go from gsc to cowGsc using the dataframe mapping orthologs.
>
>
> I'll continue playing with this but any further guidance would be appreciated.
>
> Best
>
> Iain
>
> --- On Mon, 5/9/11, Martin Morgan<mtmorgan at fhcrc.org>  wrote:
>
>> From: Martin Morgan<mtmorgan at fhcrc.org>
>> Subject: Re: [BioC] conversion of geneset species ID
>> To: "Iain Gallagher"<iaingallagher at btopenworld.com>
>> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
>> Date: Monday, 5 September, 2011, 22:31
>> Hi Iain --
>>
>> On 09/05/2011 07:57 AM, Iain Gallagher wrote:
>>> Dear List
>>>
>>> I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:
>>>
>>> rm(list=ls())
>>> library(biomaRt)
>>> library(GSEABase)
>>>
>>> setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
>>>
>>> cowGenes<- read.table('cowGenesENID.csv', header=F, sep='\t')
>>>
>>> cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
>>> orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
>>> orth2<- orth[which(orth[,2]!=''), ]#drop those with no human ortho
>>>
>>> orth3<- orth2[!duplicated(orth2[,1]), ]#get only unique mappings i.e. one cow ID to one human ID
>>>
>>> head(orth3)
>>>
>>>
>>> This gets me a data frame of bovine ENSEMBL gene Ids and the human ortholog (again ENSEMBL id).
>>>
>>> broadSets<- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))
>>>
>>> broadSetsENS<- mapIdentifiers(broadSets, ENSEMBLIdentifier())
>>>
>>> I now have the c2 Broad geneset with gene IDs converted to human ENSEMBL ids. I would like to map the position of each of the ENSEMBL Ids in my dataframe (orth3) and then substitute in the bovine id and clean up any NAs.
>>>
>>> I am at rather a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?
>>
>>
>> Not sure that I follow entirely, but along the lines of
>>
>>     lst = lapply(broadSetsENS, function(gs, map) {
>>        huids = geneIds(gs)
>>        ## map, not sure what the columns are?
>>        geneIds(gs) = map[map$huids %in% huids, "cowids"]
>>        geneIdType(gs), ENSEMBLIdentifier()
>>        gs
>>     }, orth3)
>>     GeneSetCollection(lst)
>>
>> This is a bit of a guess, could be more specific if you provided a reproducible example.
>>
>> Hope that helps,
>>
>> Martin
>>
>>> Thanks
>>>
>>> Iain
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>>>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>>>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>  [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0
>>>  [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4
>>>  [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
>>> [10] biomaRt_2.8.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
>>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


