[BioC] conversion of geneset species ID
Martin Morgan
mtmorgan at fhcrc.org
Tue Sep 6 15:03:18 CEST 2011
On 09/06/2011 03:18 AM, Iain Gallagher wrote:
> Dear Martin
>
> Thanks for your suggestion. I cannot get your function to work for me:
>
>> lst = lapply(broadSetsENS, function(gs, map) {
> + huids = geneIds(gs)
> + ## map, not sure what the columns are?
> + geneIds(gs) = map[map$humids %in% huids, "cowids"]
> + geneIdType(gs), ENSEMBLIdentifier()
That was a typo on my part; for your example below, taking the same approach:

    lst = lapply(gsc, function(gs, map) {
        geneIds(gs) = with(map, cowids[humids %in% geneIds(gs)])
        gs
    }, orth)
    cowGsc = GeneSetCollection(lst)

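To see what the subsetting inside the function does, here is a minimal base-R sketch on made-up data. The IDs (cowA, humA, ...) and the humanIds vector are hypothetical stand-ins for the orth data frame and for geneIds(gs); only the with() / %in% idiom is taken from the code above.

```r
## Toy ortholog table; all IDs are invented for illustration.
## cowA is listed as the ortholog of two different human genes.
orth <- data.frame(
    cowids = c("cowA", "cowB", "cowC", "cowD", "cowA"),
    humids = c("humA", "humB", "humC", "humC", "humE"),
    stringsAsFactors = FALSE
)

## Stand-in for geneIds(gs); "humX" has no row in the table.
humanIds <- c("humC", "humA", "humE", "humX")

## Keep the cow ID of every row whose human ID is in the set.
cowIds <- with(orth, cowids[humids %in% humanIds])

## Collapse cow IDs that were reached through more than one human gene.
cowIdsUnique <- unique(cowIds)

print(cowIds)
print(cowIdsUnique)
```

The result follows the row order of the mapping table rather than the order of the gene set, human IDs without an ortholog are dropped silently, and a cow ID can appear more than once, so wrapping the result in unique() may be worthwhile before assigning it to geneIds(gs).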
Martin
> Error: unexpected ',' in:
> " geneIds(gs) = map[map$humids %in% huids, "cowids"]
> geneIdType(gs),"
>> gs
> Error: object 'gs' not found
>
> Below is a toy example of what I want to achieve (apologies for not including this before):
>
> #create a reproducible example for the GSEA problem
> library(biomaRt)
> library(GSEABase)
>
> # cow genes!
> cowGenes<- c('ENSBTAG00000003825', 'ENSBTAG00000015185', 'ENSBTAG00000001068', 'ENSBTAG00000017500', 'ENSBTAG00000012288', 'ENSBTAG00000031901', 'ENSBTAG00000006103', 'ENSBTAG00000003882', 'ENSBTAG00000026829', 'ENSBTAG00000037404')
>
> #set up mart
> cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
>
> # get ortho genes
> orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id", values = cowGenes, mart = cow)
>
> # drop those with no human ortho
> orth<- orth[which(orth[,2]!=''), ]
> colnames(orth)<- c('cowids', 'humids')
>
> #create a couple of genesets from human genes
> set1<- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
> set2<- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
> gsc<- GeneSetCollection(set1, set2)
>
> #create a couple of genesets from the same cow genes to illustrate hopeful outcome
> cowSet1<- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
> cowSet2<- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
> cowGsc<- GeneSetCollection(cowSet1, cowSet2)
>
>
> Basically I'd like to go from gsc to cowGsc using the dataframe mapping orthologs.
>
>
> I'll continue playing with this but any further guidance would be appreciated.
>
> Best
>
> Iain
>
> --- On Mon, 5/9/11, Martin Morgan<mtmorgan at fhcrc.org> wrote:
>
>> From: Martin Morgan<mtmorgan at fhcrc.org>
>> Subject: Re: [BioC] conversion of geneset species ID
>> To: "Iain Gallagher"<iaingallagher at btopenworld.com>
>> Cc: "bioconductor"<bioconductor at stat.math.ethz.ch>
>> Date: Monday, 5 September, 2011, 22:31
>> Hi Iain --
>>
>> On 09/05/2011 07:57 AM, Iain Gallagher wrote:
>>> Dear List
>>>
>>> I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:
>>>
>>> rm(list=ls())
>>> library(biomaRt)
>>> library(GSEABase)
>>>
>>> setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
>>>
>>> cowGenes<- read.table('cowGenesENID.csv', header=F, sep='\t')
>>>
>>> cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
>>> orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
>>> orth2<- orth[which(orth[,2]!=''), ]#drop those with no human ortho
>>>
>>> orth3<- orth2[-which(duplicated(orth2[,1]) == TRUE),]#get only unique mappings i.e. one cow ID to one human ID
>>>
>>> head(orth3)
>>>
>>>
>>> This gets me a data frame of bovine ENSEMBL gene Ids and the human ortholog (again ENSEMBL id).
>>>
>>> broadSets<- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))
>>>
>>> broadSetsENS<- mapIdentifiers(broadSets, ENSEMBLIdentifier())
>>>
>>> I now have the c2 Broad geneset with gene IDs converted to human ENSEMBL ids. I would like to map the position of each of the ENSEMBL Ids in my dataframe (orth3), then substitute in the bovine id, and then clean up any NAs.
>>>
>>> I am at rather a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?
>>
>>
>> Not sure that I follow entirely, but along the lines of
>>
>>     lst = lapply(broadSetsENS, function(gs, map) {
>>         huids = geneIds(gs)
>>         ## map, not sure what the columns are?
>>         geneIds(gs) = map[map$huids %in% huids, "cowids"]
>>         geneIdType(gs), ENSEMBLIdentifier()
>>         gs
>>     }, ortho3)
>>     GeneSetCollection(lst)
>>
>> This is a bit of a guess, could be more specific if you provided a reproducible example.
>>
>> Hope that helps,
>>
>> Martin
>>
>>> Thanks
>>>
>>> Iain
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>>>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>>>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>  [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0
>>>  [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4
>>>  [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
>>> [10] biomaRt_2.8.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
>>>>
>>>
>>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>