[BioC] conversion of geneset species ID
Iain Gallagher
iaingallagher at btopenworld.com
Tue Sep 6 12:18:20 CEST 2011
Dear Martin
Thanks for your suggestion. I cannot get your function to work for me:
> lst = lapply(broadSetsENS, function(gs, map) {
+ huids = geneIds(gs)
+ ## map, not sure what the columns are?
+ geneIds(gs) = map[map$humids %in% humids, "cowids"]
+ geneIdType(gs), ENSEMBLIdentifier()
Error: unexpected ',' in:
" geneIds(gs) = map[map$humids %in% huids, "cowids"]
geneIdType(gs),"
> gs
Error: object 'gs' not found
Below is a toy example of what I want to achieve (apologies for not including this before):
#create a reproducible example for the GSEA problem
library(biomaRt)
library(GSEABase)
# cow genes!
cowGenes <- c('ENSBTAG00000003825', 'ENSBTAG00000015185', 'ENSBTAG00000001068', 'ENSBTAG00000017500', 'ENSBTAG00000012288', 'ENSBTAG00000031901', 'ENSBTAG00000006103', 'ENSBTAG00000003882', 'ENSBTAG00000026829', 'ENSBTAG00000037404')
#set up mart
cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
# get ortho genes
orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id", values = cowGenes, mart = cow)
# drop those with no human ortho
orth <- orth[which(orth[,2]!=''), ]
colnames(orth) <- c('cowids', 'humids')
#create a couple of genesets from human genes
set1 <- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
set2 <- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
gsc <- GeneSetCollection(set1, set2)
#create a couple of genesets from the same cow genes to illustrate hopeful outcome
cowSet1 <- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
cowSet2 <- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
cowGsc <- GeneSetCollection(cowSet1, cowSet2)
Basically I'd like to go from gsc to cowGsc using the dataframe mapping orthologs.
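A minimal, untested sketch of one way this conversion might be done, following the shape of Martin's suggestion. The helper name humanToCow and the toy IDs below are made up for illustration. It assumes the mapping data frame has the columns 'cowids' and 'humids' used in the toy example above, and that lapply() over a GeneSetCollection hands each GeneSet to the function in turn (as Martin's suggestion does).

```r
# Hypothetical helper (not part of GSEABase): translate human Ensembl IDs to
# cow Ensembl IDs via an ortholog table with columns 'humids' and 'cowids'.
# Human IDs without a cow ortholog are dropped; duplicate cow IDs are removed.
humanToCow <- function(huids, map) {
  cow <- map$cowids[map$humids %in% huids]
  unique(cow[!is.na(cow) & cow != ""])
}

# Made-up IDs standing in for the real 'orth' data frame
toyMap <- data.frame(
  cowids = c("cowA", "cowB", "cowC", "cowD"),
  humids = c("humA", "humB", "humB", "humD"),
  stringsAsFactors = FALSE
)

# humB has two cow orthologs, humX has none
humanToCow(c("humA", "humB", "humX"), toyMap)

# The GSEABase part only runs if the package is installed
if (requireNamespace("GSEABase", quietly = TRUE)) {
  library(GSEABase)
  toySet1 <- GeneSet(c("humA", "humB"),
                     geneIdType = ENSEMBLIdentifier(), setName = "set1")
  toySet2 <- GeneSet(c("humB", "humD", "humX"),
                     geneIdType = ENSEMBLIdentifier(), setName = "set2")
  toyGsc <- GeneSetCollection(toySet1, toySet2)

  # Replace the gene IDs of each set, leaving set names and ID type untouched
  cowList <- lapply(toyGsc, function(gs, map) {
    geneIds(gs) <- humanToCow(geneIds(gs), map)
    gs
  }, toyMap)
  toyCowGsc <- GeneSetCollection(cowList)
}
```

With the objects from the toy example, the same lapply() call would take gsc and orth in place of toyGsc and toyMap. Sets left with no cow IDs after the mapping would probably need to be dropped before further analysis.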
I'll continue playing with this but any further guidance would be appreciated.
Best
Iain
--- On Mon, 5/9/11, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> From: Martin Morgan <mtmorgan at fhcrc.org>
> Subject: Re: [BioC] conversion of geneset species ID
> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
> Date: Monday, 5 September, 2011, 22:31
> Hi Iain --
>
> On 09/05/2011 07:57 AM, Iain Gallagher wrote:
> > Dear List
> >
> > I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:
> >
> > rm(list=ls())
> > library(biomaRt)
> > library(GSEABase)
> >
> > setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
> >
> > cowGenes<- read.table('cowGenesENID.csv', header=F, sep='\t')
> >
> > cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
> > orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
> > orth2<- orth[which(orth[,2]!=''), ]#drop those with no human ortho
> >
> > orth3<- orth2[-which(duplicated(orth2[,1]) == TRUE),]#get only unique mappings i.e. one cow ID to one human ID
> >
> > head(orth3)
> >
> >
> > This gets me a data frame of bovine ENSEMBL gene IDs and the human ortholog (again ENSEMBL ID).
> >
> > broadSets<- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))
> >
> > broadSetsENS<- mapIdentifiers(broadSets, ENSEMBLIdentifier())
> >
> > I now have the c2 Broad geneset with gene IDs converted to human ENSEMBL IDs. I would like to map the position of each of the ENSEMBL IDs in my dataframe (orth3) and then substitute in the bovine ID and then clean up any NAs.
> >
> > I am at rather a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?
>
>
> Not sure that I follow entirely, but along the lines of
>
> lst = lapply(broadSetsENS, function(gs, map) {
>     huids = geneIds(gs)
>     ## map, not sure what the columns are?
>     geneIds(gs) = map[map$huids %in% huids, "cowids"]
>     geneIdType(gs), ENSEMBLIdentifier()
>     gs
> }, ortho3)
> GeneSetCollection(lst)
>
> This is a bit of a guess, could be more specific if you provided a reproducible example.
>
> Hope that helps,
>
> Martin
>
> > Thanks
> >
> > Iain
> >
> >> sessionInfo()
> > R version 2.13.1 (2011-07-08)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> > [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
> > [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
> > [7] LC_PAPER=en_GB.utf8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] GSEABase_1.14.0 graph_1.30.0 annotate_1.30.0
> > [4] org.Hs.eg.db_2.5.0 org.Bt.eg.db_2.5.0 RSQLite_0.9-4
> > [7] DBI_0.2-5 AnnotationDbi_1.14.1 Biobase_2.12.2
> > [10] biomaRt_2.8.1
> >
> > loaded via a namespace (and not attached):
> > [1] RCurl_1.6-9 tools_2.13.1 XML_3.4-2 xtable_1.5-6
> >>
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>