[BioC] conversion of geneset species ID

Wed Sep 7 10:54:31 CEST 2011

Thank you Martin.

This is just what I wanted.
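
For anyone finding this in the archive, the lookup at the heart of the suggestion can be tried without biomaRt or GSEABase. The following is only a minimal sketch under stated assumptions: the IDs are invented placeholders, plain character vectors stand in for the gene IDs of each GeneSet, and toCow() is a hypothetical helper name. The unique() call is my own addition, so that a cow gene reached through several human orthologs is listed only once.

```r
# Toy ortholog table; the IDs are invented placeholders, not real Ensembl genes.
orth <- data.frame(
  cowids = c("COW_A", "COW_B", "COW_C", "COW_D"),
  humids = c("HUM_1", "HUM_2", "HUM_3", "HUM_4"),
  stringsAsFactors = FALSE
)

# Plain character vectors stand in for the gene IDs of each GeneSet.
humanSets <- list(
  set1 = c("HUM_1", "HUM_2", "HUM_9"),  # HUM_9 has no cow ortholog in the table
  set2 = c("HUM_3", "HUM_4")
)

# Hypothetical helper: keep the cow IDs whose human ortholog is in the set.
# Human IDs that are absent from the table simply drop out.
toCow <- function(ids, map) {
  unique(map$cowids[map$humids %in% ids])
}

cowSets <- lapply(humanSets, toCow, map = orth)
print(cowSets)
```

With real data the same subsetting goes inside the lapply() over the GeneSetCollection, as in the quoted reply below.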

Best

Iain

--- On Tue, 6/9/11, Martin Morgan <mtmorgan at fhcrc.org> wrote:

> From: Martin Morgan <mtmorgan at fhcrc.org>
> Subject: Re: [BioC] conversion of geneset species ID
> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
> Date: Tuesday, 6 September, 2011, 14:03
> On 09/06/2011 03:18 AM, Iain Gallagher wrote:
> > Dear Martin
> >
> > Thanks for your suggestion. I cannot get your function to work for me:
> >
> >>     lst = lapply(broadSetsENS, function(gs, map) {
> > +       huids = geneIds(gs)
> > +       ## map, not sure what the columns are?
> > +       geneIds(gs) = map[map$humids %in% humids, "cowids"]
> > +       geneIdType(gs), ENSEMBLIdentifier()
> 
> typo, but for your example below taking the same approach
> 
>    lst = lapply(gsc, function(gs, map) {
>       geneIds(gs) = with(map, cowids[humids %in% geneIds(gs)])
>       gs
>    }, orth)
>    cowGsc = GeneSetCollection(lst)
> 
> Martin
> 
> > Error: unexpected ',' in:
> > "      geneIds(gs) = map[map$humids %in% huids, "cowids"]
> >        geneIdType(gs),"
> >>        gs
> > Error: object 'gs' not found
> >
> > Below is a toy example of what I want to achieve (apologies for not including this before):
> >
> > # create a reproducible example for the GSEA problem
> > library(biomaRt)
> > library(GSEABase)
> >
> > # cow genes!
> > cowGenes <- c('ENSBTAG00000003825', 'ENSBTAG00000015185',
> >               'ENSBTAG00000001068', 'ENSBTAG00000017500',
> >               'ENSBTAG00000012288', 'ENSBTAG00000031901',
> >               'ENSBTAG00000006103', 'ENSBTAG00000003882',
> >               'ENSBTAG00000026829', 'ENSBTAG00000037404')
> >
> > #set up mart
> > cow = useMart("ensembl", dataset="btaurus_gene_ensembl")
> >
> > # get ortho genes
> > orth = getBM(c("ensembl_gene_id", "human_ensembl_gene"),
> >              filters="ensembl_gene_id", values = cowGenes, mart = cow)
> >
> > # drop those with no human ortho
> > orth<- orth[which(orth[,2]!=''), ]
> > colnames(orth)<- c('cowids', 'humids')
> >
> > #create a couple of genesets from human genes
> > set1 <- GeneSet(orth$humids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'set1')
> > set2 <- GeneSet(orth$humids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'set2')
> > gsc<- GeneSetCollection(set1, set2)
> >
> > #create a couple of genesets from the same cow genes to illustrate hopeful outcome
> > cowSet1 <- GeneSet(orth$cowids[1:5], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset1')
> > cowSet2 <- GeneSet(orth$cowids[3:9], geneIdType = ENSEMBLIdentifier(), setName = 'Cowset2')
> > cowGsc<- GeneSetCollection(cowSet1, cowSet2)
> >
> >
> > Basically I'd like to go from gsc to cowGsc using the data frame that maps the orthologs (orth).
> >
> >
> > I'll continue playing with this but any further guidance would be appreciated.
> >
> > Best
> >
> > Iain
> >
> > --- On Mon, 5/9/11, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> >
> >> From: Martin Morgan <mtmorgan at fhcrc.org>
> >> Subject: Re: [BioC] conversion of geneset species ID
> >> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
> >> Cc: "bioconductor" <bioconductor at stat.math.ethz.ch>
> >> Date: Monday, 5 September, 2011, 22:31
> >> Hi Iain --
> >>
> >> On 09/05/2011 07:57 AM, Iain Gallagher wrote:
> >>> Dear List
> >>>
> >>> I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:
> >>>
> >>> rm(list=ls())
> >>> library(biomaRt)
> >>> library(GSEABase)
> >>>
> >>>
> >>> setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')
> >>>
> >>> cowGenes <- read.table('cowGenesENID.csv', header=F, sep='\t')
> >>>
> >>> cow = useMart("ensembl", dataset="btaurus_gene_ensembl")
> >>> orth = getBM(c("ensembl_gene_id", "human_ensembl_gene"),
> >>>              filters="ensembl_gene_id", values = cowGenes[,1], mart = cow)
> >>> orth2 <- orth[which(orth[,2]!=''), ] # drop those with no human ortho
> >>>
> >>> # get only unique mappings, i.e. one cow ID to one human ID;
> >>> # !duplicated() is used because -which() drops every row when there are no duplicates
> >>> orth3 <- orth2[!duplicated(orth2[,1]), ]
> >>>
> >>> head(orth3)
> >>>
> >>>
> >>> This gets me a data frame of bovine ENSEMBL gene IDs and the human ortholog (again ENSEMBL ID).
> >>>
> >>> broadSets <- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt',
> >>>                     geneIdType = EntrezIdentifier('org.Hs.eg.db'))
> >>>
> >>> broadSetsENS <- mapIdentifiers(broadSets, ENSEMBLIdentifier())
> >>>
> >>> I now have the c2 Broad geneset with gene IDs converted to human ENSEMBL IDs. I would like to find the position of each of these ENSEMBL IDs in my data frame (orth3), substitute in the bovine ID, and then clean up any NAs.
> >>>
> >>> I am at rather a loss as to how to do this and wondered if someone with more familiarity with GSEABase would be able to help (or perhaps suggest a different strategy!)?
> >>
> >>
> >> Not sure that I follow entirely, but along the lines of
> >>
> >>     lst = lapply(broadSetsENS, function(gs, map) {
> >>        huids = geneIds(gs)
> >>        ## map, not sure what the columns are?
> >>        geneIds(gs) = map[map$huids %in% huids, "cowids"]
> >>        geneIdType(gs), ENSEMBLIdentifier()
> >>        gs
> >>     }, ortho3)
> >>     GeneSetCollection(lst)
> >>
> >> This is a bit of a guess, could be more specific if you provided a reproducible example.
> >>
> >> Hope that helps,
> >>
> >> Martin
> >>
> >>> Thanks
> >>>
> >>> Iain
> >>>
> >>>> sessionInfo()
> >>> R version 2.13.1 (2011-07-08)
> >>> Platform: x86_64-pc-linux-gnu (64-bit)
> >>>
> >>> locale:
> >>>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
> >>>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
> >>>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
> >>>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
> >>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> >>> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
> >>>
> >>> attached base packages:
> >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>> other attached packages:
> >>>  [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0
> >>>  [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4
> >>>  [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
> >>> [10] biomaRt_2.8.1
> >>>
> >>> loaded via a namespace (and not attached):
> >>> [1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
> >>>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor at r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >> --
> >> Computational Biology
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> >>
> >> Location: M1-B861
> >> Telephone: 206 667-2793
> >>