[BioC] retrieve genes names after KEGG hypergeometric test

Iain Gallagher iaingallagher at btopenworld.com
Sun Nov 7 22:16:31 CET 2010


Hi Clementine

Below is the code I use for this (no functions but it works).

hgOver is the result of hyperGTest().

hypoResults is the result of a limma test for differential expression.

Basically this gets the significant genes by category, pulls in the logFC (from the limma result) and generates a table so I can see the change for each gene in each category. Not the most elegant code - but then that's not my main profession ;-)

You could stop at keggMapped3, which would give you the gene symbols listed under each pathway name (if I remember correctly).

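#geneIdsByCategory#####
library(Category)     # provides geneIdsByCategory()
library(org.Hs.eg.db) # provides org.Hs.egSYMBOL

# Entrez IDs of the significant genes in each significant KEGG category
sigGenesByCat <- geneIdsByCategory(hgOver, summary(hgOver)[,1])
sigMap <- stack(sigGenesByCat)  # columns: values (Entrez ID), ind (KEGG ID)

# look up a gene symbol for every Entrez ID
symsMapped <- mget(as.character(sigMap[,1]), org.Hs.egSYMBOL, ifnotfound=NA)
symsMapped <- stack(symsMapped) # columns: values (symbol), ind (Entrez ID)
keggMapped <- symsMapped[,1][match(sigMap[,1], symsMapped[,2])]

keggMapped <- cbind(keggMapped, sigMap)
keggMapped2 <- unstack(keggMapped, keggMapped~ind)

#now replace KEGG IDs with Term
termsInd <- match(names(keggMapped2), summary(hgOver)[,1])
keggMapped3 <- keggMapped2
names(keggMapped3) <- summary(hgOver)[,7][termsInd]

##add FC info to KEGG categories
# here column 8 of hypoResults is matched against the gene symbols and
# column 2 is taken as the logFC; adjust the indices to your own limma table
keggMapped4 <- stack(keggMapped3)
fcInd <- match(keggMapped4[,1], hypoResults[,8])
keggMapped4$logFC <- hypoResults[,2][fcInd]

write.table(keggMapped4, 'KEGGAnnotated.txt', sep='\t', quote=FALSE)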
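
If the stack / match / unstack steps are hard to follow, the same idea can be tried on a few made-up rows in plain R, without any Bioconductor objects. This is only a sketch: the pathway IDs, gene IDs, symbols and logFC values below are toy inputs chosen for illustration, and the names genes_by_path, symbols, fc, long and by_path are hypothetical stand-ins for the real objects in the code above.

```r
# Toy inputs (illustrative values only, not output from hyperGTest or limma)
genes_by_path <- list("04110" = c("595", "1017", "1026"),
                      "04115" = c("1017", "7157"))
symbols <- c("595" = "CCND1", "1017" = "CDK2",
             "1026" = "CDKN1A", "7157" = "TP53")
fc <- data.frame(symbol = c("CCND1", "CDK2", "CDKN1A", "TP53"),
                 logFC  = c(1.2, -0.8, 0.5, 2.1),
                 stringsAsFactors = FALSE)

# list -> long table: one row per (gene, pathway) pair
long <- stack(genes_by_path)          # columns: values, ind
long$symbol <- unname(symbols[as.character(long$values)])

# pull in the fold change for each row by matching on the symbol
long$logFC <- fc$logFC[match(long$symbol, fc$symbol)]

# long table -> list of symbols per pathway
by_path <- unstack(long, symbol ~ ind)

print(long)
print(by_path)
```

A gene that sits in two pathways (the second ID in each toy list) simply appears on two rows of the long table, each carrying the same logFC.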


cheers

iain

--- On Fri, 29/10/10, Clémentine Dressaire <clementinedressaire at itqb.unl.pt> wrote:

> From: Clémentine Dressaire <clementinedressaire at itqb.unl.pt>
> Subject: Re: [BioC] retrieve genes names after KEGG hypergeometric test
> To: "Mike Walter" <michael_walter at email.de>
> Cc: bioconductor at stat.math.ethz.ch
> Date: Friday, 29 October, 2010, 14:21
> 
> Hi Mike,
> 
> Could you explain to me the difference between the db and "db"
> you are using? If db is the character vector with the annotation
> database for your array without the .db extension, then what does
> "db" represent?
> 
> Again, thanks for your help,
> 
> Clémentine
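
On the db versus "db" question above: going by Mike's function as quoted below, db without quotes is the function argument (a variable holding the chip name), while "db" in quotes is just a literal string that paste() appends to build the package name. A minimal sketch, assuming the chip name "hgu133plus2" (the annotation package for the U133 Plus 2.0 array is hgu133plus2.db); pkg and probe_map are names invented here for illustration:

```r
db <- "hgu133plus2"                  # variable: chip name without ".db"

# literal "db" is only the suffix glued on with a dot
pkg <- paste(db, "db", sep = ".")

# the same variable is reused to build the name of the mapping object
probe_map <- paste(db, "PATH2PROBE", sep = "")

print(pkg)        # "hgu133plus2.db"
print(probe_map)  # "hgu133plus2PATH2PROBE"
```

So require() is handed the package name as a string, and get() then fetches the mapping object whose name starts with the same chip name.
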
> 
> On Fri, 29 Oct 2010 14:23:00 +0200 (CEST), "Mike Walter"
> <michael_walter at email.de> wrote:
> 
> > Hi Clémentine,
> > 
> > I don't know if such a function exists. I use two little helper
> > functions to retrieve the probe IDs or gene symbols of genes in a
> > gene list that are associated with a KEGG ID:
> 
> > KEGG2genes = function(KEGGID, genelist, db){
> >   # load the annotation package, e.g. "hgu133plus2.db"
> >   require(paste(db, "db", sep="."), character.only = TRUE)
> >   l = vector("list", length(KEGGID))
> >   for (i in seq_along(KEGGID)){
> >     # probe IDs annotated to this KEGG pathway
> >     kegg = as.matrix(unlist(mget(KEGGID[i],
> >       get(paste(db, "PATH2PROBE", sep="")), ifnotfound=NA)))
> >     # keep only the probes that are also in the gene list
> >     l[[i]] = genelist[is.element(genelist, kegg[,1])]
> >   }
> >   names(l) = KEGGID
> >   l
> > }
> 
> > 
> 
> > KEGG2symbol = function(KEGGID, genelist, db){
> >   l = vector("list", length(KEGGID))
> >   for (i in seq_along(KEGGID)){
> >     # probe IDs from the gene list that fall in this pathway
> >     id = unlist(KEGG2genes(KEGGID=KEGGID[i], genelist=genelist, db=db))
> >     # symbols as a one-column matrix, with probe IDs as row names
> >     l[[i]] = as.matrix(mget(id, get(paste(db, "SYMBOL", sep="")),
> >       ifnotfound=NA))
> >   }
> >   names(l) = KEGGID
> >   l
> > }
> 
> > 
> 
> > where "KEGGID" is a character vector of the KEGG ID(s) you are
> > interested in, "genelist" is a character vector containing the
> > probe/probeset IDs of the gene list you used to create the
> > KEGGHyperGResult, and "db" is a character vector with the name of
> > the annotation database for your array without the .db extension
> > (e.g. db="hgu133plus2" for the Affymetrix U133 Plus 2.0 array). As
> > a result you get, for each KEGG ID, a matrix containing the probe
> > IDs and gene symbols, stored in a list. It might not be the most
> > elegant way, but it works.
> > 
> > Kind regards,
> > 
> > Mike
> 
> > 
> 
> > -----Original Message-----
> > From: "Clémentine Dressaire" <clementinedressaire at itqb.unl.pt>
> > Sent: 29.10.2010 13:27:44
> > To: bioconductor at stat.math.ethz.ch
> > Subject: [BioC] retrieve genes names after KEGG hypergeometric test
> 
> > 
> 
> >>
> 
> >>Dear BioC users,
> >>
> >>I performed different hypergeometric tests on my data regarding GO
> >>terms and KEGG pathways. With the GO result I can use the
> >>probeSetSummary function to retrieve the gene list associated with
> >>each significant category.
> >>
> >>However, this function does not work if I apply the HG test using
> >>KEGGHyperGParams because the results are not of the GOHyperGResult
> >>class... Is there any equivalent KEGG function to get those gene
> >>lists?
> >>
> >>With thanks in advance for your help.
> >>
> >>Clémentine
> >>
> >>-- 
> >>Clémentine Dressaire
> >>Post-doctoral research fellow
> >>Control of gene expression lab
> >>ITQB - Instituto de Tecnologia Química e Biológica
> >>Apartado 127, Av. da República
> >>2780-157 Oeiras
> >>Portugal
> >>+351 214469562
> >>
> >>_______________________________________________
> >>Bioconductor mailing list
> >>Bioconductor at stat.math.ethz.ch
> >>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>Search the archives:
> >>http://news.gmane.org/gmane.science.biology.informatics.conductor
> 