[BioC] retrieve genes names after KEGG hypergeometric test
Iain Gallagher
iaingallagher at btopenworld.com
Sun Nov 7 22:16:31 CET 2010
Hi Clémentine,
Below is the code I use for this (no functions, but it works).
hgOver is the result of hyperGTest.
hypoResults is the result of a limma test for differential expression.
Basically, this gets the significant genes by category, pulls in the logFC (from the limma result) and generates a table so I can see the change for each gene in each category. Not the most elegant code, but then that's not my main profession ;-)
You could stop at keggMapped3, and that would give you the genes by pathway (if I remember correctly).
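#geneIdsByCategory#####
library(GOstats)        # also attaches Category, which provides geneIdsByCategory()
library(org.Hs.eg.db)   # org.Hs.egSYMBOL
# significant genes (Entrez IDs) for each significant KEGG category
sigGenesByCat <- geneIdsByCategory(hgOver, summary(hgOver)[,1])
sigMap <- stack(sigGenesByCat)   # columns: values (Entrez ID), ind (KEGG ID)
# look up the gene symbol for each Entrez ID
symsMapped <- mget(as.character(sigMap[,1]), org.Hs.egSYMBOL, ifnotfound=NA)
symsMapped <- stack(symsMapped)  # columns: values (symbol), ind (Entrez ID)
# for each row of sigMap, find the symbol of its Entrez ID
keggMapped <- symsMapped[,1][match(sigMap[,1], symsMapped[,2])]
keggMapped <- cbind(keggMapped, sigMap)
keggMapped2 <- unstack(keggMapped, keggMapped~ind)
#now replace KEGG IDs with Term
termsInd <- match(names(keggMapped2), summary(hgOver)[,1])
keggMapped3 <- keggMapped2
names(keggMapped3) <- summary(hgOver)[,7][termsInd]
##add FC info to KEGG categories
# hypoResults[,8] must hold the gene symbols and hypoResults[,2] the logFC;
# adjust the column numbers to your own table
keggMapped4 <- stack(keggMapped3)
fcInd<-match(keggMapped4[,1], hypoResults[,8])
keggMapped4$logFC <- hypoResults[,2][fcInd]
write.table(keggMapped4, 'KEGGAnnotated.txt', sep='\t', quote=FALSE)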
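For anyone who wants to try the stack/match pattern without running a hypergeometric test first, here is a minimal base-R sketch of the same idea on toy data. Everything in it is a stand-in and an assumption, not real output: sigGenesByCat imitates what geneIdsByCategory returns, symbols imitates the org.Hs.egSYMBOL lookup, terms imitates the Term column of summary(hgOver), and limma is a made-up table with invented logFC values.

```r
# Toy stand-ins (hypothetical data, not real test results)
sigGenesByCat <- list("04110" = c("595", "1017"),
                      "04115" = c("1017", "7157"))
symbols <- c("595" = "CCND1", "1017" = "CDK2", "7157" = "TP53")
terms   <- c("04110" = "Cell cycle", "04115" = "p53 signaling pathway")
limma   <- data.frame(ID = c("CCND1", "CDK2", "TP53"),
                      logFC = c(1.2, -0.8, 2.1),   # invented values
                      stringsAsFactors = FALSE)

# One row per gene/category pair: values = gene ID, ind = category ID
sigMap <- stack(sigGenesByCat)

# Look up symbol and pathway term by name, then logFC by matching symbols
sigMap$symbol <- unname(symbols[as.character(sigMap$values)])
sigMap$term   <- unname(terms[as.character(sigMap$ind)])
sigMap$logFC  <- limma$logFC[match(sigMap$symbol, limma$ID)]

# Genes grouped by pathway term (roughly what keggMapped3 holds)
byTerm <- split(sigMap$symbol, sigMap$term)
print(sigMap)
```

Here sigMap plays the role of keggMapped4 (one row per gene per pathway, with its logFC) and byTerm the role of keggMapped3. A gene that sits in two pathways, like the second one in the toy list, simply appears on two rows.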
cheers
iain
--- On Fri, 29/10/10, Clémentine Dressaire <clementinedressaire at itqb.unl.pt> wrote:
> From: Clémentine Dressaire <clementinedressaire at itqb.unl.pt>
> Subject: Re: [BioC] retrieve genes names after KEGG hypergeometric test
> To: "Mike Walter" <michael_walter at email.de>
> Cc: bioconductor at stat.math.ethz.ch
> Date: Friday, 29 October, 2010, 14:21
>
> Hi Mike,
>
> Could you explain the difference between the db and "db" you are using?
> If db is the character vector with the annotation database for your array
> without the .db extension, then what does "db" represent?
>
> Again, thanks for your help,
>
> Clémentine
>
> On Fri, 29 Oct 2010 14:23:00 +0200 (CEST), "Mike Walter" <michael_walter at email.de> wrote:
>
> > Hi Clémentine,
> >
> > I don't know if such a function exists. I use two little helper functions
> > to retrieve probe IDs or gene symbols of genes in a gene list that are
> > associated with a KEGG ID:
> >
> > KEGG2genes = function(KEGGID, genelist, db){
> >   require(paste(db, "db", sep="."), character.only = TRUE)
> >   l = vector("list")
> >   for (i in 1:length(KEGGID)){
> >     kegg = as.matrix(unlist(mget(KEGGID[i], get(paste(db, "PATH2PROBE", sep="")), ifnotfound=NA)))
> >     l[[i]] = genelist[is.element(genelist, kegg[,1])]
> >   }
> >   names(l) = KEGGID
> >   l
> > }
> >
> > KEGG2symbol = function(KEGGID, genelist, db){
> >   l = vector("list")
> >   for (i in 1:length(KEGGID)){
> >     id = unlist(KEGG2genes(KEGGID=KEGGID[i], genelist=genelist, db=db))
> >     l[[i]] = as.matrix(mget(id, get(paste(db, "SYMBOL", sep="")), ifnotfound=NA))
> >   }
> >   names(l) = KEGGID
> >   l
> > }
> >
> > where "KEGGID" is a character vector of the KEGG ID(s) you are interested
> > in, "genelist" is a character vector containing the probe IDs/probeset IDs
> > of the gene list you used to create the KEGGHyperGResult, and "db" is a
> > character vector with the annotation database for your array without the
> > .db extension (e.g. db="hgu133plus2" for the Affy U133+ 2.0 array). As a
> > result you get a matrix containing the probe IDs and gene symbols for each
> > KEGG ID, stored in a list. It might not be the most elegant way, but it
> > works.
> >
> > Kind regards,
> >
> > Mike
> >
> > -----Original Message-----
> > From: "Clémentine Dressaire" <clementinedressaire at itqb.unl.pt>
> > Sent: 29.10.2010 13:27:44
> > To: bioconductor at stat.math.ethz.ch
> > Subject: [BioC] retrieve genes names after KEGG hypergeometric test
> >
> >> Dear BioC users,
> >>
> >> I performed different hypergeometric tests on my data regarding GO terms
> >> and KEGG pathways. With the GO result I can use the probeSetSummary function
> >> to retrieve the gene list associated with each significant category.
> >> However, this function does not work if I apply the HG test using
> >> KEGGHyperGParams, because the results are not of GOHyperGResult class.
> >> Is there an equivalent KEGG function to get those gene lists?
> >>
> >> With thanks in advance for your help.
> >>
> >> Clémentine
> >>
> >> --
> >> Clémentine Dressaire
> >> Post-doctoral research fellow
> >> Control of gene expression lab
> >> ITQB - Instituto de Tecnologia Química e Biológica
> >> Apartado 127, Av. da República
> >> 2780-157 Oeiras
> >> Portugal
> >> +351 214469562
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor