[BioC] species in MsigDB of GSEA

Martin Morgan mtmorgan at fhcrc.org
Wed Jul 16 04:49:52 CEST 2008


Hi Di --

"Di Wu" <di.wu at med.monash.edu.au> writes:

> Dear list,
>
> I am trying to use MSigDB, the gene set database from GSEA. I would like
> to know whether the gene sets come from human or mouse, particularly in
> C2.
> I know I can always click through the website to see how a set was obtained,
> but is there a programmatic way to get the source species for all the gene
> sets in C2, or in MSigDB as a whole?

If you're using the GSEABase package, then each gene set read by
getBroadSets records the organism. For example:

> library(GSEABase)
> fl <- "/path/to/msigdb_v2.1.xml"
> gss <- getBroadSets(fl) # read entire msigdb
> organism(gss[[1]])
[1] "Human"
> table(sapply(gss, organism))

         Chimpanzee             Generic               Human 
                  1                 456                1769 
Human,Mouse,Rat,Dog               Mouse                 Pig 
                837                 248                  11 
                Rat              Rhesus          Zebra Fish 
                  3                   4                   8 

> # retrieve a few sets from the web
> gss <- getBroadSets(asBroadUri(c('chr16q', 'GNF2_ZAP70')))
> organism(gss[[1]])
[1] "Human"
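
Since the question was about C2 in particular, here is a rough sketch of how the organism tabulation might be restricted to one MSigDB category. It assumes that each set carries a BroadCollection whose category can be read with bcCategory(collectionType(gs)). To keep it runnable without the MSigDB file, it builds a toy collection by hand: the gene symbols, the set names HYPOTHETICAL_SET_A and HYPOTHETICAL_SET_B, the helper mk, and all category assignments (including "c2" for KENNY_WNT_UP) are made up for illustration and are not taken from MSigDB.

```r
library(GSEABase)

# Hypothetical helper: build one toy gene set with an organism and a
# Broad category. Gene symbols are placeholders.
mk <- function(name, org, cat)
    GeneSet(c("GENE1", "GENE2"), setName=name, organism=org,
            collectionType=BroadCollection(category=cat))

# Toy stand-in for the result of getBroadSets(fl)
gss <- GeneSetCollection(list(
    mk("KENNY_WNT_UP", "Mouse", "c2"),
    mk("HYPOTHETICAL_SET_A", "Human", "c2"),
    mk("HYPOTHETICAL_SET_B", "Human", "c1")))

# Category and organism of every set, in collection order
cats <- sapply(gss, function(gs) bcCategory(collectionType(gs)))
orgs <- sapply(gss, organism)

# Organisms of the C2 sets only, then tabulated
c2orgs <- unname(orgs[cats == "c2"])
table(c2orgs)
```

With the real file, gss would instead come from getBroadSets(fl) and the rest should apply unchanged, provided the category was recorded when the sets were read.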

As a 'closer to the metal' alternative, you could use the XML package:

> library(XML)
> xml <- xmlTreeParse(fl, useInternalNodes=TRUE)
> query <- '//GENESET[@STANDARD_NAME="KENNY_WNT_UP"]/@ORGANISM'
> xpathApply(xml, query, xmlValue)
[[1]]
[1] "Mouse"
> table(unlist(xpathApply(xml, "//@ORGANISM", xmlValue)))

         Chimpanzee             Generic               Human 
                  1                 456                1769 
Human,Mouse,Rat,Dog               Mouse                 Pig 
                837                 248                  11 
                Rat              Rhesus          Zebra Fish 
                  3                   4                   8 
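
The same XPath idea can be extended to pull several attributes at once and then subset by category. The sketch below is only illustrative: it parses a small inline document rather than the real file, and it assumes the category is stored in a CATEGORY_CODE attribute on each GENESET node, which may differ between MSigDB releases. The HYPOTHETICAL_SET_* entries and the category values are invented for the example.

```r
library(XML)

# Toy document imitating the MSigDB layout; only STANDARD_NAME and
# ORGANISM are attribute names seen above, CATEGORY_CODE is an assumption.
txt <- '<MSIGDB>
  <GENESET STANDARD_NAME="KENNY_WNT_UP" ORGANISM="Mouse" CATEGORY_CODE="c2"/>
  <GENESET STANDARD_NAME="HYPOTHETICAL_SET_A" ORGANISM="Human" CATEGORY_CODE="c2"/>
  <GENESET STANDARD_NAME="HYPOTHETICAL_SET_B" ORGANISM="Human" CATEGORY_CODE="c1"/>
</MSIGDB>'
doc <- xmlParse(txt, asText=TRUE)

# One row per gene set: name, category and organism
nodes <- getNodeSet(doc, "//GENESET")
info <- data.frame(
    name     = sapply(nodes, xmlGetAttr, "STANDARD_NAME"),
    category = sapply(nodes, xmlGetAttr, "CATEGORY_CODE"),
    organism = sapply(nodes, xmlGetAttr, "ORGANISM"),
    stringsAsFactors = FALSE)

# Keep the C2 sets and tabulate their organisms
c2 <- info[info$category == "c2", ]
table(c2$organism)
```

For the real database, xmlParse(fl) would replace the inline text. Having the attributes in a data frame makes it easy to look up the organism of any named set or to cross-tabulate organism against category.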

Martin

> Appreciate your suggestions.
> Cheers,
> Di

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


