[BioC] GSEABase and Broad Inst Sets
Martin Morgan
mtmorgan at fhcrc.org
Tue Jul 6 15:43:03 CEST 2010
On 07/06/2010 04:32 AM, Iain Gallagher wrote:
> Hi List
>
> I'm trying to carry out a GSEA analysis on an ExpressionSet object using GSEABase and the Broad Institute genesets (well the C2 subset, specifically).
>
> library(GSEABase)
>
> broadSets <- getBroadSets("/home/iain/Desktop/prostateProjectJN_GS/CEL/msigdb_v2.5.xml")# file downloaded from Broad site
>
> isC2 <- sapply(broadSets, function(x) bcCategory(collectionType(x))) == "c2"
>
> broadSetsC2<-broadSets[isC2]
>
> relevantArrays <- grep('Hypo.No.None|Norm.No.None', TS)
>
> relevantArrays <- rmaDataFiltered[ ,relevantArrays]
>
> So this gets me to the point where I have my expression data and the genesets I want. This is where I'm having trouble. Following the GSEABase tutorials with KEGG annotation I have no problems, but I can't calculate an incidence matrix from my expression data using the Broad genesets I have downloaded.
>
> i.e.
>
> testGSC <- GeneSetCollection(relevantArrays, setType=BroadCollection())
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
> object 'hgu133plus2BROAD' not found
> Error in revmap(getAnnMap(toupper(collectionType(setType)), annotation(idType))) :
> error in evaluating the argument 'x' in selecting a method for function 'revmap'
>
>
> This is a mapping issue I know but I'm having a conceptual block getting over it. If anyone could offer any help I'd be grateful.

For a reproducible example, start with

  library(GSEABase)
  example(getBroadSets)        # creates the GeneSetCollection 'gss'
  data(sample.ExpressionSet)
  eset = sample.ExpressionSet  # less typing!

If you want a GeneSetCollection that contains only the identifiers
relevant to your ExpressionSet 'eset', map the gene sets to the
annotation of eset:

  gss1 = mapIdentifiers(gss, AnnotationIdentifier(annotation(eset)))

Subsetting eset to the features that occur in at least one gene set
might look like

  idx = featureNames(eset) %in% unlist(geneIds(gss1), use.names=FALSE)
  eset[idx,]
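
To make the subsetting logic concrete, here is a minimal base R sketch that does not need GSEABase. The probe IDs and set names are made-up stand-ins, not real data: 'features' plays the role of featureNames(eset) and 'sets' the role of geneIds(gss1). It also builds a small incidence matrix by hand (sets in rows, retained features in columns), which is roughly the shape I would expect from incidence(gss1) in GSEABase.

```r
# Hypothetical toy data: probe IDs on the array, and two gene sets whose
# identifiers are assumed to be already mapped to probe IDs.
features <- c("p1", "p2", "p3", "p4", "p5")
sets <- list(setA = c("p1", "p3"), setB = c("p3", "p5", "p9"))

# Same idea as the idx line above: flag features found in any gene set.
idx <- features %in% unlist(sets, use.names = FALSE)
kept <- features[idx]

# Hand-built incidence matrix: one row per set, one column per kept
# feature, with 1 where the feature is a member of the set.
inc <- t(sapply(sets, function(ids) as.integer(kept %in% ids)))
colnames(inc) <- kept
print(inc)
```

Note that "p9" is in setB but not on the toy array, so it simply drops out; the same happens to gene-set members that have no probe on your chip.
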

In answering this question, I realized that getBroadSets does not
correctly interpret the identifiers as 'Symbols'. Until this is fixed in
GSEABase, you should convert them to official symbols yourself (do this
before calling mapIdentifiers):

  library(limma)
  sids <- lapply(geneIds(gss), alias2Symbol, "Hs", TRUE)
  gss = GeneSetCollection(mapply("geneIds<-", gss, sids))
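
In case the mapply("geneIds<-", ...) idiom looks odd: a replacement function can be called by its quoted name, and mapply pairs each element of the first list with the matching element of the second. A minimal base R sketch of the same pattern, using "names<-" on made-up toy vectors in place of "geneIds<-" on gene sets:

```r
# Hypothetical toy data: two vectors and the new names each should get.
vals <- list(a = 1:2, b = 3:5)
new_names <- list(c("x", "y"), c("p", "q", "r"))

# Call the replacement function "names<-" pairwise: the i-th element of
# vals is the object, the i-th element of new_names is the new value.
# SIMPLIFY = FALSE keeps the result as a list.
renamed <- mapply("names<-", vals, new_names, SIMPLIFY = FALSE)
str(renamed)
```

Each call returns the modified copy, which is why the gene-set version wraps the result in GeneSetCollection() again.
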
Martin
>
> iain
>
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
> [7] LC_PAPER=en_GB.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] affyQCReport_1.24.0 affyPLM_1.22.0 preprocessCore_1.8.0
> [4] xtable_1.5-6 simpleaffy_2.22.0 gcrma_2.18.1
> [7] latticeExtra_0.6-11 lattice_0.18-3 RColorBrewer_1.0-2
> [10] hgu133plus2.db_2.3.5 hgu133plus2cdf_2.5.0 affy_1.24.2
> [13] limma_3.2.3 GSEABase_1.8.0 graph_1.26.0
> [16] annotate_1.24.1 hgu95av2.db_2.3.5 org.Hs.eg.db_2.3.6
> [19] RSQLite_0.9-0 DBI_0.2-5 AnnotationDbi_1.8.2
> [22] genefilter_1.28.2 ALL_1.4.7 Biobase_2.6.1
>
> loaded via a namespace (and not attached):
> [1] affyio_1.14.0 Biostrings_2.14.12 grid_2.10.1 IRanges_1.4.16
> [5] splines_2.10.1 survival_2.35-8 tools_2.10.1 XML_3.1-0
>>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793