[BioC] GSEA using Broad genesets
Martin Morgan
mtmorgan at fhcrc.org
Wed Feb 10 15:47:26 CET 2010
On 02/07/2010 03:25 PM, zrl wrote:
> Hi Martin,
>
> Thank you for answering my question. Sorry I didn't make my question clear.
> In the case of "gsc <- GeneSetCollection(bcrneg_filt1,
> setType=KEGGCollection())" and "Am<-incidence(gsc)", we use KEGG as
> the reference to create gene sets for bcrneg_filt1, then create an
> incidence matrix.
>
> My question is: what if I use a downloaded gene set database such as
> "c3.all.v2.5.symbols.gmt" as the reference to create gene sets for the
> ExpressionSet bcrneg_filt1, then create an incidence matrix? Do I have
> to do this manually (I mean, identifying the genes in the eset, then
> matching them against c3.all.v2.5.symbols.gmt to create gene sets), or is
> there a direct command for this?
Hi --
> c3gsc = getGmt("~/tmp/c3.all.v2.5.symbols.gmt",
+ geneIdType=SymbolIdentifier())
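For orientation, a .gmt file is plain tab-delimited text: each line holds a set name, a description, and then the member genes. The following is only a base-R sketch of that layout using made-up set names and a hypothetical gene_sets list; it is not what getGmt does internally, and getGmt remains the convenient way to read the real file.

```r
# Toy GMT content: set name, description, then gene symbols, tab-separated.
# The set names and memberships below are invented for illustration.
gmt_lines <- c(
  "SET_A\tna\tDLC1\tPTGS1\tRORC",
  "SET_B\tna\tFLJ39378\tVPRBP",
  "SET_C\tna\tDLC1"
)

# Split each line on tabs; field 1 is the set name, fields 3+ are the genes.
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, function(x) x[1], character(1))

# Show the resulting named list of character vectors.
str(gene_sets)
```

The result is a named list with one character vector of symbols per set, which is roughly the information a gene set collection carries.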
It's possible to ask for the intersection of a gene set collection with
specific gene identifiers, so
> c3gsc & c("DLC1", "FLJ39378")
so for an Affy array like bcrneg_filt1, commands like
library(Biobase)
library(annotate)   # provides getSYMBOL(); the chip's annotation package must be installed
data(sample.ExpressionSet)
eset = sample.ExpressionSet[250:300,]
symbolIds = getSYMBOL(featureNames(eset), annotation(eset))
get the gene symbols, and
c3gsc1 = c3gsc & symbolIds
does the subset. But it might be just as easy to
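m = incidence(c3gsc)
# keep only the symbols present on the array
m1 = m[, colnames(m) %in% symbolIds]
# drop gene sets left with no genes after subsetting
m1 = m1[rowSums(m1) != 0, ]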
(the & operator alters the names of the gene sets, and keeps empty sets,
so further processing would probably be needed).
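To make the matrix route concrete, here is a small base-R sketch on invented data. The set names, memberships and the symbolIds vector are all hypothetical, and the incidence matrix is built by hand rather than with incidence(). The point is that the empty-set filter has to use row sums of the subsetted matrix, because every set in the full matrix has at least one gene.

```r
# Hypothetical gene sets (names and members invented for illustration).
gene_sets <- list(
  SET_A = c("DLC1", "PTGS1", "RORC"),
  SET_B = c("FLJ39378", "VPRBP"),
  SET_C = c("DLC1")
)
# Symbols on a hypothetical array; NA stands for a probe with no symbol.
symbolIds <- c("DLC1", "RORC", NA, "TP53")

# Build a 0/1 incidence matrix: rows are sets, columns are symbols.
genes <- unique(unlist(gene_sets))
m <- t(vapply(gene_sets, function(s) as.integer(genes %in% s),
              integer(length(genes))))
colnames(m) <- genes

# Keep only symbols on the array, then drop sets left empty.
# drop = FALSE keeps the result a matrix even if one row or column remains.
m1 <- m[, colnames(m) %in% symbolIds, drop = FALSE]
m1 <- m1[rowSums(m1) != 0, , drop = FALSE]
m1
```

In this toy case SET_B has no genes on the array and is removed, while the NA in symbolIds simply matches nothing.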
Hope that helps.
Martin
> Thanks.
> On Sun, Feb 7, 2010 at 9:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 02/06/2010 04:05 PM, zrl wrote:
>>> Dear list,
>>>
>>> I have a question regarding using Broad gene sets for GSEA analysis.
>>>
>>> As we know, we have "gsc <- GeneSetCollection(bcrneg_filt1,
>>> setType=KEGGCollection())" and "Am<-incidence(gsc)" to generate
>>> an incidence matrix for further analysis.
>>>
>>> I have learned to get the geneset file from Broad such as: "c3gsc2 <-
>>> getGmt("/path/to/c3.all.v2.5.symbols.gmt",
>>> collectionType=BroadCollection(category="c3"),
>>> geneIdType=SymbolIdentifier())"
>>>
>>> My question is how to use c3gsc2 and bcrneg_filt1 to create a new
>>> incidence matrix. Do I have to do this manually, or is there a
>>> command that can do this?
>>
>> Hi Qiudao
>>
>> bcrneg_filt1 is a subset of an ExpressionSet, and is just another source
>> for creating a gene set collection. Here you're using
>> c3.all.v2.5.symbols.gmt as a source for your gene set collection. The
>> incidence matrix is
>>
>>> m <- incidence(c3gsc2)
>>> class(m)
>> [1] "matrix"
>>> dim(m)
>> [1] 837 15718
>>> m[1:5, 1:5]
>>                         DLC1 FLJ39378 PTGS1 RORC VPRBP
>> RGAGGAARY_V$PU1_Q6         1        1     1    1     1
>> KRCTCNNNNMANAGC_UNKNOWN    0        0     0    0     0
>> AAAYWAACM_V$HFH4_01        0        0     0    0     0
>> YYCATTCAWW_UNKNOWN         0        0     0    0     0
>> CYTAGCAAY_UNKNOWN          0        0     0    0     0
>>
>> with rows as set names and columns as symbols.
>>
>> Martin
>>
>>>
>>>
>>>
>>> Thanks.
>>>
>>> Qiudao
>>>
>>
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>