[BioC] GSEA using Broad genesets

Martin Morgan mtmorgan at fhcrc.org
Wed Feb 10 15:47:26 CET 2010


On 02/07/2010 03:25 PM, zrl wrote:
> Hi Martin,
> 
> Thank you for answering my question. Sorry I didn't make my question clear.
> In the case of "gsc <- GeneSetCollection(bcrneg_filt1,
> setType=KEGGCollection())" and "Am<-incidence(gsc)", we use KEGG as the
> reference to create gene sets for bcrneg_filt1, then create an
> incidence matrix.
> 
> My question is: what if I use a downloaded gene set database such as
> "c3.all.v2.5.symbols.gmt" as the reference to create gene sets for the
> ExpressionSet bcrneg_filt1, and then create an incidence matrix? Do I have
> to do this manually (that is, identify the genes in the eset, then
> match them against c3.all.v2.5.symbols.gmt to create gene sets), or is
> there a direct command for this?

Hi --

Use getGmt() from the GSEABase package to read the Broad file directly:

> library(GSEABase)
> c3gsc = getGmt("~/tmp/c3.all.v2.5.symbols.gmt",
+                 geneIdType=SymbolIdentifier())
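
To make the structure concrete, here is a minimal base-R sketch of what a GMT file contains and how an incidence matrix can be derived from it. It assumes the usual GMT layout (one set per line, tab-separated: set name, description, then gene symbols). The two sets and the object names (gmt_lines, sets, inc) are made up for illustration; this is not how getGmt() or incidence() are implemented.

```r
# Two hypothetical GMT records: name <tab> description <tab> gene symbols...
gmt_lines <- c("SET_A\tdescription\tDLC1\tFLJ39378\tPTGS1",
               "SET_B\tdescription\tRORC\tDLC1")

# Split each record on tabs; fields 1 and 2 are the name and description
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
sets <- lapply(fields, function(x) x[-(1:2)])
names(sets) <- vapply(fields, `[`, character(1), 1)

# Incidence matrix: rows are sets, columns are symbols, 1 = symbol is in the set
symbols <- unique(unlist(sets))
inc <- t(vapply(sets, function(s) as.integer(symbols %in% s),
                integer(length(symbols))))
colnames(inc) <- symbols
```

For real files, getGmt() followed by incidence() is the route to prefer; the sketch only shows the shape of the data.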

It's possible to ask for the intersection of a gene set collection with
specific gene identifiers, for example

> c3gsc & c("DLC1", "FLJ39378")

For an Affy array like bcrneg_filt1, commands like

  library(Biobase)
  library(annotate)   # provides getSYMBOL(); the chip annotation package must be installed too
  data(sample.ExpressionSet)
  eset = sample.ExpressionSet[250:300,]
  symbolIds = getSYMBOL(featureNames(eset), annotation(eset))

get the gene symbols, and

  c3gsc1 = c3gsc & symbolIds

does the subsetting. But it might be just as easy to do

  m = incidence(c3gsc)
  m1 = m[, colnames(m) %in% symbolIds]   # keep symbols found on the array
  m1 = m1[rowSums(m1) != 0, ]            # drop sets left with no members

(the & operator alters the names of the gene sets, and keeps empty sets,
so further processing would probably be needed).
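
As a quick check of the matrix approach, here is a small self-contained sketch in base R. The toy matrix, set names and symbol vector are invented for illustration. The point it shows is that the empty-set filter needs rowSums() computed on the subsetted matrix, and that drop = FALSE keeps the result a matrix even if only one row or column survives.

```r
# Toy incidence matrix (hypothetical data): rows are gene sets, columns are symbols
m <- matrix(c(1, 1, 0, 0,
              0, 0, 1, 0,
              0, 1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("setA", "setB", "setC"),
                            c("DLC1", "FLJ39378", "PTGS1", "RORC")))

# Symbols on the (hypothetical) array; NA mimics a probe with no symbol
symbolIds <- c("DLC1", "FLJ39378", NA, "RORC")

# Keep only the columns for symbols present on the array
m1 <- m[, colnames(m) %in% symbolIds, drop = FALSE]

# Drop gene sets with no remaining members (row sums taken on m1, not m)
m1 <- m1[rowSums(m1) != 0, , drop = FALSE]
```

Here setB contains only PTGS1, which is not on the array, so it is removed; setA and setC remain with their original names.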

Hope that helps.

Martin


> Thanks.
> 
> On Sun, Feb 7, 2010 at 9:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 02/06/2010 04:05 PM, zrl wrote:
>>> Dear list,
>>>
>>> I have a question regarding using Broad gene sets for GSEA analysis.
>>>
>>> As we know, we have "gsc <- GeneSetCollection(bcrneg_filt1,
>>> setType=KEGGCollection())" and "Am<-incidence(gsc)" to generate
>>> an incidence matrix for further analysis.
>>>
>>> I have learned to get the geneset file from Broad such as: "c3gsc2 <-
>>> getGmt("/path/to/c3.all.v2.5.symbols.gmt",
>>> collectionType=BroadCollection(category="c3"),
>>> geneIdType=SymbolIdentifier())"
>>>
>>> My question is how to use c3gsc2 and bcrneg_filt1 to create a new
>>> incidence matrix. Do I have to do this manually, or is there a
>>> command that can do it?
>>
>> Hi Qiudao
>>
>> bcrneg_filt1 is a subset of an ExpressionSet, and is just another source
>> for creating a gene set collection. Here you're using
>> c3.all.v2.5.symbols.gmt as a source for your gene set collection. The
>> incidence matrix is
>>
>>> m <- incidence(c3gsc2)
>>> class(m)
>> [1] "matrix"
>>> dim(m)
>> [1]   837 15718
>>> m[1:5, 1:5]
>>                        DLC1 FLJ39378 PTGS1 RORC VPRBP
>> RGAGGAARY_V$PU1_Q6         1        1     1    1     1
>> KRCTCNNNNMANAGC_UNKNOWN    0        0     0    0     0
>> AAAYWAACM_V$HFH4_01        0        0     0    0     0
>> YYCATTCAWW_UNKNOWN         0        0     0    0     0
>> CYTAGCAAY_UNKNOWN          0        0     0    0     0
>>
>> with rows as set names and columns as symbols.
>>
>> Martin
>>
>>>
>>> Thanks.
>>>
>>> Qiudao
>>>
>>
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>




