[BioC] GSEA using Broad genesets

zrl zrl1974 at gmail.com
Wed Feb 10 19:47:57 CET 2010


Thank you Martin, this is what I wanted. I like the second method for
creating the incidence matrix.
My last question: in GSEABase, when we do this:

"gsc <- GeneSetCollection(bcrneg_filt1, setType=KEGGCollection())"

how does GSEABase collapse the Affy probes to gene symbols
(max, mean, median, or not at all)?


So, if we use a downloaded database such as ****.symbols.gmt,
how should we collapse the probes to symbols?
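
To make the options concrete, here is a minimal base-R sketch of collapsing
probe-level values to one row per symbol. It is not a description of what
GSEABase does: as far as I can tell a gene set collection only stores
identifiers, so summarizing expression values is a separate step. The matrix
expr, the probe-to-symbol map symbols, and the helper collapse_probes() are
all made up for illustration.

```r
# Toy probe-level expression matrix: 5 probes x 3 samples (made-up values).
expr <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9,
    2, 2, 2,
    0, 1, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("probe", 1:5), paste0("s", 1:3))
)

# Hypothetical probe-to-symbol map; NA marks a probe without a symbol.
symbols <- c(probe1 = "DLC1", probe2 = "DLC1", probe3 = "PTGS1",
             probe4 = NA, probe5 = "PTGS1")

# Hypothetical helper: summarize all probes of a symbol with `fun`, per sample.
collapse_probes <- function(expr, symbols, fun = median) {
  symbols <- symbols[rownames(expr)]            # align the map to the matrix rows
  keep <- !is.na(symbols)                       # drop unannotated probes
  groups <- split(which(keep), symbols[keep])   # row indices grouped by symbol
  out <- t(vapply(groups,
                  function(idx) apply(expr[idx, , drop = FALSE], 2, fun),
                  numeric(ncol(expr))))
  colnames(out) <- colnames(expr)
  out
}

by_max    <- collapse_probes(expr, symbols, max)
by_mean   <- collapse_probes(expr, symbols, mean)
by_median <- collapse_probes(expr, symbols, median)
```

Which summary to use is a judgment call; the sketch only shows that all three
fit the same pattern and that probes without a symbol have to be dropped first.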

Sorry to bother you so much. Thank you very much.

Qiudao






On Wed, Feb 10, 2010 at 9:47 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 02/07/2010 03:25 PM, zrl wrote:
>> Hi Martin,
>>
>> Thank you for answering my question. Sorry I didn't make my question clear.
>> In the case of "gsc <- GeneSetCollection(bcrneg_filt1,
>> setType=KEGGCollection())" and "Am<-incidence(gsc)", we use KEGG as
>> the reference to create gene sets for bcrneg_filt1, then create an
>> incidence matrix.
>>
>> My question is: what if I use a downloaded gene set database such as
>> "c3.all.v2.5.symbols.gmt" as the reference to create gene sets for the
>> ExpressionSet bcrneg_filt1, and then create an incidence matrix? Do I have
>> to do this manually (I mean, identify the genes in the eset, then
>> match them against c3.all.v2.5.symbols.gmt to create gene sets), or is
>> there a direct command for this?
>
> Hi --
>
>> c3gsc = getGmt("~/tmp/c3.all.v2.5.symbols.gmt",
> +                 geneIdType=SymbolIdentifier())
>
> It's possible to ask for the intersection of a gene set collection with
> specific gene identifiers, so
>
>> c3gsc & c("DLC1", "FLJ39378")
>
> so for an Affy array like bcrneg_filt1 a command like
>
>  library(Biobase)
>  library(annotate)   # provides getSYMBOL()
>  data(sample.ExpressionSet)
>  eset = sample.ExpressionSet[250:300,]
>  symbolIds = getSYMBOL(featureNames(eset), annotation(eset))
>
> gets the gene symbols, and
>
>  c3gsc1 = c3gsc & symbolIds
>
> does the subset. But it might be just as easy to
>
>  m = incidence(c3gsc)
>  m1 = m[,colnames(m) %in% symbolIds]
>  m1 = m1[rowSums(m1) != 0, ]
>
> (the & operator alters the names of the gene sets, and keeps empty sets,
> so further processing would probably be needed).
>
> Hope that helps.
>
> Martin
>
>
>> Thanks.
>>
>>
>>
>>
>>
>>
>> On Sun, Feb 7, 2010 at 9:11 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>> On 02/06/2010 04:05 PM, zrl wrote:
>>>> Dear list,
>>>>
>>>> I have a question regarding using Broad gene sets for GSEA analysis.
>>>>
>>>> As we know, we have "gsc <- GeneSetCollection(bcrneg_filt1,
>>>> setType=KEGGCollection())" and "Am<-incidence(gsc)" to generate an
>>>> incidence matrix for further analysis.
>>>>
>>>> I have learned to get the geneset file from Broad such as: "c3gsc2 <-
>>>> getGmt("/path/to/c3.all.v2.5.symbols.gmt",
>>>> collectionType=BroadCollection(category="c3"),
>>>> geneIdType=SymbolIdentifier())"
>>>>
>>>> My question is how to use c3gsc2 and bcrneg_filt1 to create a new
>>>> incidence matrix. Do I have to do this manually, or is there a
>>>> command that can do this?
>>>
>>> Hi Qiudao
>>>
>>> bcrneg_filt1 is a subset of an ExpressionSet, and is just another source
>>> for creating a gene set collection. Here you're using
>>> c3.all.v2.5.symbols.gmt as a source for your gene set collection. The
>>> incidence matrix is
>>>
>>>> m <- incidence(c3gsc2)
>>>> class(m)
>>> [1] "matrix"
>>>> dim(m)
>>> [1]   837 15718
>>>> m[1:5, 1:5]
>>>                        DLC1 FLJ39378 PTGS1 RORC VPRBP
>>> RGAGGAARY_V$PU1_Q6         1        1     1    1     1
>>> KRCTCNNNNMANAGC_UNKNOWN    0        0     0    0     0
>>> AAAYWAACM_V$HFH4_01        0        0     0    0     0
>>> YYCATTCAWW_UNKNOWN         0        0     0    0     0
>>> CYTAGCAAY_UNKNOWN          0        0     0    0     0
>>>
>>> with rows as set names and columns as symbols.
>>>
>>> Martin
>>>
>>>>
>>>>
>>>>
>>>> Thanks.
>>>>
>>>> Qiudao
>>>>
>>>
>>>
>>> --
>>> Martin Morgan
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>>
>
>
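
For reference, the incidence-matrix route quoted above can be sketched in base
R without any Bioconductor packages. This is only an illustration under stated
assumptions: the three GMT records and the symbolIds vector are made up, and
the parsing is a stand-in for what getGmt() and incidence() do, not their
actual implementation. Note that the empty-set filter is computed on the
subset matrix m1.

```r
# Three made-up GMT records: set name, description, then member symbols,
# all tab-separated (the layout used by *.symbols.gmt files).
gmt_lines <- c(
  "SET_A\tna\tDLC1\tFLJ39378\tPTGS1",
  "SET_B\tna\tRORC\tVPRBP",
  "SET_C\tna\tPTGS1\tRORC"
)

# Split each record and keep the symbols (fields 3 onward), named by set.
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
sets <- lapply(fields, function(x) unique(x[-(1:2)]))
names(sets) <- vapply(fields, `[`, character(1), 1)

# Incidence matrix: rows are gene sets, columns are symbols, 1 = member.
all_symbols <- unique(unlist(sets))
m <- t(vapply(sets,
              function(s) as.numeric(all_symbols %in% s),
              numeric(length(all_symbols))))
colnames(m) <- all_symbols

# Hypothetical symbols present on the array (in practice these would come
# from the probe annotation, as in the quoted getSYMBOL() call).
symbolIds <- c("DLC1", "PTGS1", "FLJ39378", "TP53")

# Keep only columns measured on the array, then drop sets left empty.
m1 <- m[, colnames(m) %in% symbolIds, drop = FALSE]
m1 <- m1[rowSums(m1) != 0, , drop = FALSE]
```

Here SET_B is dropped because none of its symbols are on the toy array, and
drop = FALSE keeps m1 a matrix even if only one row or column survives.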


