On Wed, Jan 21, 2009 at 6:56 AM, Paul Geeleher <paulgeeleher@gmail.com> wrote:

> Hi All,
>
> I've been following the instructions here:
>
>
> http://www.bioconductor.org/workshops/2007/seattle_bioc_intro_nov_07/folder.2007-11-30.5595085375/
>
> to find dysregulated kegg pathways in a dataset. What I'm now
> wondering is if I can use the same methodology to find co-regulated
> genes / genes with common transcription factors?
>
> I'd assume it's simply a matter of redefining the gene set
>
> gsc <- GeneSetCollection(eset, setType = KEGGCollection())
> to
> gsc <- GeneSetCollection(eset, setType =
> CoRegulatedGenesOrSomeFunctionLikeThat())
>
>
> I suppose what I'm asking is if such a gene set exists in
> Bioconductor? And if not can this be done somewhere else?
>

GSEABase has infrastructure to import the Broad MSIGDB from its XML
serialization; see http://www.broad.mit.edu/gsea/downloads.jsp, where you
will need to register.

If you use getBroadSets() in GSEABase to import the entire MSIGDB, you will
have access to 5452 gene sets. Broad categorizes these in five groups; group
c3 contains the motif gene sets, which include a subclass called
transcription factor targets.
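
A minimal sketch of the import step follows. The file name is an assumption
(substitute the path of whatever XML file you downloaded from the Broad
site), and importMsigdb is a hypothetical helper, not part of GSEABase.

```r
library(GSEABase)

# Hypothetical helper: import Broad gene sets from a local MSIGDB XML file,
# failing early with a clear message if the file is missing.
importMsigdb <- function(path) {
  if (!file.exists(path))
    stop("MSIGDB XML file not found: ", path)
  getBroadSets(path)
}

# Assumed local file name; the download itself requires registration.
msigFile <- "msigdb_v2.5.xml"
if (file.exists(msigFile)) {
  msig2.5 <- importMsigdb(msigFile)
  print(length(msig2.5))   # number of gene sets imported
}
```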

Digging through a GSEABase GeneSetCollection can proceed in various ways.
What I will show is probably not the most elegant approach.

Assume you have imported the whole MSIGDB as msig2.5:

> isC3 = which(sapply(msig2.5, function(x) bcCategory(collectionType(x))) == "c3")
> C3coll = msig2.5[isC3]
> C3coll
GeneSetCollection
  names: RGAGGAARY_V$PU1_Q6, KRCTCNNNNMANAGC_UNKNOWN, ..., GTTATAT,MIR-410 (837 total)
  unique identifiers: PCDHGA5, CTXL, ..., pp9099 (15718 total)
  types in collection:
    geneIdType: SymbolIdentifier (1 total)
    collectionType: BroadCollection (1 total)
> C3coll[[1]]
setName: RGAGGAARY_V$PU1_Q6
geneIds: PCDHGA5, CTXL, ..., HCMOGT-1 (total: 522)
geneIdType: Symbol
collectionType: Broad
  bcCategory: c3 (Motif)
  bcSubCategory:  NA
details: use 'details(object)'
> details(C3coll[[1]])
setName: RGAGGAARY_V$PU1_Q6
geneIds: PCDHGA5, CTXL, ..., HCMOGT-1 (total: 522)
geneIdType: Symbol
collectionType: Broad
  bcCategory: c3 (Motif)
  bcSubCategory:  NA
setIdentifier: c3:261
description: Genes with promoter regions [-2kb,2kb] around transcription
  start site containing the motif RGAGGAARY which matches annotation for
  SPI1: spleen focus forming virus (SFFV) proviral integration oncogene spi1
  (longDescription available)
organism: Human,Mouse,Rat,Dog
pubMedIds:
urls: msigdb_v2.5.xml
contributor: Xiaohui Xie
setVersion: 0.0.1
creationDate: Thu Jul 10 16:59:23 2008
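
The selection above can be wrapped in a small function, and the c3 sets can
then be narrowed further. This is only a sketch: whichCategory and
whichNotMir are hypothetical helpers, and the "MIR-" test is a heuristic
based on the set names printed above (the microRNA motif sets seem to carry a
MIR- label, while bcSubCategory is NA in this release), so check the result
against the Broad annotation before relying on it.

```r
library(GSEABase)

# Hypothetical helper: indices of sets whose Broad category equals `category`.
# Assumes every set in `gsc` has a BroadCollection collectionType.
whichCategory <- function(gsc, category) {
  cats <- vapply(gsc, function(gs) bcCategory(collectionType(gs)),
                 character(1))
  which(cats == category)
}

# Hypothetical helper: indices of sets whose name does NOT contain "MIR-".
# Heuristic only; it infers the microRNA sets from their names.
whichNotMir <- function(gsc) {
  nms <- vapply(gsc, setName, character(1))
  which(!grepl("MIR-", nms, fixed = TRUE))
}

# Assumes msig2.5 was imported with getBroadSets() as described above.
if (exists("msig2.5")) {
  C3coll <- msig2.5[whichCategory(msig2.5, "c3")]
  tfColl <- C3coll[whichNotMir(C3coll)]   # candidate TF target sets
}
```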

Invoking the longDescription method on C3coll[[1]] returns an interesting
structure that will need to be parsed; it seems to be in a marked-up MEDLINE
format.

Once you have found the gene sets you are interested in, GSEABase contains
additional infrastructure to convert the gene identifiers used in MSIGDB to
array probe set identifiers, Entrez identifiers, etc.
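
As a rough sketch of the identifier conversion, mapIdentifiers() in GSEABase
can translate a symbol-based set to Entrez identifiers. The annotation
package named here (org.Hs.eg.db) is my assumption for human data and must be
installed separately; toEntrez is a hypothetical wrapper. Symbols that the
annotation package does not know are not carried over, so the mapped set can
be smaller than the original.

```r
library(GSEABase)

# Hypothetical wrapper: map a gene set (or collection) keyed by gene symbol
# to Entrez identifiers, using the given annotation package (an assumption).
toEntrez <- function(gs, annotation = "org.Hs.eg.db") {
  mapIdentifiers(gs, EntrezIdentifier(annotation))
}

# Assumes C3coll was built from the imported MSIGDB as shown earlier.
if (exists("C3coll")) {
  entrezSet <- toEntrez(C3coll[[1]])
  print(head(geneIds(entrezSet)))
}
```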



>
> Thanks.
>
> --
> Paul Geeleher
> Department of Mathematics
> National University of Ireland
> Galway
> Ireland
>

