[BioC] PreProcess and Limma 'done'. What directions now?

Wed Jun 10 12:32:14 CEST 2009

On Wed, Jun 10, 2009 at 5:44 AM, Massimo Pinto<pintarello at gmail.com> wrote:
> Hi Steve and all,
>
> On Tue, Jun 9, 2009 at 5:15 PM, Steve
> Lianoglou<mailinglist.honeypot at gmail.com> wrote:
>> Hi Massimo,
>>
>> On Jun 9, 2009, at 9:13 AM, Massimo Pinto wrote:
>>
>>> Greetings BioC list readers,
>>>
>>> I have been working over the past few weeks on data normalization
>>> first (Agi4X44PreProcess) and linear models then (limma). I sense that
>>> I may have reached a point where questions may be addressed regarding
>>> regulation of individual genes of interest, as well as gene ontology.
>>>
>>> Would you please so kindly provide advice with regards to my next
>>> step? In brief, I would like to
>>>
>>> 1) Query expression of genes of my interest, which are known to be
>>> implicated in the biology that I am working on
>>
>> I'm not sure what you're asking to do here, actually, since after doing your
>> normalization/etc. this is the easy part, right?
>>
>> You can just select these rows out of your normalized data matrix, or
>> highlight their probe values on an MA plot, or plot their points on a pdf of
>> your signal intensities. If you're looking for their differential expression
>> calls, you'd just extract their +1/-1 calls post-limma-facto, if you will
>> :-)
>
> I have been through normalization and basic limma operations. What I
> would like to do here is to ask questions like "what happened to this
> and that particular gene"? These would be genes whose expression was
> already measured via other techniques (RT-PCR as well as Western
> Blotting of the proteins that they encode). Ideally, I would like to
> produce a mini-list of such genes and have their Gene ID displayed (in
> tables and any graphs) in place of their Agilent Probe ID. Should be
> possible, since I have an annotation file.
>
>> mappings[1:10,1:4]
>
>          PROBE    ACCNUM       SYMBOL  ENTREZID
> 1   A_24_P66027 NM_004900     APOBEC3B      9582
> 2   A_32_P77178  AA085955         ADAR       103
> 3  A_23_P212522 NM_014616       ATP11B     23200
> 4  A_24_P934473  AK092846 LOC100132006 100132006
> 5    A_24_P9671 NM_001539       DNAJA1      3301
> 6  A_24_P801451 NM_006709        EHMT2     10919
> 7   A_32_P30710 NM_000978        RPL23      9349
> 8  A_24_P704878        NA           NA        NA
> 9   A_32_P86028 NM_001017        RPS13      6207
> 10  A_23_P65830 NM_198527        HDDC3    374659

It seems to me that you have two questions -- one relating to annotation
and one relating to statistical reporting on selected genes.

Apropos annotation: It should not be necessary to use this table.  I
haven't used Agilent but I see from the Agi4* vignette that

-Agi4x44PreProcess employs the corresponding Bioconductor annotation
-packages (human: "hgug4112a.db"; mouse: "mgug4122a.db") to assign
-to each probe the ACCNUM, SYMBOL, ENTREZID, DESCRIPTION, GO TERMS AND GO IDS.

Therefore you have access to annotation through queries resolved by
AnnotationDbi.  There are various ways to do this; I will review three.

a) direct queries:  once you have installed hgug4112a.db, run hgug4112a()
to see what mappings are available.  To obtain the gene symbol and Entrez
ID corresponding to a given probe, use

> library(hgug4112a.db)
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

Loading required package: DBI
> get("A_24_P66027", hgug4112aENTREZID)
[1] "9582"
> get("A_24_P66027", hgug4112aSYMBOL)
[1] "APOBEC3B"
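
For more than one probe at a time, the same maps can be queried with
mget().  What follows is only a sketch, assuming hgug4112a.db is installed;
the helper first_or_na and the choice of probes (taken from your table) are
mine, for illustration.  ifnotfound = NA keeps an unannotated probe from
stopping the lookup with an error.

```r
library(hgug4112a.db)   # also loads AnnotationDbi

# Probe IDs copied from the table posted earlier in the thread
probes <- c("A_24_P66027", "A_32_P77178", "A_23_P212522", "A_24_P704878")

# Hypothetical helper: keep the first mapped value as a character string
# (an NA result stays NA)
first_or_na <- function(x) as.character(x[1])

# mget() returns a named list with one element per probe
symbols <- vapply(mget(probes, hgug4112aSYMBOL, ifnotfound = NA),
                  first_or_na, character(1))
entrez  <- vapply(mget(probes, hgug4112aENTREZID, ifnotfound = NA),
                  first_or_na, character(1))

# Small lookup table, one row per probe, in the order requested
lookup <- data.frame(PROBE = probes, SYMBOL = symbols, ENTREZID = entrez,
                     row.names = NULL, stringsAsFactors = FALSE)
print(lookup)
```

The SYMBOL column of such a table can then be used as row or axis labels in
place of the Agilent probe IDs.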

b) the GSEABase infrastructure for gene set management.
We can use this to do bulk translations:
> library(GSEABase)
> f4 = c("A_24_P66027", "A_32_P77178", "A_23_P212522", "A_24_P934473")
> s4 = GeneSet(f4, geneIdType=AnnotationIdentifier("hgug4112a.db"))
> s4
setName: NA
geneIds: A_24_P66027, A_32_P77178, A_23_P212522, A_24_P934473 (total: 4)
geneIdType: Annotation (hgug4112a.db)
collectionType: Null
details: use 'details(object)'
> s4b = s4
> geneIdType(s4b) = SymbolIdentifier()
> s4b
setName: NA
geneIds: APOBEC3B, ADAR, ATP11B, LOC100132006 (total: 4)
geneIdType: Symbol (hgug4112a.db)
collectionType: Null
details: use 'details(object)'

c) the unfortunately named annaffy package -- which will create a nice
HTML page with annotations.  Using the f4 vector defined above:

> library(annaffy)
Loading required package: GO.db
Loading required package: KEGG.db
> t4 = aafTableAnn(f4, "hgug4112a.db")
> saveHTML(t4, file="t4.html")

Point your browser to t4.html and you will see a very informative
hyperlinked page -- with a misleading default title, but you can change
that.

The second part of your question concerns statistical reporting -- you
have used limma, so the topTable output can be used to obtain statistics
on selected genes, but you have to choose n (the number of rows topTable
returns) large enough, up to the total number of probes, so that the genes
you are interested in are on the list.
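
A minimal sketch of that selection, on simulated data since I do not have
yours: the expression matrix, the two-group design and the dummy probe
names are all assumptions made for illustration; only the first three probe
IDs come from your table.  Recent limma versions put the probe IDs in the
row names of the topTable result, while older ones report an ID column, so
the sketch checks for both.

```r
library(limma)

set.seed(1)

# Simulated stand-in for a normalized log-expression matrix:
# 100 probes x 6 arrays; row names after the third are made up
probes <- c("A_24_P66027", "A_32_P77178", "A_23_P212522",
            paste0("dummy_probe_", 4:100))
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(probes, paste0("array", 1:6)))

# Hypothetical two-group design: 3 control and 3 treated arrays
group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(expr, design))

# number = Inf returns every probe, so no gene of interest can fall
# off the end of the list
tt <- topTable(fit, coef = 2, number = Inf)

# Probe IDs live in an ID column (older limma) or the row names (newer)
ids <- if ("ID" %in% names(tt)) as.character(tt$ID) else rownames(tt)

# Pull out the rows for the genes of interest, in the order requested
of_interest <- c("A_24_P66027", "A_23_P212522")
sel <- tt[match(of_interest, ids), ]
print(sel)
```

With real data, expr and design would be replaced by your normalized
values and your own design matrix, and coef by the contrast of interest.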

It is possible to enhance the HTML page generated above with numerical
and visual additions that reflect the statistical findings related to
genes.  These steps are described in the Bioconductor monograph of 2005,
chapter by Colin Smith, or the annaffy vignette.

>
>>> 2) Get information about gene ontology
>>
>> There are several packages you can check. You've already mentioned
>> goProfiles, but you might also want to consider GOstats as well. There's
>> also topGO.
>
> thank you.
>
>>
>>> 3) Ask about involvement of signaling pathways
>>
>> I'm not sure about this one, but searching the bioconductor d/l page for the
>> word "pathway" comes up with several leads, like SPIA and sigPathway.
>>
>> I've never used either of these, but perhaps you can let us know how you
>> like them :-)
>
> Will do!
> Thank you.
> Massimo
>
>
>> --
>> Steve Lianoglou
>> Graduate Student: Physiology, Biophysics and Systems Biology
>> Weill Medical College of Cornell University
>>
>> http://cbio.mskcc.org/~lianos
>>
>>
>>
>>
>
>
>
> --
> Massimo Pinto
> Post Doctoral Research Fellow
> Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome
> http://claimid.com/massimopinto
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Vincent Carey, PhD
Biostatistics, Channing Lab
617 525 2265