[BioC] Using GOstats with ScanArray Express Data

Fri Jan 31 00:02:44 CET 2014

Hi,

Comments inline:

On Thu, Jan 30, 2014 at 2:33 PM, Joseph Shaw [guest]
<guest at bioconductor.org> wrote:
>
> Hi all,
>
> I was hoping to perform some ontological analysis using GOstats on a list of differentially expressed genes; however, I'm not entirely sure how to proceed.
>
> To provide some background:
> - Originally, I was working with data from a two-channel microarray experiment.
> - The data was produced using the ScanArray Express scanner.
> - The organism of interest is Campylobacter jejuni; it is exposed to two conditions (treatment and control).

Your first issue is that you will need to compile a list of GO terms
per gene for this organism.

I believe this is the vignette you will need to give you an idea of
what to do -- where the caveat is that you would need to compile these
annotations from "somewhere":

http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
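
As a rough sketch of what "compiling the annotations" could look like:
as far as I recall, that vignette starts from a three-column data
frame of GO ID, evidence code and gene ID. The gene names and the
gene-to-term assignments below are placeholders for illustration only,
not real C. jejuni annotations; you would fill the table from whatever
source you end up using.

```r
# Placeholder GO annotations: one row per (GO term, gene) pair.
# Gene names and assignments are made up for illustration only.
go_annot <- data.frame(
  go_id    = c("GO:0006412", "GO:0006412", "GO:0006810", "GO:0006810"),
  evidence = c("IEA", "IEA", "IEA", "IEA"),
  gene_id  = c("geneA", "geneB", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Gene IDs per GO term ...
genes_by_term <- split(go_annot$gene_id, go_annot$go_id)
# ... and the reverse view, GO terms per gene
terms_by_gene <- split(go_annot$go_id, go_annot$gene_id)
```

Whatever ID type ends up in the gene_id column is the ID type your
gene lists will have to use as well.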

> - I've managed to derive a list of genes identified as differentially expressed. As a result, I have two .txt files: one containing a column of the original complete list of probes/genes involved in the experiment and one containing a column of probes/genes identified as differentially expressed.

Good.

> Is it possible to implement GOstats procedures for the above scenario; the hyperGTest in particular?

Yes.

> I suppose I'm looking to implement something like the following:
>
>> hgCutoff <- 0.001
>> params <- new("GOHyperGParams",
> + geneIds=selectedGene.txt,
> + universeGeneIds=geneUniverse.txt,
> + annotation="hgu95av2.db",
> + ontology="BP",
> + pvalueCutoff=hgCutoff,
> + conditional=FALSE,
> + testDirection="over")
>>
>>hgOver <- hyperGTest(params)
>
> In particular,
> (1) I know I can't use .txt files as suggested in the above code. How can I convert the selectedGene.txt and geneUniverse.txt into the appropriate format to be used in the above code?

Read each file into a character vector of IDs (e.g. with readLines).
Beyond that, you will need to be able to map the gene IDs you are
providing as "up" or "down" to the gene IDs used in your GO database.
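
For the mechanical part of getting the .txt files into R, a minimal
base-R sketch, assuming one ID per line and no header row (drop the
first element if your files have one). The helper name read_ids and
the temporary stand-in files are only there so the example runs on its
own; point it at your real files instead.

```r
# Hypothetical helper: read one ID per line, dropping blanks and duplicates
read_ids <- function(path) {
  ids <- trimws(readLines(path))
  unique(ids[nzchar(ids)])
}

# Stand-in files so the sketch is self-contained; use your real paths
universe_file <- tempfile(fileext = ".txt")
selected_file <- tempfile(fileext = ".txt")
writeLines(c("geneA", "geneB", "geneC", "geneD", ""), universe_file)
writeLines(c("geneB", "geneD", "geneB"), selected_file)

geneUniverse  <- read_ids(universe_file)
selectedGenes <- read_ids(selected_file)

# Every selected gene should also appear in the universe
stopifnot(all(selectedGenes %in% geneUniverse))
```

The resulting character vectors are what would go into geneIds and
universeGeneIds, once they use the same ID type as your GO mapping.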

> (2) Currently, the probe names used in my .txt files are simply the probe (gene) names. Should these gene names be converted to Entrez IDs or some other format?

This will depend on how you construct your personalized mapping of GO
terms to genes for your organism.

> (3) Should this file contain the expression values (normalized log2 fold changes)?

No, the input to a GO hyperG test is simply the IDs of the genes
identified as "interesting" (differentially regulated in one
direction, or the other, or both) and the list of gene IDs that
make up your "universe".
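
To make that concrete, here is a base-R sketch of an over-representation
test for a single GO term using phyper(). It is meant to illustrate the
kind of hypergeometric calculation involved, not to reproduce the
GOstats code, and all IDs are made up. Note that no expression values
appear anywhere: only ID vectors.

```r
# Made-up IDs for illustration only
universe   <- paste0("gene", 1:20)             # all genes on the array
selected   <- paste0("gene", 1:5)              # differentially expressed
term_genes <- paste0("gene", c(1, 2, 3, 10))   # annotated to one GO term

n_hits <- length(intersect(selected, term_genes))  # selected genes with the term
n_term <- length(intersect(universe, term_genes))  # universe genes with the term

# P(X >= n_hits) when drawing length(selected) genes from the universe
p_over <- phyper(n_hits - 1,
                 n_term,
                 length(universe) - n_term,
                 length(selected),
                 lower.tail = FALSE)
```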

> (4) In the above code, I have used annotation="hgu95av2.db" (as used in the tutorial) simply because I'm not sure what this argument requires. Is this appropriate for the data as described above?

This is a package that provides annotation for a particular
Affymetrix chip (the human HG-U95Av2 array) -- presumably the part of
the documentation you are referencing supplies the list of
"interesting" IDs as probe IDs from that chip, and this annotation
package has "the goods" to map probe IDs to gene (Entrez) IDs. It is
therefore not appropriate for your C. jejuni data.

Apologies for the somewhat vague suggestions. Hopefully they will be
helpful in implementing the actual solution. Perhaps those more
familiar with the plan can give better specifics, but I've outlined
the bare minimum of what you need. Hopefully you will be able to
recover the exact particulars from the relevant vignettes.

-steve

-- 
Steve Lianoglou
Computational Biologist
Genentech