[BioC] Using GOstats with ScanArray Express Data

Joseph Shaw [guest] guest at bioconductor.org
Thu Jan 30 23:33:02 CET 2014

Hi all,

I was hoping to perform some ontological analysis using GOstats on a list of differentially expressed genes; however, I'm not entirely sure how to proceed.

To provide some background:
- Originally, I was working with data from a two-channel microarray experiment.
- The data was produced using the ScanArray Express scanner.
- The organism of interest is Campylobacter jejuni, which was exposed to two conditions (treatment and control).
- I've managed to derive a list of genes identified as differentially expressed. As a result, I have two .txt files: one containing a column of the original complete list of probes/genes involved in the experiment and one containing a column of probes/genes identified as differentially expressed.

Is it possible to apply GOstats procedures, hyperGTest in particular, to the above scenario?

I've read the PDF tutorial on the Bioconductor website (http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf), but the document is primarily concerned with Affymetrix data.

From what I've gathered, my .txt file containing the original complete list of probes is analogous to the gene universe data structure, and my .txt file containing the list of probes identified as differentially expressed is analogous to the selected gene data structure.

I suppose I'm looking to implement something like the following:

> hgCutoff <- 0.001
> params <- new("GOHyperGParams",
+ geneIds=selectedGene.txt,
+ universeGeneIds=geneUniverse.txt,
+ annotation="hgu95av2.db",
+ ontology="BP",
+ pvalueCutoff=hgCutoff,
+ conditional=FALSE,
+ testDirection="over")
> hgOver <- hyperGTest(params)

In particular,
(1) I know I can't use .txt files as suggested in the above code. How can I convert the selectedGene.txt and geneUniverse.txt into the appropriate format to be used in the above code?
(2) Currently, the identifiers in my .txt files are simply gene names. Should these be converted to Entrez IDs or some other format?
(3) Should the input files also contain the expression values (normalized log2 fold changes)?
(4) In the above code, I have used annotation="hgu95av2.db" (as used in the tutorial) simply because I'm not sure what this argument requires. Is this appropriate for the data as described above?
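
For (1), one possible approach is sketched below, assuming each .txt file holds one identifier per line with no header row: read each file into a character vector with base R's readLines(). The temporary files, the readIds() helper, and the gene names are placeholders invented for illustration, not part of my data or of GOstats.

```r
# Placeholder files standing in for geneUniverse.txt and selectedGene.txt;
# the identifiers below are made up for illustration only.
tmp <- tempdir()
universeFile <- file.path(tmp, "geneUniverse.txt")
selectedFile <- file.path(tmp, "selectedGene.txt")
writeLines(c("geneA", "geneB", "geneC", "geneD", "geneB", ""), universeFile)
writeLines(c("geneB", "geneD "), selectedFile)

# Hypothetical helper: read one identifier per line, trim surrounding
# whitespace, then drop empty lines and duplicate entries.
readIds <- function(path) {
  ids <- gsub("^\\s+|\\s+$", "", readLines(path))
  unique(ids[nzchar(ids)])
}

geneUniverse  <- readIds(universeFile)
selectedGenes <- readIds(selectedFile)

# Sanity check: every selected gene should also be in the universe.
stopifnot(all(selectedGenes %in% geneUniverse))
```

The resulting character vectors could then be supplied as geneIds and universeGeneIds in place of the .txt file names. Whether plain gene names are acceptable there depends on the annotation source, which is what (2) and (4) are asking.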

 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

[1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

Sent via the guest posting facility at bioconductor.org.
