[BioC] which universe in hyperGTest

James W. MacDonald jmacdon at med.umich.edu
Thu Jan 11 20:37:57 CET 2007


marco zucchelli wrote:
> Sorry, I missed to reply to bioconductor ...

Thanks for re-sending to the list.

> 
> Dear James,
> 
> actually I tried both the options and got the same result, which probably
> menas that the hyperGTest is taking care of the repeats.

True. It does that now. In addition, in the new version it does some 
validity checking to ensure that e.g., the geneId is a subset of the 
universeGeneIds.

> 
> I use hgu133plus2.
> 
> The pre-filtering is pretty interesting argument. In my case I have 2
> biological replicates for each tissue and I split the probes based on the
> following rule:

Here I am assuming that you mean probesets when you say probes.

> 
> 1.probes that are called absent on all the arrays (group A)
> 2.probes that have consistent calls on the biological replicates (i.e. they
> are either present or moderate or absent on both the replicates) but at
> least once they are present on a pair of replicates (group B)
> 3. others. these are probes that may be absent on one replicate and present
> on the other etc. (group C)
> 
> now I find differentially expressed gened with limma in group B and I
> cluster them according to the pattern of up-down regulation they have on 
> the
> different tissues.
> 
> I am intrested also in genes that are always absent (group A) since this 
> may
> be biologically of interest in my experiment.
> 
> What is now a good universe to use?
> 
> I filtered out probes that have no entrezID both in my gene lists and the
> hgu133plus2 array, I got the entrezId's and eliminated the doubles by using
> unique().
> 
> Then I used:
> 
> 1. geneIds = A                  , universeGenes = ALL  but seems like I
> should use universeGenes = A+B  ??

For this group, the statistic you used to select 'significant' probesets 
was the P/A calls, and you selected from all the genes on the chip, so I 
think your universe is correct.

> 2. geneIds = cluster of B    , universeGenes = ALL  but seems like I should
> use universeGenes = A+B  ?? or only B?

Here you pre-filtered the probesets and selected only those that 
fulfilled your 'B' criterion. You then used a linear model to select 
differentially expressed genes from this subset of the chip, so IMO, 
your universe is the 'B' genes.


> 
> Does it make sense to look at unexpressed genes ? seems like this might be
> not very wise on the affy arrays.

I think you might get ten different answers from ten different 
statisticians if you asked them about what it means when a probeset has 
an 'absent' call.

I am personally not convinced that the P/M/A calls mean much, mainly 
because a large percentage (30 - 40%) of the MM probes are brighter than 
their matching PM probe. It seems to me that the MM probes capture an 
unknown mixture of noise, transcript abundance, background binding, and 
binding of unexpected transcript (e.g., an unrelated transcript for 
which the MM probe is a PM that Affy didn't know about).

The assumption for the P/M/A calls is that the MM probe only captures 
noise and background binding, so if the PM probes are significantly 
brighter, then the transcript is expressed. In many cases this may well 
be true. However, I think there are likely enough cases where this isn't 
true that I am not comfortable assuming that an absent call means the 
gene really isn't being expressed.

That said, by looking at the 'unexpressed' genes are you really trying 
to figure out which pathways (for lack of a better term) are being shut 
down, or not used? If that is the case, then maybe you could use the B 
genes and look for _underrepresented_ GO terms as well. This might give 
you what you want without having to make any assumptions about which 
genes are not being expressed.

You can test for under-represented GO terms by setting the testDirection 
argument of your GOHyperGParams object to "under". So if your 
GOHyperGParams object were set up like this:

params <- new("GOHyperGParams", geneIds = A, geneUniverseIds = ALL)

then you would test for over representation as normal:

hyperGTest(p)

and under-representation like this:

testDirection(params) <- "under"
hyperGTest(p)

HTH,

Jim

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.



More information about the Bioconductor mailing list