[BioC] FW: Limma p-values, fdr and classifyTests

Mon Aug 23 15:53:54 CEST 2004

Sorry, I'd been away and missed some posts. It seems limma and fdr are
a hot topic at the moment.

To link this to an answer already provided by Gordon, this is the
thread I found earlier:
https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-August/005616.html

This addresses the possibilities for fdr correction at the gene and
contrast level. However, I was wondering if anyone can confirm that the
fdr in topTable is at the gene level. If so, can the vector of p-values
be passed back to classifyTests to put in the 1 / -1 codes for the up- /
down-regulated genes? Also, is there any difference between adjusting
across contrasts and then genes vs. the reverse?
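
To make the 1 / -1 coding question concrete, here is a minimal sketch in
base R. It assumes a single contrast and uses invented t-statistics and
p-values rather than real limma output. It calls only p.adjust(), not
classifyTests, so it shows the intended result rather than how limma
would do it.

```r
# Toy values, invented for illustration: moderated t-statistics and
# two-sided p-values for 6 genes in a single contrast.
t.stat <- c(5.2, -4.8, 0.3, 2.1, -0.9, 6.0)
p.raw  <- c(0.0004, 0.0009, 0.77, 0.06, 0.39, 0.0001)

# Benjamini-Hochberg adjustment across genes ("fdr" is an alias for "BH").
p.adj <- p.adjust(p.raw, method = "fdr")

# Code each gene: 1 = up, -1 = down, 0 = not significant at the cutoff.
cutoff <- 0.01
calls <- ifelse(p.adj < cutoff, sign(t.stat), 0)
print(calls)
```

The direction comes from the sign of the t-statistic and the adjusted
p-value only decides whether a gene is called at all.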

I'd also be interested in a discussion of how quantitative limma
p-values are (see point 4 in previous mail below).

Along these lines, from the abstract of Smyth. LM and eBayes methods...
"The eBayes approach is equivalent to shrinkage of the estimated sample
variances towards a pooled estimate.." I assume (as it works with low
numbers of arrays) that the pooled estimate is across genes rather than
across arrays? If so, what about pooling across arrays, e.g. when you
have 10 lines exposed to a common treatment with, say, 3 reps each, so
10 lines x 2 conditions x 3 reps? Could the pooled estimate across
arrays (30 control reps vs. 30 treated reps) then be applied to the
differences between lines due to the treatment (only 3 vs. 3 arrays)?
Obviously the lines would have to be generally similar. But wouldn't
this be more biologically relevant than assuming that similarly
expressed genes have similar variance?
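
For reference, the shrinkage in question as I understand it from the
Smyth paper. The notation is written from memory and should be checked
against the paper: s_g^2 is the sample variance of gene g on d_g
residual degrees of freedom, and s_0^2 and d_0 are the prior variance
and prior degrees of freedom estimated from the sample variances of all
genes.

```latex
\tilde{s}_g^2 = \frac{d_0\, s_0^2 + d_g\, s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g \sqrt{v_{gj}}}
\quad \text{on } d_0 + d_g \text{ degrees of freedom}
```

On this reading the pooled quantity s_0^2 is shared by all genes, so
the borrowing of information is across genes rather than across arrays.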

Thanks in advance,
Matt

-----Original Message-----
From: Matthew Hannah 
Sent: Thursday, 19 August 2004 11:53
To: 'bioconductor at stat.math.ethz.ch'
Subject: Limma p-values, fdr and classifyTests

Hi,

I'm using Limma and have some questions related to p-values and gene
selection. 

Looking in the classifyTests help I noticed "The adjustment for multiple
testing is across the contrasts rather than the more usual control
across genes." There is also a multiple testing procedure in the
topTable function, but this appears to give a different result (fewer
significant genes). Is this the more usual control across genes? Why
are they different? Is it possible to take both into account?
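
The two directions of adjustment can be sketched with base R alone. The
p-value matrix below is invented for illustration (genes in rows,
contrasts in columns). Only p.adjust() is used, so this mimics the idea
rather than what classifyTestsP or topTable do internally.

```r
# Toy matrix of raw p-values, invented for illustration:
# rows are genes, columns are contrasts.
p <- matrix(c(0.001, 0.04,  0.20,
              0.030, 0.002, 0.60,
              0.500, 0.70,  0.02,
              0.008, 0.009, 0.015),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("contrast", 1:3)))

# Across genes: adjust each column (contrast) separately.
adj.genes <- apply(p, 2, p.adjust, method = "fdr")

# Across contrasts: adjust each row (gene) separately. apply() returns
# the per-row results as columns, hence the transpose.
adj.contrasts <- t(apply(p, 1, p.adjust, method = "fdr"))

# Number of tests called significant under each scheme.
cutoff <- 0.05
n.genes     <- sum(adj.genes < cutoff)
n.contrasts <- sum(adj.contrasts < cutoff)
print(c(across.genes = n.genes, across.contrasts = n.contrasts))
```

With only four genes the difference is small. With thousands of genes
the across-genes adjustment is normally much stricter, which would be
consistent with topTable reporting fewer significant genes.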

Basically I'm not just interested in the top 50 genes; I'd like to
identify all 'significant' changes. I thought the output from
classifyTestsP (0.01, fdr) would be good, but this doesn't account for
multiple testing across genes. Is there an easy way to get this kind of
output, rather than calling topTable for all genes (assuming its fdr is
across genes)?

classifyTestsF could be useful as I'm looking at a treatment effect on
different lines. However, again it takes no account of multiple testing
across genes. Is there any way to do this?

Also, there is all this talk of p-values, but there is a note saying
they are nominal. How far does this hold true? Do you always have to
select a cut-off based on some criterion (e.g. control genes), or is
there a way they can be applied quantitatively?

Finally is it ok to pass an eBayes fit to topTable? What's the
difference compared to toptable?
library(limma)
fit <- lmFit(esetgcrma, design)              # gene-wise linear models
con.fit <- contrasts.fit(fit, cont.matrix)   # contrasts of interest
ebfit <- eBayes(con.fit)                     # moderated statistics
topTable(ebfit, coef=1, number=50, adjust="fdr")

Thanks a lot,

Matt