[BioC] LIMMA, FDR and B-statistic

Wed Mar 24 00:42:21 CET 2004

At 04:23 AM 24/03/2004, Jordi Altirriba Gutiérrez wrote:
>Hello to everyone!!
>I've been using RMA to normalize my data and LIMMA to obtain a list of 
>significant genes. My design was a 2x2 factorial design with 4 groups: 
>Diabetic treated, diabetic untreated, health treated and health untreated 
>with 3 biological replicates in each group.
>I've got the list of significant genes with these commands (thanks again 
>to Gordon!):
>>design<-model.matrix(~DIABETES*TREATMENT,data=pData(eset))
>>fit<-lmFit(eset,design)
>>contrast.matrix<-makeContrasts(DIABETESTRUE,TREATMENTTRUE,DIABETESTRUE.TREATMENTTRUE,levels=design)
>>fit2<-contrasts.fit(fit,contrast.matrix)
>>fit2<-eBayes(fit2)
>>topTable(fit2, 
>>number=100,genelist=geneNames(eset),coef="DIABETESTRUE",adjust="fdr")
>Now I've more questions (sorry to bother you all again).
>1.- Is it possible to know at what false discovery rate are we working 
>with these 100 genes? (something similar to the median and the 90th 
>percentile of FDR that we obtain with SAM). If so, how can I get to know it ?

No it's not possible. Theoretically the 'fdr' method used by limma means 
that the adjusted p-value for each gene gives an upper bound on the 
expected false discovery rate if you were to cut on that value. (The theory 
assume independence between genes which is certainly not true.) The true 
false discovery rate is unknowable. My personal feeling is that you really 
need a set of special purpose control probes, such as spike-ins, even to 
get a decent unbiased estimator of the false discovery rate.

>2.- When I observe my genelist for the TREATMENT I realize that the first 
>gene of the list has a negative B value (-2.83), however when I obtain the 
>genelist for the TREATMENT.DIABETES, in this case what I get for the top 
>gene is a B value of 14. Is it correct to interpret that the drug only 
>acts in the diabetic animals and in the healthy ones does not induce any 
>difference in the gene expression?

Yes, that seems to be what the data is saying.

>3.- When we work with the p-value, there is an agreement (more or less) 
>that a value <0.05 is significant. Is there an agreement with the B 
>statistic? (I've read "replicated microarray data" of Lönnstdt and Speed 
>and I think that it depends on your data and experiment, but is there any 
>way to determine the cutoff?)

I dont' think there is any such agreement in the microarray literature even 
for p-values. Given all the assumptions, Terry and I are comfortable only 
with using B-statistics and modified p-values to rank genes rather than as 
absolute cut offs. I often use B>0, but only in an informal way. See 
comments in:

Smyth, G. K. (2004). Linear models and empirical Bayes methods for 
assessing differential expression in microarray experiments. Statistical 
Applications in Genetics and Molecular Biology 3, No. 1, Article 3.

Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003). Statistical issues in 
microarray data analysis. In: Functional Genomics: Methods and Protocols, 
M. J. Brownstein and A. B. Khodursky (eds.), Methods in Molecular Biology 
Volume 224, Humana Press, Totowa, NJ, pages 111-136.

Gordon

>Thanks again for your suggestions and patience!
>Yours sincerely,
>
>Jordi Altirriba, PhD student
>IDIBAPS - Hospital Clinic (Barcelona, Spain)