[BioC] LIMMA, FDR and B-statistic
Gordon Smyth
smyth at wehi.edu.au
Wed Mar 24 00:42:21 CET 2004
At 04:23 AM 24/03/2004, Jordi Altirriba Gutiérrez wrote:
>Hello to everyone!!
>I've been using RMA to normalize my data and LIMMA to obtain a list of
>significant genes. My design was a 2x2 factorial design with 4 groups:
>Diabetic treated, diabetic untreated, health treated and health untreated
>with 3 biological replicates in each group.
>I've got the list of significant genes with these commands (thanks again
>to Gordon!):
>>design<-model.matrix(~DIABETES*TREATMENT,data=pData(eset))
>>fit<-lmFit(eset,design)
>>contrast.matrix<-makeContrasts(DIABETESTRUE,TREATMENTTRUE,DIABETESTRUE.TREATMENTTRUE,levels=design)
>>fit2<-contrasts.fit(fit,contrast.matrix)
>>fit2<-eBayes(fit2)
>>topTable(fit2,
>>number=100,genelist=geneNames(eset),coef="DIABETESTRUE",adjust="fdr")
>Now I've more questions (sorry to bother you all again).
>1.- Is it possible to know at what false discovery rate are we working
>with these 100 genes? (something similar to the median and the 90th
>percentile of FDR that we obtain with SAM). If so, how can I get to know it ?
No it's not possible. Theoretically the 'fdr' method used by limma means
that the adjusted p-value for each gene gives an upper bound on the
expected false discovery rate if you were to cut on that value. (The theory
assume independence between genes which is certainly not true.) The true
false discovery rate is unknowable. My personal feeling is that you really
need a set of special purpose control probes, such as spike-ins, even to
get a decent unbiased estimator of the false discovery rate.
>2.- When I observe my genelist for the TREATMENT I realize that the first
>gene of the list has a negative B value (-2.83), however when I obtain the
>genelist for the TREATMENT.DIABETES, in this case what I get for the top
>gene is a B value of 14. Is it correct to interpret that the drug only
>acts in the diabetic animals and in the healthy ones does not induce any
>difference in the gene expression?
Yes, that seems to be what the data is saying.
>3.- When we work with the p-value, there is an agreement (more or less)
>that a value <0.05 is significant. Is there an agreement with the B
>statistic? (I've read "replicated microarray data" of Lönnstdt and Speed
>and I think that it depends on your data and experiment, but is there any
>way to determine the cutoff?)
I dont' think there is any such agreement in the microarray literature even
for p-values. Given all the assumptions, Terry and I are comfortable only
with using B-statistics and modified p-values to rank genes rather than as
absolute cut offs. I often use B>0, but only in an informal way. See
comments in:
Smyth, G. K. (2004). Linear models and empirical Bayes methods for
assessing differential expression in microarray experiments. Statistical
Applications in Genetics and Molecular Biology 3, No. 1, Article 3.
Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003). Statistical issues in
microarray data analysis. In: Functional Genomics: Methods and Protocols,
M. J. Brownstein and A. B. Khodursky (eds.), Methods in Molecular Biology
Volume 224, Humana Press, Totowa, NJ, pages 111-136.
Gordon
>Thanks again for your suggestions and patience!
>Yours sincerely,
>
>Jordi Altirriba, PhD student
>IDIBAPS - Hospital Clinic (Barcelona, Spain)
More information about the Bioconductor
mailing list