[BioC] Understanding limma, fdr and topTable

Naomi Altman naomi at stat.psu.edu
Tue Jul 8 04:49:46 CEST 2008


Dear Louisa,

FDR works well to "adjust" p-values when the distribution of 
unadjusted p-values has a sharp spike near 0 and is otherwise pretty flat.
What often happens if either the data are noisy or the fold changes 
are relatively small is that the unadjusted p-values are elevated 
near 0 and taper almost linearly into the flat part near p=1.  In 
that case, you will have evidence for differential expression, but 
any list of genes you produce will have a high percentage of false 
detections (approximately 78.16% in your case).
As James said, it is still the case that the smallest p-values belong 
to genes for which you have the most evidence of differential 
expression.  So those are the ones to try to validate.

The fits from your ebayes step include the p-values for each 
contrast.  e.g. output$p.value.  To get a histogram for the p-values 
for the first contrast

hist(output$p.value[,1])

Or, you can obtain the histogram for the p-value for the overall 
F-test output$F.p.value.

Naomi

At 04:35 PM 7/7/2008, Louisa A Rispoli/AS/EXP/UTIA wrote:

>To all-
>
>I am a newbie trying to analyze microarray with minimal help and thought
>that I had figured out all. We have a simple task of comparing two groups
>with 8 replicates on the bovine genechip. I am attempting to understand the
>results that I obtain using limma and adjusting for fdr. I have tried
>reading the help vignettes on p.adjust and topTablle but no closer to
>understanding if the adjusted p-value represents the fdr or the q-value or
>something altogether different. Based on some recent work in another lab
>(by abstract) I know that I may need to use a less stringent fdr then 5%
>but I am unsure in limma how to change that value (or if it is feasible at
>all). Looking at the results that I obtained so far, I do not have any
>differentially expressed genes with the fdr adjustment. But I could be
>interpreting that wrong, since I was looking for values of the adjusted p
>to be lower then 0.05 and the smallest that I see is 0.7816, If someone
>could look at this and assist, any help, advice or reprimmand would be
>appreciated.
>
>
>Thanks
>
>Louisa
>
>"If we knew what we were doing, it wouldn't be called Research." - Albert
>Einstein
>
>Louisa Rispoli, Ph.D. Reproductive Physiology
>Department of Animal Science
>University of Tennessee, Knoxville
>A105 Johnson Animal Research and Teaching Unit
>1750 Alcoa Highway
>Knoxville, TN 37920
>phone:(865) 946-1874
>fax:(865) 946-1010
>email: lrispoli at utk.edu
>
>R version 2.7.1 (2008-06-23)
>Copyright (C) 2008 The R Foundation for Statistical Computingmja
>ISBN 3-900051-07-0
> > library(affycoretools)
>Loading required package: affy
>Loading required package: Biobase
>Loading required package: tools
>
>Welcome to Bioconductor
>
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>Loading required package: affyio
>Loading required package: preprocessCore
>Loading required package: limma
>Loading required package: GOstats
>Loading required package: graph
>Loading required package: GO.db
>Loading required package: AnnotationDbi
>Loading required package: DBI
>Loading required package: RSQLite
>Loading required package: annotate
>Loading required package: xtable
>Loading required package: RBGL
>Loading required package: Category
>Loading required package: genefilter
>Loading required package: survival
>Loading required package: splines
>Loading required package: biomaRt
>Loading required package: RCurl
>
>Attaching package: 'biomaRt'
>
>
>         The following object(s) are masked from package:annotate :
>
>          getGO
>
>Loading required package: gcrma
>Loading required package: matchprobes
>Loading required package: annaffy
>Loading required package: KEGG.db
>
>Attaching package: 'annaffy'
>
>
>         The following object(s) are masked from package:RCurl :
>
>          getURL
>
> > pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE,
>row.names=1)
> > data <- ReadAffy(phenoData=pd)
> > pData(data)
>                amp  trt
>PolyC-1.CEL  PolyA Ctrl
>PolyC-2.CEL  PolyA Ctrl
>PolyC-3.CEL  PolyA Ctrl
>PolyC-4.CEL  PolyA Ctrl
>PolyC-5.CEL  PolyA Ctrl
>PolyC-6.CEL  PolyA Ctrl
>PolyC-7.CEL  PolyA Ctrl
>PolyC-8.CEL  PolyA Ctrl
>PolyHS-1.CEL PolyA   HS
>PolyHS-2.CEL PolyA   HS
>PolyHS-3.CEL PolyA   HS
>PolyHS-4.CEL PolyA   HS
>PolyHS-5.CEL PolyA   HS
>PolyHS-6.CEL PolyA   HS
>PolyHS-7.CEL PolyA   HS
>PolyHS-8.CEL PolyA   HS
>WTC-1.CEL       WT Ctrl
>WTC-2.CEL       WT Ctrl
>WTC-3.CEL       WT Ctrl
>WTC-4.CEL       WT Ctrl
>WTC-5.CEL       WT Ctrl
>WTC-6.CEL       WT Ctrl
>WTC-7.CEL       WT Ctrl
>WTC-8.CEL       WT Ctrl
>WTHS-1.CEL      WT   HS
>WTHS-2.CEL      WT   HS
>WTHS-3.CEL      WT   HS
>WTHS-4.CEL      WT   HS
>WTHS-5.CEL      WT   HS
>WTHS-6.CEL      WT   HS
>WTHS-7.CEL      WT   HS
>WTHS-8.CEL      WT   HS
> > Poly.rma <- rma(data[,1:16])
>Background correcting
>Normalizing
>Calculating Expression
> > pData(Poly.rma)
>                amp  trt
>PolyC-1.CEL  PolyA Ctrl
>PolyC-2.CEL  PolyA Ctrl
>PolyC-3.CEL  PolyA Ctrl
>PolyC-4.CEL  PolyA Ctrl
>PolyC-5.CEL  PolyA Ctrl
>PolyC-6.CEL  PolyA Ctrl
>PolyC-7.CEL  PolyA Ctrl
>PolyC-8.CEL  PolyA Ctrl
>PolyHS-1.CEL PolyA   HS
>PolyHS-2.CEL PolyA   HS
>PolyHS-3.CEL PolyA   HS
>PolyHS-4.CEL PolyA   HS
>PolyHS-5.CEL PolyA   HS
>PolyHS-6.CEL PolyA   HS
>PolyHS-7.CEL PolyA   HS
>PolyHS-8.CEL PolyA   HS
> > treatment <-c("C",
>"C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS")
> > design <- model.matrix(~factor(treatment))
> > colnames(design) <- c("Ctrl", "CvsHS")
> > design
>    Ctrl CvsHS
>1     1     0
>2     1     0
>3     1     0
>4     1     0
>5     1     0
>6     1     0
>7     1     0
>8     1     0
>9     1     1
>10    1     1
>11    1     1
>12    1     1
>13    1     1
>14    1     1
>15    1     1
>16    1     1
>attr(,"assign")
>[1] 0 1
>attr(,"contrasts")
>attr(,"contrasts")$`factor(treatment)`
>[1] "contr.treatment"
> > options(digits=4)
> > topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr")
>                       ID   logFC AveExpr      t   P.Value adj.P.Val      B
>16088   Bt.27852.2.S1_at -0.4764   3.621 -5.274 5.709e-05    0.7816 -1.722
>12937   Bt.24859.1.A1_at  0.6632   6.940  5.152 7.374e-05    0.7816 -1.790
>2853    Bt.13563.2.S1_at -0.5288   6.747 -4.703 1.922e-04    0.7816 -2.061
>2474    Bt.13162.1.S1_at -0.6424   5.964 -4.574 2.534e-04    0.7816 -2.142
>10402   Bt.22107.1.S1_at -0.3261  10.358 -4.475 3.143e-04    0.7816 -2.207
>10654   Bt.22355.1.S1_at -0.3533   8.650 -4.316 4.439e-04    0.7816 -2.312
>8883    Bt.20584.1.S1_at -0.3671   8.106 -4.225 5.417e-04    0.7816 -2.374
>2851    Bt.13562.1.S1_at  0.6901   8.865  4.190 5.850e-04    0.7816 -2.399
>5772     Bt.1754.1.S1_at -0.5986   9.348 -4.187 5.885e-04    0.7816 -2.400
>19715    Bt.4440.1.A1_at -0.6881   2.526 -4.160 6.242e-04    0.7816 -2.419
>3163    Bt.13933.1.S1_at  0.4519   7.798  4.138 6.549e-04    0.7816 -2.434
>14921   Bt.26765.1.S1_at -0.3673   7.846 -4.110 6.962e-04    0.7816 -2.454
>10751   Bt.22445.1.S1_at -0.3222   8.622 -4.108 7.002e-04    0.7816 -2.456
>2906    Bt.13629.1.A1_at  0.2830   2.294  4.095 7.196e-04    0.7816 -2.464
>5721     Bt.1749.1.S1_at  0.6030   3.839  4.072 7.578e-04    0.7816 -2.481
>21192    Bt.6078.1.S1_at -0.4385   8.564 -4.054 7.879e-04    0.7816 -2.493
>800     Bt.11145.1.S1_at -0.5060   2.461 -4.030 8.312e-04    0.7816 -2.511
>27    AFFX-Bt-ECOLOXL_at  0.3410   1.290  4.022 8.443e-04    0.7816 -2.516
>22515    Bt.7980.1.S1_at -0.3734   7.622 -3.983 9.196e-04    0.7816 -2.543
>4609    Bt.16378.1.A1_at -0.3775   5.071 -3.964 9.607e-04    0.7816 -2.557
>13765   Bt.25669.1.S1_at -0.5523   8.197 -3.948 9.933e-04    0.7816 -2.568
>12023   Bt.24033.1.A1_at -0.4962   6.256 -3.917 1.065e-03    0.7816 -2.591
>20843    Bt.5578.1.S1_at -0.4133   7.904 -3.913 1.073e-03    0.7816 -2.594
>17021   Bt.28620.1.S1_at  0.4720   5.181  3.899 1.106e-03    0.7816 -2.603
>23922    Bt.9791.1.S1_at -0.3420   9.963 -3.893 1.122e-03    0.7816 -2.608
>
> > sessionInfo()
>R version 2.7.1 (2008-06-23)
>i386-pc-mingw32
>
>locale:
>LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>States.1252;LC_MONETARY=English_United
>
>States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
>attached base packages:
>[1] splines   tools     stats     graphics  grDevices utils     datasets
>methods   base
>
>other attached packages:
>  [1] bovinecdf_2.2.0      affycoretools_1.12.0 annaffy_1.12.1
>KEGG.db_2.2.0        gcrma_2.12.1         matchprobes_1.12.0
>  [7] biomaRt_1.14.0       RCurl_0.9-3          GOstats_2.6.0
>Category_2.6.0       genefilter_1.20.0    survival_2.34-1
>[13] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-2
>GO.db_2.2.0          AnnotationDbi_1.2.2  RSQLite_0.6-9
>[19] DBI_0.2-4            graph_1.18.1         limma_2.14.5
>affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
>[25] Biobase_2.0.1
>
>loaded via a namespace (and not attached):
>[1] cluster_1.11.11 XML_1.95-3
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111



More information about the Bioconductor mailing list