[BioC] Understanding limma, fdr and topTable

Tue Jul 8 15:50:57 CEST 2008

The current wisdom seems to be that removing very low intensity genes 
from the dataset improves power.  This certainly makes sense, because 
there is definitely a detection limit.  I do not have a reference, 
but if you search the archived emails, you will find a lot of 
discussion about this.

That being said, you should only filter genes that are low in every 
sample.  Otherwise, you could filter out the most interesting genes - 
the ones that are expressed only under certain conditions.

--Naomi

At 08:27 AM 7/8/2008, Louisa A Rispoli/AS/EXP/UTIA wrote:
>Jim, Tom and Naomi and all-
>
>Thank you for your reponses. I am planning to look for genes (if any) that
>changed similaryly between the two groups depending on amplification
>procedure but was just getting starting looking at individually first. I
>appreciate the advice. Would pre-filtering to remove "absent" from all
>sample genes improve the situation or create a different problem, reading
>the archives there seems to be some disagreement if prefiltering before
>eBayes is appropriate. Thank you again I appreciate any help I get.
>
>Louisa
>
>
>  "If we knew what we were doing, it wouldn't be called Research." - Albert
>Einstein
>
>Louisa Rispoli, Ph.D. Reproductive Physiology
>Department of Animal Science
>University of Tennessee, Knoxville
>A105 Johnson Animal Research and Teaching Unit
>1750 Alcoa Highway
>Knoxville, TN 37920
>phone:(865) 946-1874
>fax:(865) 946-1010
>email: lrispoli at utk.edu
>
>
>
>
>
>              James MacDonald
>              <jmacdon at med.umic
>              h.edu>                                                     To
>              Sent by:                  Louisa A Rispoli/AS/EXP/UTIA
>              bioconductor-boun         <larispoli at mail.ag.utk.edu>
>              ces at stat.math.eth                                          cc
>              z.ch                      bioconductor at stat.math.ethz.ch
>                                                                    Subject
>                                        Re: [BioC] Understanding limma, fdr
>              07/07/2008 06:44          and topTable
>              PM
>
>
>
>
>
>
>
>
>
>Hi Louisa,
>
>Louisa A Rispoli/AS/EXP/UTIA wrote:
> > To all-
> >
> > I am a newbie trying to analyze microarray with minimal help and thought
> > that I had figured out all. We have a simple task of comparing two groups
> > with 8 replicates on the bovine genechip. I am attempting to understand
>the
> > results that I obtain using limma and adjusting for fdr. I have tried
> > reading the help vignettes on p.adjust and topTablle but no closer to
> > understanding if the adjusted p-value represents the fdr or the q-value
>or
> > something altogether different. Based on some recent work in another lab
> > (by abstract) I know that I may need to use a less stringent fdr then 5%
> > but I am unsure in limma how to change that value (or if it is feasible
>at
> > all). Looking at the results that I obtained so far, I do not have any
> > differentially expressed genes with the fdr adjustment. But I could be
> > interpreting that wrong, since I was looking for values of the adjusted p
> > to be lower then 0.05 and the smallest that I see is 0.7816, If someone
> > could look at this and assist, any help, advice or reprimmand would be
> > appreciated.
>
>The adjusted p-value does indeed represent the FDR, and it looks like
>you have little evidence for differential expression. However, this
>doesn't mean that there is no differential expression, just that you
>don't have much evidence to say there is.
>
>Given the data in hand, the table you show does give those probesets
>that appear to be the most consistently changed, and those are the most
>likely to validate if you were to choose to do so.
>
>Looking at the way you fit the model, I wonder what the end goal of the
>experiment might have been. When I see that sort of thing in the lab,
>almost always the client is looking for the interaction (e.g., genes
>that react differently to the treatment with HS depending on if they are
>wt or polyA). Is that not the case here?
>
>Regardless, I assume at the very least you might want to know the
>difference between the hs and control wt samples as well. If you fit a
>model that includes these samples and then compute the contrasts you
>might get better results (depending on the intra-group variability of
>the wt samples), as the denominator of your contrast will be based on
>all 32 samples rather than just the 16 you are using currently.
>
>Best,
>
>Jim
>
> >
> >
> > Thanks
> >
> > Louisa
> >
> > "If we knew what we were doing, it wouldn't be called Research." - Albert
> > Einstein
> >
> > Louisa Rispoli, Ph.D. Reproductive Physiology
> > Department of Animal Science
> > University of Tennessee, Knoxville
> > A105 Johnson Animal Research and Teaching Unit
> > 1750 Alcoa Highway
> > Knoxville, TN 37920
> > phone:(865) 946-1874
> > fax:(865) 946-1010
> > email: lrispoli at utk.edu
> >
> > R version 2.7.1 (2008-06-23)
> > Copyright (C) 2008 The R Foundation for Statistical Computing
> > ISBN 3-900051-07-0
> >> library(affycoretools)
> > Loading required package: affy
> > Loading required package: Biobase
> > Loading required package: tools
> >
> > Welcome to Bioconductor
> >
> >   Vignettes contain introductory material. To view, type
> >   'openVignette()'. To cite Bioconductor, see
> >   'citation("Biobase")' and for packages 'citation(pkgname)'.
> >
> > Loading required package: affyio
> > Loading required package: preprocessCore
> > Loading required package: limma
> > Loading required package: GOstats
> > Loading required package: graph
> > Loading required package: GO.db
> > Loading required package: AnnotationDbi
> > Loading required package: DBI
> > Loading required package: RSQLite
> > Loading required package: annotate
> > Loading required package: xtable
> > Loading required package: RBGL
> > Loading required package: Category
> > Loading required package: genefilter
> > Loading required package: survival
> > Loading required package: splines
> > Loading required package: biomaRt
> > Loading required package: RCurl
> >
> > Attaching package: 'biomaRt'
> >
> >
> >         The following object(s) are masked from package:annotate :
> >
> >          getGO
> >
> > Loading required package: gcrma
> > Loading required package: matchprobes
> > Loading required package: annaffy
> > Loading required package: KEGG.db
> >
> > Attaching package: 'annaffy'
> >
> >
> >         The following object(s) are masked from package:RCurl :
> >
> >          getURL
> >
> >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE,
> > row.names=1)
> >> data <- ReadAffy(phenoData=pd)
> >> pData(data)
> >                amp  trt
> > PolyC-1.CEL  PolyA Ctrl
> > PolyC-2.CEL  PolyA Ctrl
> > PolyC-3.CEL  PolyA Ctrl
> > PolyC-4.CEL  PolyA Ctrl
> > PolyC-5.CEL  PolyA Ctrl
> > PolyC-6.CEL  PolyA Ctrl
> > PolyC-7.CEL  PolyA Ctrl
> > PolyC-8.CEL  PolyA Ctrl
> > PolyHS-1.CEL PolyA   HS
> > PolyHS-2.CEL PolyA   HS
> > PolyHS-3.CEL PolyA   HS
> > PolyHS-4.CEL PolyA   HS
> > PolyHS-5.CEL PolyA   HS
> > PolyHS-6.CEL PolyA   HS
> > PolyHS-7.CEL PolyA   HS
> > PolyHS-8.CEL PolyA   HS
> > WTC-1.CEL       WT Ctrl
> > WTC-2.CEL       WT Ctrl
> > WTC-3.CEL       WT Ctrl
> > WTC-4.CEL       WT Ctrl
> > WTC-5.CEL       WT Ctrl
> > WTC-6.CEL       WT Ctrl
> > WTC-7.CEL       WT Ctrl
> > WTC-8.CEL       WT Ctrl
> > WTHS-1.CEL      WT   HS
> > WTHS-2.CEL      WT   HS
> > WTHS-3.CEL      WT   HS
> > WTHS-4.CEL      WT   HS
> > WTHS-5.CEL      WT   HS
> > WTHS-6.CEL      WT   HS
> > WTHS-7.CEL      WT   HS
> > WTHS-8.CEL      WT   HS
> >> Poly.rma <- rma(data[,1:16])
> > Background correcting
> > Normalizing
> > Calculating Expression
> >> pData(Poly.rma)
> >                amp  trt
> > PolyC-1.CEL  PolyA Ctrl
> > PolyC-2.CEL  PolyA Ctrl
> > PolyC-3.CEL  PolyA Ctrl
> > PolyC-4.CEL  PolyA Ctrl
> > PolyC-5.CEL  PolyA Ctrl
> > PolyC-6.CEL  PolyA Ctrl
> > PolyC-7.CEL  PolyA Ctrl
> > PolyC-8.CEL  PolyA Ctrl
> > PolyHS-1.CEL PolyA   HS
> > PolyHS-2.CEL PolyA   HS
> > PolyHS-3.CEL PolyA   HS
> > PolyHS-4.CEL PolyA   HS
> > PolyHS-5.CEL PolyA   HS
> > PolyHS-6.CEL PolyA   HS
> > PolyHS-7.CEL PolyA   HS
> > PolyHS-8.CEL PolyA   HS
> >> treatment <-c("C",
> > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS")
> >> design <- model.matrix(~factor(treatment))
> >> colnames(design) <- c("Ctrl", "CvsHS")
> >> design
> >    Ctrl CvsHS
> > 1     1     0
> > 2     1     0
> > 3     1     0
> > 4     1     0
> > 5     1     0
> > 6     1     0
> > 7     1     0
> > 8     1     0
> > 9     1     1
> > 10    1     1
> > 11    1     1
> > 12    1     1
> > 13    1     1
> > 14    1     1
> > 15    1     1
> > 16    1     1
> > attr(,"assign")
> > [1] 0 1
> > attr(,"contrasts")
> > attr(,"contrasts")$`factor(treatment)`
> > [1] "contr.treatment"
> >> options(digits=4)
> >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr")
> >                       ID   logFC AveExpr      t   P.Value adj.P.Val
>B
> > 16088   Bt.27852.2.S1_at -0.4764   3.621 -5.274 5.709e-05    0.7816
>-1.722
> > 12937   Bt.24859.1.A1_at  0.6632   6.940  5.152 7.374e-05    0.7816
>-1.790
> > 2853    Bt.13563.2.S1_at -0.5288   6.747 -4.703 1.922e-04    0.7816
>-2.061
> > 2474    Bt.13162.1.S1_at -0.6424   5.964 -4.574 2.534e-04    0.7816
>-2.142
> > 10402   Bt.22107.1.S1_at -0.3261  10.358 -4.475 3.143e-04    0.7816
>-2.207
> > 10654   Bt.22355.1.S1_at -0.3533   8.650 -4.316 4.439e-04    0.7816
>-2.312
> > 8883    Bt.20584.1.S1_at -0.3671   8.106 -4.225 5.417e-04    0.7816
>-2.374
> > 2851    Bt.13562.1.S1_at  0.6901   8.865  4.190 5.850e-04    0.7816
>-2.399
> > 5772     Bt.1754.1.S1_at -0.5986   9.348 -4.187 5.885e-04    0.7816
>-2.400
> > 19715    Bt.4440.1.A1_at -0.6881   2.526 -4.160 6.242e-04    0.7816
>-2.419
> > 3163    Bt.13933.1.S1_at  0.4519   7.798  4.138 6.549e-04    0.7816
>-2.434
> > 14921   Bt.26765.1.S1_at -0.3673   7.846 -4.110 6.962e-04    0.7816
>-2.454
> > 10751   Bt.22445.1.S1_at -0.3222   8.622 -4.108 7.002e-04    0.7816
>-2.456
> > 2906    Bt.13629.1.A1_at  0.2830   2.294  4.095 7.196e-04    0.7816
>-2.464
> > 5721     Bt.1749.1.S1_at  0.6030   3.839  4.072 7.578e-04    0.7816
>-2.481
> > 21192    Bt.6078.1.S1_at -0.4385   8.564 -4.054 7.879e-04    0.7816
>-2.493
> > 800     Bt.11145.1.S1_at -0.5060   2.461 -4.030 8.312e-04    0.7816
>-2.511
> > 27    AFFX-Bt-ECOLOXL_at  0.3410   1.290  4.022 8.443e-04    0.7816
>-2.516
> > 22515    Bt.7980.1.S1_at -0.3734   7.622 -3.983 9.196e-04    0.7816
>-2.543
> > 4609    Bt.16378.1.A1_at -0.3775   5.071 -3.964 9.607e-04    0.7816
>-2.557
> > 13765   Bt.25669.1.S1_at -0.5523   8.197 -3.948 9.933e-04    0.7816
>-2.568
> > 12023   Bt.24033.1.A1_at -0.4962   6.256 -3.917 1.065e-03    0.7816
>-2.591
> > 20843    Bt.5578.1.S1_at -0.4133   7.904 -3.913 1.073e-03    0.7816
>-2.594
> > 17021   Bt.28620.1.S1_at  0.4720   5.181  3.899 1.106e-03    0.7816
>-2.603
> > 23922    Bt.9791.1.S1_at -0.3420   9.963 -3.893 1.122e-03    0.7816
>-2.608
> >
> >> sessionInfo()
> > R version 2.7.1 (2008-06-23)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > States.1252;LC_MONETARY=English_United
> >
> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] splines   tools     stats     graphics  grDevices utils     datasets
> > methods   base
> >
> > other attached packages:
> >  [1] bovinecdf_2.2.0      affycoretools_1.12.0 annaffy_1.12.1
> > KEGG.db_2.2.0        gcrma_2.12.1         matchprobes_1.12.0
> >  [7] biomaRt_1.14.0       RCurl_0.9-3          GOstats_2.6.0
> > Category_2.6.0       genefilter_1.20.0    survival_2.34-1
> > [13] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-2
> > GO.db_2.2.0          AnnotationDbi_1.2.2  RSQLite_0.6-9
> > [19] DBI_0.2-4            graph_1.18.1         limma_2.14.5
> > affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
> > [25] Biobase_2.0.1
> >
> > loaded via a namespace (and not attached):
> > [1] cluster_1.11.11 XML_1.95-3
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>--
>James W. MacDonald, MS
>Biostatistician
>UMCCC cDNA and Affymetrix Core
>University of Michigan
>1500 E Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111