[BioC] Understanding limma, fdr and topTable
Naomi Altman
naomi at stat.psu.edu
Tue Jul 8 15:50:57 CEST 2008
The current wisdom seems to be that removing very low intensity genes
from the dataset improves power. This certainly makes sense, because
there is definitely a detection limit. I do not have a reference,
but if you search the archived emails, you will find a lot of
discussion about this.
That being said, you should only filter genes that are low in every
sample. Otherwise, you could filter out the most interesting genes -
the ones that are expressed only under certain conditions.
--Naomi
At 08:27 AM 7/8/2008, Louisa A Rispoli/AS/EXP/UTIA wrote:
>Jim, Tom and Naomi and all-
>
>Thank you for your reponses. I am planning to look for genes (if any) that
>changed similaryly between the two groups depending on amplification
>procedure but was just getting starting looking at individually first. I
>appreciate the advice. Would pre-filtering to remove "absent" from all
>sample genes improve the situation or create a different problem, reading
>the archives there seems to be some disagreement if prefiltering before
>eBayes is appropriate. Thank you again I appreciate any help I get.
>
>Louisa
>
>
> "If we knew what we were doing, it wouldn't be called Research." - Albert
>Einstein
>
>Louisa Rispoli, Ph.D. Reproductive Physiology
>Department of Animal Science
>University of Tennessee, Knoxville
>A105 Johnson Animal Research and Teaching Unit
>1750 Alcoa Highway
>Knoxville, TN 37920
>phone:(865) 946-1874
>fax:(865) 946-1010
>email: lrispoli at utk.edu
>
>
>
>
>
> James MacDonald
> <jmacdon at med.umic
> h.edu> To
> Sent by: Louisa A Rispoli/AS/EXP/UTIA
> bioconductor-boun <larispoli at mail.ag.utk.edu>
> ces at stat.math.eth cc
> z.ch bioconductor at stat.math.ethz.ch
> Subject
> Re: [BioC] Understanding limma, fdr
> 07/07/2008 06:44 and topTable
> PM
>
>
>
>
>
>
>
>
>
>Hi Louisa,
>
>Louisa A Rispoli/AS/EXP/UTIA wrote:
> > To all-
> >
> > I am a newbie trying to analyze microarray with minimal help and thought
> > that I had figured out all. We have a simple task of comparing two groups
> > with 8 replicates on the bovine genechip. I am attempting to understand
>the
> > results that I obtain using limma and adjusting for fdr. I have tried
> > reading the help vignettes on p.adjust and topTablle but no closer to
> > understanding if the adjusted p-value represents the fdr or the q-value
>or
> > something altogether different. Based on some recent work in another lab
> > (by abstract) I know that I may need to use a less stringent fdr then 5%
> > but I am unsure in limma how to change that value (or if it is feasible
>at
> > all). Looking at the results that I obtained so far, I do not have any
> > differentially expressed genes with the fdr adjustment. But I could be
> > interpreting that wrong, since I was looking for values of the adjusted p
> > to be lower then 0.05 and the smallest that I see is 0.7816, If someone
> > could look at this and assist, any help, advice or reprimmand would be
> > appreciated.
>
>The adjusted p-value does indeed represent the FDR, and it looks like
>you have little evidence for differential expression. However, this
>doesn't mean that there is no differential expression, just that you
>don't have much evidence to say there is.
>
>Given the data in hand, the table you show does give those probesets
>that appear to be the most consistently changed, and those are the most
>likely to validate if you were to choose to do so.
>
>Looking at the way you fit the model, I wonder what the end goal of the
>experiment might have been. When I see that sort of thing in the lab,
>almost always the client is looking for the interaction (e.g., genes
>that react differently to the treatment with HS depending on if they are
>wt or polyA). Is that not the case here?
>
>Regardless, I assume at the very least you might want to know the
>difference between the hs and control wt samples as well. If you fit a
>model that includes these samples and then compute the contrasts you
>might get better results (depending on the intra-group variability of
>the wt samples), as the denominator of your contrast will be based on
>all 32 samples rather than just the 16 you are using currently.
>
>Best,
>
>Jim
>
> >
> >
> > Thanks
> >
> > Louisa
> >
> > "If we knew what we were doing, it wouldn't be called Research." - Albert
> > Einstein
> >
> > Louisa Rispoli, Ph.D. Reproductive Physiology
> > Department of Animal Science
> > University of Tennessee, Knoxville
> > A105 Johnson Animal Research and Teaching Unit
> > 1750 Alcoa Highway
> > Knoxville, TN 37920
> > phone:(865) 946-1874
> > fax:(865) 946-1010
> > email: lrispoli at utk.edu
> >
> > R version 2.7.1 (2008-06-23)
> > Copyright (C) 2008 The R Foundation for Statistical Computing
> > ISBN 3-900051-07-0
> >> library(affycoretools)
> > Loading required package: affy
> > Loading required package: Biobase
> > Loading required package: tools
> >
> > Welcome to Bioconductor
> >
> > Vignettes contain introductory material. To view, type
> > 'openVignette()'. To cite Bioconductor, see
> > 'citation("Biobase")' and for packages 'citation(pkgname)'.
> >
> > Loading required package: affyio
> > Loading required package: preprocessCore
> > Loading required package: limma
> > Loading required package: GOstats
> > Loading required package: graph
> > Loading required package: GO.db
> > Loading required package: AnnotationDbi
> > Loading required package: DBI
> > Loading required package: RSQLite
> > Loading required package: annotate
> > Loading required package: xtable
> > Loading required package: RBGL
> > Loading required package: Category
> > Loading required package: genefilter
> > Loading required package: survival
> > Loading required package: splines
> > Loading required package: biomaRt
> > Loading required package: RCurl
> >
> > Attaching package: 'biomaRt'
> >
> >
> > The following object(s) are masked from package:annotate :
> >
> > getGO
> >
> > Loading required package: gcrma
> > Loading required package: matchprobes
> > Loading required package: annaffy
> > Loading required package: KEGG.db
> >
> > Attaching package: 'annaffy'
> >
> >
> > The following object(s) are masked from package:RCurl :
> >
> > getURL
> >
> >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE,
> > row.names=1)
> >> data <- ReadAffy(phenoData=pd)
> >> pData(data)
> > amp trt
> > PolyC-1.CEL PolyA Ctrl
> > PolyC-2.CEL PolyA Ctrl
> > PolyC-3.CEL PolyA Ctrl
> > PolyC-4.CEL PolyA Ctrl
> > PolyC-5.CEL PolyA Ctrl
> > PolyC-6.CEL PolyA Ctrl
> > PolyC-7.CEL PolyA Ctrl
> > PolyC-8.CEL PolyA Ctrl
> > PolyHS-1.CEL PolyA HS
> > PolyHS-2.CEL PolyA HS
> > PolyHS-3.CEL PolyA HS
> > PolyHS-4.CEL PolyA HS
> > PolyHS-5.CEL PolyA HS
> > PolyHS-6.CEL PolyA HS
> > PolyHS-7.CEL PolyA HS
> > PolyHS-8.CEL PolyA HS
> > WTC-1.CEL WT Ctrl
> > WTC-2.CEL WT Ctrl
> > WTC-3.CEL WT Ctrl
> > WTC-4.CEL WT Ctrl
> > WTC-5.CEL WT Ctrl
> > WTC-6.CEL WT Ctrl
> > WTC-7.CEL WT Ctrl
> > WTC-8.CEL WT Ctrl
> > WTHS-1.CEL WT HS
> > WTHS-2.CEL WT HS
> > WTHS-3.CEL WT HS
> > WTHS-4.CEL WT HS
> > WTHS-5.CEL WT HS
> > WTHS-6.CEL WT HS
> > WTHS-7.CEL WT HS
> > WTHS-8.CEL WT HS
> >> Poly.rma <- rma(data[,1:16])
> > Background correcting
> > Normalizing
> > Calculating Expression
> >> pData(Poly.rma)
> > amp trt
> > PolyC-1.CEL PolyA Ctrl
> > PolyC-2.CEL PolyA Ctrl
> > PolyC-3.CEL PolyA Ctrl
> > PolyC-4.CEL PolyA Ctrl
> > PolyC-5.CEL PolyA Ctrl
> > PolyC-6.CEL PolyA Ctrl
> > PolyC-7.CEL PolyA Ctrl
> > PolyC-8.CEL PolyA Ctrl
> > PolyHS-1.CEL PolyA HS
> > PolyHS-2.CEL PolyA HS
> > PolyHS-3.CEL PolyA HS
> > PolyHS-4.CEL PolyA HS
> > PolyHS-5.CEL PolyA HS
> > PolyHS-6.CEL PolyA HS
> > PolyHS-7.CEL PolyA HS
> > PolyHS-8.CEL PolyA HS
> >> treatment <-c("C",
> > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS")
> >> design <- model.matrix(~factor(treatment))
> >> colnames(design) <- c("Ctrl", "CvsHS")
> >> design
> > Ctrl CvsHS
> > 1 1 0
> > 2 1 0
> > 3 1 0
> > 4 1 0
> > 5 1 0
> > 6 1 0
> > 7 1 0
> > 8 1 0
> > 9 1 1
> > 10 1 1
> > 11 1 1
> > 12 1 1
> > 13 1 1
> > 14 1 1
> > 15 1 1
> > 16 1 1
> > attr(,"assign")
> > [1] 0 1
> > attr(,"contrasts")
> > attr(,"contrasts")$`factor(treatment)`
> > [1] "contr.treatment"
> >> options(digits=4)
> >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr")
> > ID logFC AveExpr t P.Value adj.P.Val
>B
> > 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816
>-1.722
> > 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816
>-1.790
> > 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816
>-2.061
> > 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816
>-2.142
> > 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816
>-2.207
> > 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816
>-2.312
> > 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816
>-2.374
> > 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816
>-2.399
> > 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816
>-2.400
> > 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816
>-2.419
> > 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816
>-2.434
> > 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816
>-2.454
> > 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816
>-2.456
> > 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816
>-2.464
> > 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816
>-2.481
> > 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816
>-2.493
> > 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816
>-2.511
> > 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816
>-2.516
> > 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816
>-2.543
> > 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816
>-2.557
> > 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816
>-2.568
> > 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816
>-2.591
> > 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816
>-2.594
> > 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816
>-2.603
> > 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816
>-2.608
> >
> >> sessionInfo()
> > R version 2.7.1 (2008-06-23)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> > States.1252;LC_MONETARY=English_United
> >
> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] splines tools stats graphics grDevices utils datasets
> > methods base
> >
> > other attached packages:
> > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1
> > KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0
> > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0
> > Category_2.6.0 genefilter_1.20.0 survival_2.34-1
> > [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2
> > GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9
> > [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5
> > affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0
> > [25] Biobase_2.0.1
> >
> > loaded via a namespace (and not attached):
> > [1] cluster_1.11.11 XML_1.95-3
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>--
>James W. MacDonald, MS
>Biostatistician
>UMCCC cDNA and Affymetrix Core
>University of Michigan
>1500 E Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
More information about the Bioconductor
mailing list