[BioC] Understanding limma, fdr and topTable
James MacDonald
jmacdon at med.umich.edu
Tue Jul 8 16:34:42 CEST 2008
I would add that removing those genes that are unchanged in any sample
will also help reduce the multiplicity problem. Regardless of the
expression level, those genes that never change expression are
uninteresting by default, so e.g., if beta-actin is highly expressed at
the same level in all samples we don't really care to test for
differential expression for that gene since it apparently is not
differentially expressed.
Best,
Jim
Naomi Altman wrote:
> The current wisdom seems to be that removing very low intensity genes
> from the dataset improves power. This certainly makes sense, because
> there is definitely a detection limit. I do not have a reference, but
> if you search the archived emails, you will find a lot of discussion
> about this.
>
> That being said, you should only filter genes that are low in every
> sample. Otherwise, you could filter out the most interesting genes -
> the ones that are expressed only under certain conditions.
>
> --Naomi
>
>
>
> At 08:27 AM 7/8/2008, Louisa A Rispoli/AS/EXP/UTIA wrote:
>> Jim, Tom and Naomi and all-
>>
>> Thank you for your reponses. I am planning to look for genes (if any)
>> that
>> changed similaryly between the two groups depending on amplification
>> procedure but was just getting starting looking at individually first. I
>> appreciate the advice. Would pre-filtering to remove "absent" from all
>> sample genes improve the situation or create a different problem, reading
>> the archives there seems to be some disagreement if prefiltering before
>> eBayes is appropriate. Thank you again I appreciate any help I get.
>>
>> Louisa
>>
>>
>> "If we knew what we were doing, it wouldn't be called Research." -
>> Albert
>> Einstein
>>
>> Louisa Rispoli, Ph.D. Reproductive Physiology
>> Department of Animal Science
>> University of Tennessee, Knoxville
>> A105 Johnson Animal Research and Teaching Unit
>> 1750 Alcoa Highway
>> Knoxville, TN 37920
>> phone:(865) 946-1874
>> fax:(865) 946-1010
>> email: lrispoli at utk.edu
>>
>>
>>
>>
>>
>> James MacDonald
>> <jmacdon at med.umic
>>
>> h.edu> To
>> Sent by: Louisa A Rispoli/AS/EXP/UTIA
>> bioconductor-boun <larispoli at mail.ag.utk.edu>
>>
>> ces at stat.math.eth cc
>> z.ch bioconductor at stat.math.ethz.ch
>>
>> Subject
>> Re: [BioC] Understanding limma,
>> fdr
>> 07/07/2008 06:44 and topTable
>> PM
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Hi Louisa,
>>
>> Louisa A Rispoli/AS/EXP/UTIA wrote:
>> > To all-
>> >
>> > I am a newbie trying to analyze microarray with minimal help and
>> thought
>> > that I had figured out all. We have a simple task of comparing two
>> groups
>> > with 8 replicates on the bovine genechip. I am attempting to understand
>> the
>> > results that I obtain using limma and adjusting for fdr. I have tried
>> > reading the help vignettes on p.adjust and topTablle but no closer to
>> > understanding if the adjusted p-value represents the fdr or the q-value
>> or
>> > something altogether different. Based on some recent work in another
>> lab
>> > (by abstract) I know that I may need to use a less stringent fdr
>> then 5%
>> > but I am unsure in limma how to change that value (or if it is feasible
>> at
>> > all). Looking at the results that I obtained so far, I do not have any
>> > differentially expressed genes with the fdr adjustment. But I could be
>> > interpreting that wrong, since I was looking for values of the
>> adjusted p
>> > to be lower then 0.05 and the smallest that I see is 0.7816, If someone
>> > could look at this and assist, any help, advice or reprimmand would be
>> > appreciated.
>>
>> The adjusted p-value does indeed represent the FDR, and it looks like
>> you have little evidence for differential expression. However, this
>> doesn't mean that there is no differential expression, just that you
>> don't have much evidence to say there is.
>>
>> Given the data in hand, the table you show does give those probesets
>> that appear to be the most consistently changed, and those are the most
>> likely to validate if you were to choose to do so.
>>
>> Looking at the way you fit the model, I wonder what the end goal of the
>> experiment might have been. When I see that sort of thing in the lab,
>> almost always the client is looking for the interaction (e.g., genes
>> that react differently to the treatment with HS depending on if they are
>> wt or polyA). Is that not the case here?
>>
>> Regardless, I assume at the very least you might want to know the
>> difference between the hs and control wt samples as well. If you fit a
>> model that includes these samples and then compute the contrasts you
>> might get better results (depending on the intra-group variability of
>> the wt samples), as the denominator of your contrast will be based on
>> all 32 samples rather than just the 16 you are using currently.
>>
>> Best,
>>
>> Jim
>>
>> >
>> >
>> > Thanks
>> >
>> > Louisa
>> >
>> > "If we knew what we were doing, it wouldn't be called Research." -
>> Albert
>> > Einstein
>> >
>> > Louisa Rispoli, Ph.D. Reproductive Physiology
>> > Department of Animal Science
>> > University of Tennessee, Knoxville
>> > A105 Johnson Animal Research and Teaching Unit
>> > 1750 Alcoa Highway
>> > Knoxville, TN 37920
>> > phone:(865) 946-1874
>> > fax:(865) 946-1010
>> > email: lrispoli at utk.edu
>> >
>> > R version 2.7.1 (2008-06-23)
>> > Copyright (C) 2008 The R Foundation for Statistical Computing
>> > ISBN 3-900051-07-0
>> >> library(affycoretools)
>> > Loading required package: affy
>> > Loading required package: Biobase
>> > Loading required package: tools
>> >
>> > Welcome to Bioconductor
>> >
>> > Vignettes contain introductory material. To view, type
>> > 'openVignette()'. To cite Bioconductor, see
>> > 'citation("Biobase")' and for packages 'citation(pkgname)'.
>> >
>> > Loading required package: affyio
>> > Loading required package: preprocessCore
>> > Loading required package: limma
>> > Loading required package: GOstats
>> > Loading required package: graph
>> > Loading required package: GO.db
>> > Loading required package: AnnotationDbi
>> > Loading required package: DBI
>> > Loading required package: RSQLite
>> > Loading required package: annotate
>> > Loading required package: xtable
>> > Loading required package: RBGL
>> > Loading required package: Category
>> > Loading required package: genefilter
>> > Loading required package: survival
>> > Loading required package: splines
>> > Loading required package: biomaRt
>> > Loading required package: RCurl
>> >
>> > Attaching package: 'biomaRt'
>> >
>> >
>> > The following object(s) are masked from package:annotate :
>> >
>> > getGO
>> >
>> > Loading required package: gcrma
>> > Loading required package: matchprobes
>> > Loading required package: annaffy
>> > Loading required package: KEGG.db
>> >
>> > Attaching package: 'annaffy'
>> >
>> >
>> > The following object(s) are masked from package:RCurl :
>> >
>> > getURL
>> >
>> >> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE,
>> > row.names=1)
>> >> data <- ReadAffy(phenoData=pd)
>> >> pData(data)
>> > amp trt
>> > PolyC-1.CEL PolyA Ctrl
>> > PolyC-2.CEL PolyA Ctrl
>> > PolyC-3.CEL PolyA Ctrl
>> > PolyC-4.CEL PolyA Ctrl
>> > PolyC-5.CEL PolyA Ctrl
>> > PolyC-6.CEL PolyA Ctrl
>> > PolyC-7.CEL PolyA Ctrl
>> > PolyC-8.CEL PolyA Ctrl
>> > PolyHS-1.CEL PolyA HS
>> > PolyHS-2.CEL PolyA HS
>> > PolyHS-3.CEL PolyA HS
>> > PolyHS-4.CEL PolyA HS
>> > PolyHS-5.CEL PolyA HS
>> > PolyHS-6.CEL PolyA HS
>> > PolyHS-7.CEL PolyA HS
>> > PolyHS-8.CEL PolyA HS
>> > WTC-1.CEL WT Ctrl
>> > WTC-2.CEL WT Ctrl
>> > WTC-3.CEL WT Ctrl
>> > WTC-4.CEL WT Ctrl
>> > WTC-5.CEL WT Ctrl
>> > WTC-6.CEL WT Ctrl
>> > WTC-7.CEL WT Ctrl
>> > WTC-8.CEL WT Ctrl
>> > WTHS-1.CEL WT HS
>> > WTHS-2.CEL WT HS
>> > WTHS-3.CEL WT HS
>> > WTHS-4.CEL WT HS
>> > WTHS-5.CEL WT HS
>> > WTHS-6.CEL WT HS
>> > WTHS-7.CEL WT HS
>> > WTHS-8.CEL WT HS
>> >> Poly.rma <- rma(data[,1:16])
>> > Background correcting
>> > Normalizing
>> > Calculating Expression
>> >> pData(Poly.rma)
>> > amp trt
>> > PolyC-1.CEL PolyA Ctrl
>> > PolyC-2.CEL PolyA Ctrl
>> > PolyC-3.CEL PolyA Ctrl
>> > PolyC-4.CEL PolyA Ctrl
>> > PolyC-5.CEL PolyA Ctrl
>> > PolyC-6.CEL PolyA Ctrl
>> > PolyC-7.CEL PolyA Ctrl
>> > PolyC-8.CEL PolyA Ctrl
>> > PolyHS-1.CEL PolyA HS
>> > PolyHS-2.CEL PolyA HS
>> > PolyHS-3.CEL PolyA HS
>> > PolyHS-4.CEL PolyA HS
>> > PolyHS-5.CEL PolyA HS
>> > PolyHS-6.CEL PolyA HS
>> > PolyHS-7.CEL PolyA HS
>> > PolyHS-8.CEL PolyA HS
>> >> treatment <-c("C",
>> > "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS")
>> >> design <- model.matrix(~factor(treatment))
>> >> colnames(design) <- c("Ctrl", "CvsHS")
>> >> design
>> > Ctrl CvsHS
>> > 1 1 0
>> > 2 1 0
>> > 3 1 0
>> > 4 1 0
>> > 5 1 0
>> > 6 1 0
>> > 7 1 0
>> > 8 1 0
>> > 9 1 1
>> > 10 1 1
>> > 11 1 1
>> > 12 1 1
>> > 13 1 1
>> > 14 1 1
>> > 15 1 1
>> > 16 1 1
>> > attr(,"assign")
>> > [1] 0 1
>> > attr(,"contrasts")
>> > attr(,"contrasts")$`factor(treatment)`
>> > [1] "contr.treatment"
>> >> options(digits=4)
>> >> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr")
>> > ID logFC AveExpr t P.Value adj.P.Val
>> B
>> > 16088 Bt.27852.2.S1_at -0.4764 3.621 -5.274 5.709e-05 0.7816
>> -1.722
>> > 12937 Bt.24859.1.A1_at 0.6632 6.940 5.152 7.374e-05 0.7816
>> -1.790
>> > 2853 Bt.13563.2.S1_at -0.5288 6.747 -4.703 1.922e-04 0.7816
>> -2.061
>> > 2474 Bt.13162.1.S1_at -0.6424 5.964 -4.574 2.534e-04 0.7816
>> -2.142
>> > 10402 Bt.22107.1.S1_at -0.3261 10.358 -4.475 3.143e-04 0.7816
>> -2.207
>> > 10654 Bt.22355.1.S1_at -0.3533 8.650 -4.316 4.439e-04 0.7816
>> -2.312
>> > 8883 Bt.20584.1.S1_at -0.3671 8.106 -4.225 5.417e-04 0.7816
>> -2.374
>> > 2851 Bt.13562.1.S1_at 0.6901 8.865 4.190 5.850e-04 0.7816
>> -2.399
>> > 5772 Bt.1754.1.S1_at -0.5986 9.348 -4.187 5.885e-04 0.7816
>> -2.400
>> > 19715 Bt.4440.1.A1_at -0.6881 2.526 -4.160 6.242e-04 0.7816
>> -2.419
>> > 3163 Bt.13933.1.S1_at 0.4519 7.798 4.138 6.549e-04 0.7816
>> -2.434
>> > 14921 Bt.26765.1.S1_at -0.3673 7.846 -4.110 6.962e-04 0.7816
>> -2.454
>> > 10751 Bt.22445.1.S1_at -0.3222 8.622 -4.108 7.002e-04 0.7816
>> -2.456
>> > 2906 Bt.13629.1.A1_at 0.2830 2.294 4.095 7.196e-04 0.7816
>> -2.464
>> > 5721 Bt.1749.1.S1_at 0.6030 3.839 4.072 7.578e-04 0.7816
>> -2.481
>> > 21192 Bt.6078.1.S1_at -0.4385 8.564 -4.054 7.879e-04 0.7816
>> -2.493
>> > 800 Bt.11145.1.S1_at -0.5060 2.461 -4.030 8.312e-04 0.7816
>> -2.511
>> > 27 AFFX-Bt-ECOLOXL_at 0.3410 1.290 4.022 8.443e-04 0.7816
>> -2.516
>> > 22515 Bt.7980.1.S1_at -0.3734 7.622 -3.983 9.196e-04 0.7816
>> -2.543
>> > 4609 Bt.16378.1.A1_at -0.3775 5.071 -3.964 9.607e-04 0.7816
>> -2.557
>> > 13765 Bt.25669.1.S1_at -0.5523 8.197 -3.948 9.933e-04 0.7816
>> -2.568
>> > 12023 Bt.24033.1.A1_at -0.4962 6.256 -3.917 1.065e-03 0.7816
>> -2.591
>> > 20843 Bt.5578.1.S1_at -0.4133 7.904 -3.913 1.073e-03 0.7816
>> -2.594
>> > 17021 Bt.28620.1.S1_at 0.4720 5.181 3.899 1.106e-03 0.7816
>> -2.603
>> > 23922 Bt.9791.1.S1_at -0.3420 9.963 -3.893 1.122e-03 0.7816
>> -2.608
>> >
>> >> sessionInfo()
>> > R version 2.7.1 (2008-06-23)
>> > i386-pc-mingw32
>> >
>> > locale:
>> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> > States.1252;LC_MONETARY=English_United
>> >
>> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>> >
>> > attached base packages:
>> > [1] splines tools stats graphics grDevices utils
>> datasets
>> > methods base
>> >
>> > other attached packages:
>> > [1] bovinecdf_2.2.0 affycoretools_1.12.0 annaffy_1.12.1
>> > KEGG.db_2.2.0 gcrma_2.12.1 matchprobes_1.12.0
>> > [7] biomaRt_1.14.0 RCurl_0.9-3 GOstats_2.6.0
>> > Category_2.6.0 genefilter_1.20.0 survival_2.34-1
>> > [13] RBGL_1.16.0 annotate_1.18.0 xtable_1.5-2
>> > GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9
>> > [19] DBI_0.2-4 graph_1.18.1 limma_2.14.5
>> > affy_1.18.2 preprocessCore_1.2.0 affyio_1.8.0
>> > [25] Biobase_2.0.1
>> >
>> > loaded via a namespace (and not attached):
>> > [1] cluster_1.11.11 XML_1.95-3
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, MS
>> Biostatistician
>> UMCCC cDNA and Affymetrix Core
>> University of Michigan
>> 1500 E Medical Center Drive
>> 7410 CCGC
>> Ann Arbor MI 48109
>> 734-647-5623
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
More information about the Bioconductor
mailing list