[BioC] necessity of moderated t statistic and false discoveries for small predefined gene list?
Richard Friedman
friedman at cancercenter.columbia.edu
Fri May 18 14:47:10 CEST 2012
Steve,
I have reread the paper and believe that the "genes of
biological interest" filter I am asking about is qualitatively
different from the numerical filters in the paper. I will follow
Moshe's and Kasper's advice and use the moderated t-statistic.
Thanks and best wishes,
Rich
On May 17, 2012, at 11:54 AM, Steve Lianoglou wrote:
> Hi Richard,
>
> It seems to me that this paper is highly relevant to the question you
> are trying to answer:
>
> Independent filtering increases detection power for high-throughput
> experiments
> http://www.pnas.org/content/107/21/9546.full
>
> Perhaps you can see where your "filtering scheme" lands in the
> landscape of filters described there.
>
> HTH,
> -steve
>
> On Thu, May 17, 2012 at 9:25 AM, Richard Friedman
> <friedman at cancercenter.columbia.edu> wrote:
>> Moshe,
>>
>> Thank you for the clarification on the moderated t-statistic.
>> If I am only interested in 10 genes, is it better to calculate the
>> moderated statistic (and hence the raw p-values) based on all of
>> the genes on the array, or on just those 10 genes?
>>
>> Best wishes,
>> Rich
>>
>>
>> On May 17, 2012, at 12:35 AM, Moshe Olshansky wrote:
>>
>>> Hi Rich,
>>>
>>> I think that Gordon Smyth (the author of limma) has explained on
>>> this list what the moderated t-statistic is.
>>> The brief explanation is that when there are few samples, the
>>> estimate of the variance used in a standard t-test is quite
>>> noisy, and because one must account for this noise, the standard
>>> t-test has low statistical power. The empirical Bayes model used
>>> in the moderated t-test allows one to estimate the variance with
>>> more confidence and therefore has better power. So it can be used
>>> even if you are interested in just a few genes.
>>> It has (almost) nothing to do with the multiple testing
>>> adjustment. Well, one may ask whether moderated p-values satisfy
>>> the assumptions of multiple testing adjustment procedures (in
>>> particular BH), but that is another story. Maybe Gordon will
>>> comment on this.
>>>
>>> Best regards,
>>> Moshe.
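A compact way to write the variance moderation described above, following the limma empirical Bayes model as we understand it (notation ours): for gene $g$ with residual variance $s_g^2$ on $d_g$ degrees of freedom, estimated log fold change $\hat{\beta}_g$, and unscaled variance factor $v_g$ for that coefficient, a prior variance $s_0^2$ and prior degrees of freedom $d_0$ are estimated from all genes in the fit, and

```latex
\tilde{s}_g^2 = \frac{d_0\, s_0^2 + d_g\, s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}},
\qquad
\tilde{t}_g \sim t_{d_0 + d_g} \ \text{under } H_0 .
```

The ordinary t-statistic is the special case $d_0 = 0$. A fortuitously small $s_g^2$ is pulled toward $s_0^2$, and the stabilized denominator together with the extra $d_0$ degrees of freedom is where the gain in power comes from. Because $s_0^2$ and $d_0$ are estimated from the spread of $s_g^2$ across genes, a fit restricted to a handful of probe sets gives them very little to work with. In the output of limma's `eBayes` these quantities correspond, we believe, to `s2.prior`, `df.prior`, `s2.post` and `df.total`.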
>>>
>>>> Moshe and List,
>>>>
>>>> Thanks for your reply. The method you describe retains
>>>> the raw p-value based on the moderated t-statistic and adjusts
>>>> it to give an adjusted p-value (usually a false discovery rate).
>>>> However, as I understand it, the moderated t-statistic given by
>>>> Limma, based on all of the genes on the array, pools variance
>>>> information to moderate the standard deviation, preventing
>>>> fortuitously low p-values that stem from fortuitously low
>>>> standard deviations encountered in thousands of multiple tests.
>>>> I am wondering whether, if the experimentalist asks me to look up
>>>> just 10 genes, I should use the unmoderated frequentist
>>>> t-statistic, which will differ from the one in Limma and may imply
>>>> significance where Limma does not. I guess another way to phrase
>>>> it is: "How many simultaneous tests does one need before one
>>>> should prefer the moderated (empirical Bayes) statistic to the
>>>> ordinary one?" Or should I fit just those 10 genes
>>>> (~30 Affy probe sets) with Limma?
>>>>
>>>> Best wishes,
>>>> Rich
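To make the contrast in Rich's question concrete, here is a minimal Python sketch (standard library only) comparing the ordinary pooled two-sample t-statistic with the moderated one for a single probe set. It assumes a simple two-group design, and the prior values `s0_sq` and `d0` are hypothetical fixed numbers: in limma they would be estimated by `eBayes` from all genes on the array, not chosen by hand. The function names are illustrative and are not part of limma's API.

```python
from math import sqrt
from statistics import mean


def _pooled_residual_variance(a, b):
    """Pooled within-group variance s_g^2 and its residual degrees of freedom d_g."""
    mean_a, mean_b = mean(a), mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    d = len(a) + len(b) - 2
    return ss / d, d


def ordinary_t(a, b):
    """Classical pooled-variance two-sample t statistic and its degrees of freedom."""
    s2, d = _pooled_residual_variance(a, b)
    v = 1 / len(a) + 1 / len(b)  # unscaled variance of the difference in means
    return (mean(b) - mean(a)) / sqrt(s2 * v), d


def moderated_t(a, b, s0_sq, d0):
    """Moderated t: the gene's variance is shrunk toward the prior s0_sq.

    ASSUMPTION: s0_sq and d0 are supplied by hand here; limma's eBayes
    estimates them from all genes in the fit.
    """
    s2, d = _pooled_residual_variance(a, b)
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)  # posterior (moderated) variance
    v = 1 / len(a) + 1 / len(b)
    return (mean(b) - mean(a)) / sqrt(s2_post * v), d0 + d


# Hypothetical log2 expression values for one probe set, 3 arrays per
# condition, with a fortuitously tiny within-group spread (SD = 0.01).
ctrl = [7.00, 7.02, 7.01]
trt = [7.50, 7.52, 7.51]
print(ordinary_t(ctrl, trt))                     # t is about 61.2 on 4 df
print(moderated_t(ctrl, trt, s0_sq=0.05, d0=4))  # t is about 3.87 on 8 df
```

With the tight spread, the ordinary t is about 61, while shrinking toward the assumed prior variance brings it down to about 3.9. Setting `d0=0` makes the two statistics coincide.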
>>>>
>>>>
>>>>
>>>> On Thu, 17 May 2012, Moshe Olshansky wrote:
>>>>
>>>>> Hi Rich,
>>>>>
>>>>> Whether or not to use the moderated t-statistic does not depend
>>>>> on whether you are interested in the 10 particular genes or in
>>>>> all differentially expressed ones; that choice affects only your
>>>>> multiple testing adjustment.
>>>>> The simplest way for you to proceed is to use limma as usual and
>>>>> get the topTable, but then take the UNADJUSTED p-values for your
>>>>> 10 genes of interest and use the p.adjust function to adjust for
>>>>> multiple testing if you wish. In any case, you should also look
>>>>> at the (log) fold changes.
>>>>>
>>>>> Best regards,
>>>>> Moshe.
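In R, Moshe's recipe amounts to taking the `P.Value` column of `topTable` for the probe sets of interest and passing it to `p.adjust(p, method = "BH")`. The sketch below re-implements the Benjamini-Hochberg step-up adjustment in Python so the arithmetic is visible; it is meant to mirror R's `p.adjust(..., method = "BH")` but is an illustration, not a drop-in replacement, and the four p-values are made up. The family size `n` is simply the number of p-values passed in, so adjusting over the 10 genes (about 30 probe sets) or over the whole array gives different answers for the same gene.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value with ascending rank i among n, the adjusted value is
    the minimum over j >= i of (n / j) * p_(j), capped at 1.
    """
    n = len(pvalues)
    # Visit p-values from largest to smallest so that a running minimum
    # keeps the adjusted values monotone in the raw p-values.
    order = sorted(range(n), key=lambda k: pvalues[k], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for position, idx in enumerate(order):
        rank = n - position  # ascending rank: 1 = smallest p-value
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values for four probe sets of interest.
raw = [0.01, 0.04, 0.03, 0.5]
print(bh_adjust(raw))  # approximately [0.04, 0.0533, 0.0533, 0.5]
```

Passing the p-values for every probe set on the array instead of only the genes of interest would increase `n` and, in general, inflate the adjusted values for the same genes.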
>>>>>
>>>>>
>>>>>> Dear Bioconductor List,
>>>>>>
>>>>>> I am using Limma to analyze differential expression between 2
>>>>>> conditions on an Affy chip.
>>>>>> My experimental collaborator asks for the differential
>>>>>> expression of 10 predefined genes.
>>>>>>
>>>>>> A. Should I correct for false discoveries based on all of the
>>>>>> genes on the chip?
>>>>>> B. If not, should I correct for false discoveries just for the
>>>>>> probe IDs of the 10 predefined genes?
>>>>>> C. Should I use the moderated t-statistic, or just use an
>>>>>> unmoderated t-test for those 10 genes?
>>>>>>
>>>>>> Thanks and best wishes,
>>>>>> Rich
>>>>>> ------------------------------------------------------------
>>>>>> Richard A. Friedman, PhD
>>>>>> Associate Research Scientist,
>>>>>> Biomedical Informatics Shared Resource
>>>>>> Herbert Irving Comprehensive Cancer Center (HICCC)
>>>>>> Lecturer,
>>>>>> Department of Biomedical Informatics (DBMI)
>>>>>> Educational Coordinator,
>>>>>> Center for Computational Biology and Bioinformatics (C2B2)/
>>>>>> National Center for Multiscale Analysis of Genomic Networks
>>>>>> (MAGNet)
>>>>>> Room 824
>>>>>> Irving Cancer Research Center
>>>>>> Columbia University
>>>>>> 1130 St. Nicholas Ave
>>>>>> New York, NY 10032
>>>>>> (212)851-4765 (voice)
>>>>>> friedman at cancercenter.columbia.edu
>>>>>> http://cancercenter.columbia.edu/~friedman/
>>>>>>
>>>>>> "School is an evil plot to suppress my individuality"
>>>>>>
>>>>>> Rose Friedman, age 15
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> ------------------------------------------------------------
>>>> Richard A. Friedman, PhD
>>>> Associate Research Scientist
>>>> Herbert Irving Comprehensive Cancer Center
>>>> Biomedical Informatics Shared Resource
>>>> Lecturer
>>>> Department of Biomedical Informatics
>>>> Box 95, Room 130BB or P&S 1-420C
>>>> Columbia University Medical Center
>>>> 630 W. 168th St.
>>>> New York, NY 10032
>>>> (212)305-6901 (5-6901) (voice)
>>>> friedman at cancercenter.columbia.edu
>>>> http://cancercenter.columbia.edu/~friedman/
>>>>
>>>> "The last 250 pages of the last Harry Potter
>>>> book took place in one day because alot
>>>> happened in that day. All of Ulysses takes
>>>> place in one day and nothing happened in that day."
>>>> -Rose Friedman, age 11
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact