[BioC] necessity of moderated t statistic and false discoveries for small predefined gene list?

Fri May 18 01:27:50 CEST 2012

Hi Rich,

You have already got the answer from Kasper, and it is exactly what I am
suggesting.
The idea is that after log transformation the variances of the genes
follow some distribution, so the more genes you use, the better you
can estimate that distribution. That is why the moderated statistics
should be computed from all of the genes on the array, not just your 10.
This is just a model; nobody is claiming that this is what really
happens. But it seems to work pretty well in "real life".
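
To make the shrinkage concrete, here is a rough sketch in plain Python
(not limma itself). It assumes the usual form of the moderated variance:
a weighted average of the gene's own variance and a prior variance, with
the degrees of freedom as weights. The function names and all the numbers
are made up for illustration; in a real analysis the prior variance and
prior degrees of freedom are estimated from all the genes on the array.

```python
import math

def moderated_variance(s2, d, s0_sq, d0):
    """Shrink a gene-wise variance s2 (d degrees of freedom) towards a
    prior variance s0_sq that carries d0 'prior' degrees of freedom."""
    return (d0 * s0_sq + d * s2) / (d0 + d)

def two_group_t(diff, variance, n1, n2):
    """t-statistic for a difference in group means given a variance estimate."""
    return diff / math.sqrt(variance * (1.0 / n1 + 1.0 / n2))

# Hypothetical numbers for one gene in a 3-vs-3 comparison.
n1, n2 = 3, 3
d = n1 + n2 - 2          # residual degrees of freedom for this gene
diff = 0.5               # log2 fold change
s2 = 0.01                # fortuitously small gene-wise variance

# Assumed prior values; in practice they are estimated from all genes.
s0_sq, d0 = 0.05, 4.0

s2_mod = moderated_variance(s2, d, s0_sq, d0)
t_ordinary = two_group_t(diff, s2, n1, n2)       # d degrees of freedom
t_moderated = two_group_t(diff, s2_mod, n1, n2)  # d0 + d degrees of freedom

print(s2_mod, t_ordinary, t_moderated)
```

With these made-up numbers the variance moves from 0.01 to 0.03 and the
t-statistic drops from about 6.1 to about 3.5, while the degrees of
freedom go up from 4 to 8. A prior estimated from only 10 genes would be
much noisier than one estimated from the whole array.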

Moshe.

> Moshe,
>
> 	Thank you for the clarification on the moderated t-statistic.
> If I am only interested in 10 genes, is it better to calculate the
> moderated statistic, and hence the raw p-values, based on all of the
> genes on the array or just those 10 genes?
>
> Best wishes,
> Rich
>
> On May 17, 2012, at 12:35 AM, Moshe Olshansky wrote:
>
>> Hi Rich,
>>
>> I think that Gordon Smyth (the author of limma) has explained on this
>> list what the moderated t-statistic is.
>> The brief explanation is that when there are few samples, the estimate
>> of the variance used in a standard t-test is quite noisy, and because
>> one must account for this noise the standard t-test has low
>> statistical power. The empirical Bayes model used in the moderated
>> t-test allows the variance to be estimated with more confidence and
>> therefore gives better power. So it can be used even if you are
>> interested in just a few genes.
>> It has (almost) nothing to do with the multiple testing adjustment.
>> Well, one may ask whether moderated p-values satisfy the assumptions
>> of multiple testing adjustment procedures (in particular BH), but
>> this is another story. Maybe Gordon will comment on this.
>>
>> Best regards,
>> Moshe.
>>
>>> Moshe and List,
>>>
>>> Thanks for your reply. The method you describe retains
>>> the raw p-value based on the moderated t-statistic and adjusts
>>> it to give an adjusted p-value (usually a false discovery rate).
>>> However, as I understand it, the moderated t-statistic given by
>>> limma, based on all of the genes on the array, pools variance
>>> information to moderate the standard deviation and so prevent
>>> fortuitously low p-values stemming from the fortuitously low
>>> standard deviations encountered in thousands of multiple tests.
>>> I am wondering whether, if the experimentalist asks me to look up
>>> just 10 genes, I should use the unmoderated frequentist t-statistic,
>>> which will differ from the one in limma and may imply significance
>>> where limma does not. I guess another way to phrase it is
>>> "How many simultaneous tests does one need before one
>>> should prefer the moderated (empirical Bayes) statistic to the
>>> unmoderated one?" Or should I fit just those 10 genes
>>> (~30 Affy probes) with limma?
>>>
>>> Best wishes,
>>> Rich
>>>
>>>
>>>
>>> On Thu, 17 May 2012, Moshe Olshansky wrote:
>>>
>>>> Hi Rich,
>>>>
>>>> Whether to use the moderated t-statistic or not does not depend on
>>>> whether you are interested in the 10 particular genes or in all
>>>> differentially expressed ones. That choice only affects your
>>>> multiple testing adjustment.
>>>> The simplest way for you to proceed is to use limma as usual, get
>>>> the topTable, but then take the UNADJUSTED p-values for your 10
>>>> genes of interest and use the p.adjust function to adjust for
>>>> multiple testing if you wish. In any case you should also look at
>>>> the (log) fold changes.
>>>>
>>>> Best regards,
>>>> Moshe.
>>>>
>>>>
>>>>> Dear Bioconductor List,
>>>>>
>>>>> 	I am using Limma to analyze differential expression between 2
>>>>> conditions on an Affy chip.
>>>>> My experimental collaborator asks for the differential expression
>>>>> of 10 predefined genes.
>>>>>
>>>>> A. Should I correct for false discoveries based upon all of the
>>>>> genes on the chip?
>>>>> B. If not, should I correct for false discoveries just for the
>>>>> probe IDs for the 10 predefined genes?
>>>>> C. Should I use the moderated t-statistic or just use an
>>>>> unmoderated t-test for those 10 genes?
>>>>>
>>>>> Thanks and best wishes,
>>>>> Rich
>>>>> ------------------------------------------------------------
>>>>> Richard A. Friedman, PhD
>>>>> Associate Research Scientist,
>>>>> Biomedical Informatics Shared Resource
>>>>> Herbert Irving Comprehensive Cancer Center (HICCC)
>>>>> Lecturer,
>>>>> Department of Biomedical Informatics (DBMI)
>>>>> Educational Coordinator,
>>>>> Center for Computational Biology and Bioinformatics (C2B2)/
>>>>> National Center for Multiscale Analysis of Genomic Networks
>>>>> (MAGNet)
>>>>> Room 824
>>>>> Irving Cancer Research Center
>>>>> Columbia University
>>>>> 1130 St. Nicholas Ave
>>>>> New York, NY 10032
>>>>> (212)851-4765 (voice)
>>>>> friedman at cancercenter.columbia.edu
>>>>> http://cancercenter.columbia.edu/~friedman/
>>>>>
>>>>> "School is an evil plot to suppress my individuality"
>>>>>
>>>>> Rose Friedman, age 15
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> ------------------------------------------------------------
>>> Richard A. Friedman, PhD
>>> Associate Research Scientist
>>> Herbert Irving Comprehensive Cancer Center
>>> Biomedical Informatics Shared Resource
>>> Lecturer
>>> Department of Biomedical Informatics
>>> Box 95, Room 130BB or P&S 1-420C
>>> Columbia University Medical Center
>>> 630 W. 168th St.
>>> New York, NY 10032
>>> (212)305-6901 (5-6901) (voice)
>>> friedman at cancercenter.columbia.edu
>>> http://cancercenter.columbia.edu/~friedman/
>>>
>>> "The last 250 pages of the last Harry Potter
>>> book took place in one day because alot
>>> happened in that day. All of Ulysses takes
>>> place in one day and nothing happened in that day."
>>> -Rose Friedman, age 11
>>>
>>>
>>
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely
>> for the addressee.
>> You must not disclose, forward, print or use it without the
>> permission of the sender.
>> ______________________________________________________________________
>
>
