[BioC] p.adj and logFC

Sat Oct 4 22:27:57 CEST 2008

On Fri, Oct 3, 2008 at 7:58 PM, Yisong Zhen <zhenyisong at gmail.com> wrote:
> Thanks for your reply.
>
> My understanding for the p.adjust from the R help page is that  it is
> adjusted by number of comparison.
> I mean, if I filter genes (below certain expression threshold) before I
> carry out multiple testing, the p.adjust would be decreased.
> But how do we set the threshold for p.adjust, <=0.05?

p.adjust has an argument for the type of adjustment.  You need to
specify that.  If you specify 'fdr" for false discovery, then it makes
sense to use a larger cutoff for significance, depending on your
needs.  Keep in mind that an fdr is interpreted with respect to a LIST
OF GENES.  So a false discovery rate of 0.2, for example, means that
20% of the genes that occur in the list up to that point are likely
false discoveries.  You can see how a cutoff of 0.05 might not make
sense in this situation, depending on your needs.

> If so, the MKG.56.34.1_at (probe in our array comparison, de facto
> decreasing expressed gene)  will not be statistically significant, although
> its fold changes reach about 6. Then how do we interpret this result?

It is possible to have a high fold change without statistical
significance if the variability in the measure is also high.

Sean

>  On Fri, Sep 19, 2008 at 7:23 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>>
>>
>>  On Fri, Sep 19, 2008 at 10:12 AM, James W. MacDonald <
>> jmacdon at med.umich.edu> wrote:
>>
>>> Hi Yisong,
>>>
>>> Yisong Zhen wrote:
>>>
>>>> Hello All,
>>>> I used the following script to find the genes that both have p.adj < 0.05
>>>> and logFC < -1 in comparisons (TCdn - TCwt and Tcestw - TCwt).
>>>> I checked one candidate gene in the normalized expression output
>>>> (affy_My_data.txt). The first two are TCdn, middle two are TCwt and the
>>>> last
>>>> two are TCestw):
>>>>
>>>> MKG.56.34.1_at  4.30886639379254 5.33637926862047 11.8385942887500
>>>> 10.9736438003031 4.22713527087133 3.28741332008267
>>>>
>>>> It seems that this gene have same logFC in two comparisons (TCdn~ TCwt
>>>> and
>>>> TCets~TCwt). But the p.adj is very different: one is 0.053759(TCdn~TCwt)
>>>> and
>>>> the other is 0.009286 (TCetsw~TCwt).
>>>>
>>>> I am not familiar with LIMMA model and the code is simply from
>>>> Biocoductor
>>>> manual, please give some advice and tell me how to interpret this
>>>> result.(How the p.adj is calculated? Why the same gene in two comparisons
>>>> have same logFC but the p.adj is so different?)
>>>>
>>>
>>> The reason you get a different p-value with the same log FC is because the
>>> p-value is based on the t-statistic, not the fold change (although the fold
>>> change is the numerator of that statistic).
>>>
>>> If you look at the t-statistics for these two genes you will see that they
>>> are different.
>>>
>>
>> And even if the t-statistics were identical, the adjustment method you have
>> chosen ('fdr') is a function of a LIST of genes, and NOT the gene itself.
>> For details, you can read the help for p.adjust().
>>
>> Sean
>>
>>
>>>
>>>
>>>
>>>
>>>> Thanks.
>>>>
>>>> Yisong
>>>>
>>>>
>>>>
>>>> library(limma)
>>>> library(affy)
>>>> targets<-readTargets("targets.txt")
>>>> #targets;
>>>> data<-ReadAffy(filenames=targets$FileName)
>>>> eset<-rma(data)
>>>> write.exprs(eset, file="affy_My_data.txt")
>>>> design<-model.matrix(~-1+factor(c(1,1,2,2,3,3)))
>>>> colnames(design)<-c("TCdn","TCwt","TCetsw")
>>>> fit<-lmFit(eset,design)
>>>> contrast.matrix<-makeContrasts(TCdn-TCwt,TCetsw-TCwt, levels=design)
>>>> fit2<-contrasts.fit(fit, contrast.matrix)
>>>> fit2<-eBayes(fit2)
>>>> write.table(topTable(fit2, coef=1, adjust="fdr", sort.by="B",
>>>> number=50000),
>>>> file="limmadn_wt.xls", row.names=F, sep="\t")
>>>> write.table(topTable(fit2, coef=2, adjust="fdr", sort.by="B",
>>>> number=50000),
>>>> file="limmaestw_wt.xls", row.names=F, sep="\t")
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Hildebrandt Lab
>>> 8220D MSRB III
>>> 1150 W. Medical Center Drive
>>> Ann Arbor MI 48109-0646
>>> 734-936-8662
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>
>
> --
> //-----------------------------------------------------------------------------------------
> // We have a hunger of the mind which asks for knowledge
> // of all around us, and the more we gain, the more is
> // our desire; the more we see, the more we are capable
> // of seeing.
> //----------------------------------------------------------------------------------------
>
> Yisong Zhen, Ph.D (Davidson Lab)
> Molecular & Cellular Biology
> Molecular Cardiovascular Research Program
> University of Arizona
> 1656 E. Mabel, MRB 317
> Tucson, AZ 85724 USA
> Lab: 520-626-8153
> Cell: 520-977-7397
>
> @---------------------------------------------------------------------------------------
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>