[BioC] edgeR: Treatment effect at any time (UG3.3.3)

Fri Oct 25 00:43:20 CEST 2013

> Date: Thu, 24 Oct 2013 01:35:57 -0500
> From: Matt McNeill <mcneill at life.illinois.edu>
> To: Bioconductor at r-project.org
> Subject: [BioC] edgeR: Treatment effect at any time (UG3.3.3)
>
> Good evening BioConductor ListServ,
>
> I am interested in using edgeR to generate a single list of genes 
> differentially expressed between treatment and control and either of two 
> time points (similar to User Guide section 3.3.3). I have generated what 
> I think is the correct contrast and model, but I am confused by the 
> number of genes at FDR<0.1 on the output list compared to the number of 
> genes at FDR<0.1 at either of the two time points alone.
>
> I see non-additive numbers of genes with adjusted p-values (FDR) <0.1 
> coming up on related lists. In my analysis, I am using a contrast model 
> (section 3.2.3) to examine differential gene expression between 
> individuals collected at two time points (30 and 60 min) and in two 
> conditions (control and treatment). There are (1) 11 genes 
> differentially expressed at 30 min between treatment and control, but 
> (2) 772 genes at 60 min. Independent of time point, there are (3) 216 
> differentially expressed genes between treatment and control. I wanted 
> to create a single list that represented the differentially expressed 
> genes at either time point, so I added contrasts (1) and (2) together 
> prior to running the glmLRT. Now, my list has 319 genes at FDR<0.1, but 
> I am not sure why the number of genes isn't between 772-783, or reflect 
> the results from contrast (3). Can you help me understand these 
> differences?

This is to be expected, and is similar to the difference between an F-test 
as compared to a t-test.  When you do test (2) you are looking 
specifically for changes at time 60min.  When you do test (3) you are not 
specifying when the changes occur.  Given that almost all the changes do 
in fact occur at 60min, you naturally have more power to detect these 
changes when you look for changes at this time specifically.  When you 
perform test (3), you are asking the statistical test to control the error 
rate over a larger number of possible comparisons, so the bigger changes 
at 60min get diluted by the smaller changes at 30min making them harder to 
detect.

In general, the F-test (multiple contrasts at once) has more power than 
testing for individual contrasts if all the contrasts have logFC about the 
same size, so that they accumulate strength.  However if one of the 
contrasts is much larger than the others, then the opposite is ture.  In 
that case power is maximized by testing for that contrast specifically, 
meaning that the t-test for that contrast specifically has more power.

Bottom line: you can't increase power in general for a statistical test by 
being more vague regarding the alternative hypothesis.

Best wishes
Gordon

> Please find relevant sections of my code below. I am using edgeR version
> 3.3.8 in R version 3.0.2.
>
> Thanks very much for any help you can provide!
>
> Matt
>
> group <- factor(paste(targets$treatment, targets$timepoint, sep = "."))
>
> design.TimeCDateLib <- model.matrix(~0+zlib + group + collect.date.factor,
> data=targets)
>
> rownames(design.TimeCDateLib) <- colnames(y)
>
> y.TimeCDateLib.common <- estimateGLMCommonDisp(y,design.TimeCDateLib,
> verbose=TRUE)
>
> y.TimeCDateLib.AmelFor30.60.common.trend <- estimateGLMTrendedDisp(
> y.TimeCDateLib.common,design.TimeCDateLib)
>
>
> y.TimeCDateLib.AmelFor30.60.common.trend.tag.df17 <- estimateGLMTagwiseDisp(
> y.TimeCDateLib.AmelFor30.60.common.trend,design.TimeCDateLib, prior.df=17) #
> This one is best. Most dots close to theoretical line.
>
> fit.TimeCDateLib.AmelFor30.60.df17 <- glmFit(
> y.TimeCDateLib.AmelFor30.60.common.trend.tag.df17,design.TimeCDateLib)
>
> x11();qq <- gof(fit.TimeCDateLib.AmelFor30.60.df17,pcutoff=.05,plot=TRUE)
>
> sum(qq$outlier==TRUE)
>
>
> contrast.TimeCDateLib.AmelFor30.60 <- makeContrasts(CtrlTreat3060=(groupc.a-
> grouptx.a)+(groupc.b-grouptx.b), CtrlTreat30=(groupc.a-grouptx.a),
> CtrlTreat60=(groupc.b-grouptx.b), CT30v60=(groupc.a-groupc.b)+(grouptx.a-
> grouptx.b), RespDiff=(groupc.a-groupc.b)-(grouptx.a-grouptx.b), C30v60 = (
> groupc.a-groupc.b), T30v60 = (grouptx.a-grouptx.b), levels=
> design.TimeCDateLib)
>
>
> (1) lrt.CtrlTreat30<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast=
> contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat30"])
>
> (2)
> lrt.CtrlTreat60<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast=
> contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat60"])
>
> (3)
> lrt.CtrlTreat3060<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast=
> contrast.TimeCDateLib.AmelFor30.60[,"CtrlTreat3060"])
>
> (1)+(2)
> lrt.CtrlTreat30or60<- glmLRT(fit.TimeCDateLib.AmelFor30.60.df17, contrast=
> contrast.TimeCDateLib.AmelFor30.60[,2:3])
>
>
>
> topTags(lrt.CtrlTreat3060,n=Inf)
>
> toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060 <- topTags(
> lrt.CtrlTreat3060,n=Inf)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table$FDR<0.05)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table$FDR<0.1)
>
> head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060$table)
>
> write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat3060,
> "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat3060.csv")
>
>
> topTags(lrt.CtrlTreat30,n=Inf)
>
> toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30 <- topTags(
> lrt.CtrlTreat30,n=Inf)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table$FDR<0.05)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table$FDR<0.1)
>
> head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30$table)
>
> write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30,
> "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat30.csv")
>
>
> topTags(lrt.CtrlTreat60,n=Inf)
>
> toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60 <- topTags(
> lrt.CtrlTreat60,n=Inf)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table$FDR<0.05)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table$FDR<0.1)
>
> head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60$table)
>
> write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat60,
> "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat60.csv")
>
>
> topTags(lrt.CtrlTreat30or60,n=Inf)
>
> toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60 <- topTags(
> lrt.CtrlTreat30or60,n=Inf)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table$FDR<
> 0.05)
>
> sum(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table$FDR<0.1
> )
>
> head(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60$table)
>
> write.csv(toptags.TimeCDateLib.AmelFor30.60.df17.lrt.CtrlTreat30or60,
> "toptags.TimeCDateLib.AmelFor30.60.df17.contrast.lrt.CtrlvTreat30or60.csv")
>
>
> -- 
> Matthew S. McNeill, Ph.D.
> Post Doctoral Research Associate
>
> University of Illinois****
>
> Institute for Genomic Biology****
>
> Department of Entomology****
>
> 1206 W Gregory, rm 2414-G****
>
> Urbana, IL 61801

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}