[BioC] EdgeR multi-factor testing questions

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jan 7 23:57:04 CET 2014


Dear Yanzhu,

Your analysis is fine from a code point of view.  From a statistical point 
of view however your analysis is too simple because you are neglecting the 
principle of marginality:

   http://en.wikipedia.org/wiki/Principle_of_marginality

For the model you have fitted, it makes sense to test for the three-way 
interaction as you do.  However it does not make statistical sense to test 
for the main effects or two-interactions until you have established that 
the three-way interaction is non-significant.

For count data, the tests for the lower-level interactions need to be 
computed by successively removing each level of interactions from the 
model.  See for example:

   https://stat.ethz.ch/pipermail/bioconductor/2013-December/056584.html

This is the same as the anova() function does in R for unbalanced linear 
factorial models.

Furthermore, testing the two-way interations is only sensible for genes 
with non-signicant 3-way interactions.  Similarly, testing the main effect 
is only sensible for genes with non-significant 2-way and 3-way 
interactions.  Otherwise these tests have no useful scientific meaning.

This is a basic drawback of the factorial anova approach.  You might 
consider the alternative approach described in Section 3.3.1 of the edgeR 
User's Guide.

Best wishes
Gordon

> Date: Mon,  6 Jan 2014 11:21:37 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR multi-factor testing questions
>
>
> I'm currently using EdgeR to analyze RNASeq data. I have one question of 
> whether I have chosen the correct method for analyzing my 
> multi-factorial experiment.

> I have 3 factors: factor A with 16 levels (from level0 to level15), 
> factor B with three levels (from level0 to leve2), and factor Sex with 
> two levels (male and female).  I'm interested in testing all main 
> effects: A, B and Sex, all two-way interaction terms:  A:B, A:Sex and 
> B:Sex; and the three-way interaction term: A:B:Sex.
>
> My code:
>
> group<-paste(A,B,Sex,sep=".")
> design<-model.matrix(~A+B+Sex+A:B+A:Sex+B:Sex+A:B:Sex)
> y<-DGEList(counts=readcounts,group=group)
> y<-calcNormFactors(y,method="TMM")
>
>
> y<-estimateGLMCommonDisp(y,design)
> y<-estimateGLMTagwiseDisp(y,design)
>
>
> fit<-glmFit(y,design)
> colnames(fit)
>
> colnames(fit)
> [1] "(Intercept)" "Alevel1" "Alevel2" "Alevel3" "Alevel4"        [6] "Alevel5" "Alevel6" "Alevel7" "Alevel8" "Alevel9"
> [11]"Alevel10" "Alevel11" "Alevel12" "Alevel13" "Alevel14"
> [16] "Alevel15" "Blevel1" "Blevel2" "Sexmale" "Alevel1:Blevel1"
>
> [21] "Alevel2:Blevel1" "Alevel3:Blevel1" "Alevel4:Blevel1" "Alevel5:Blevel1" "Alevel6:Blevel1"
>
> [26] "Alevel7:Blevel1" "Alevel8:Blevel1" "Alevel9:Blevel1" "Alevel10:Blevel1" "Alevel11:Blevel1"
>
> [31] "Alevel12:Blevel1" "Alevel13:Blevel1" "Alevel14:Blevel1" "Alevel15:Blevel1" "Alevel1:Blevel2"
>
> [36] "Alevel2:Blevel2"  "Alevel3:Blevel2" "Alevel4:Blevel2" "Alevel5:Blevel2" "Alevel6:Blevel2"
>
> [41] "Alevel7:Blevel2"  "Alevel8:Blevel2" "Alevel9:Blevel2"   "Alevel10:Blevel2" "Alevel11:Blevel2"
>
> [46] "Alevel12:Blevel2" "Alevel13:Blevel2" "Alevel14:Blevel2" "Alevel15:Blevel2" "Alevel1:Sexmale"
>
> [51] "Alevel2:Sexmale" "Alevel3:Sexmale" "Alevel4:Sexmale" "Alevel5:Sexmale" "Alevel6:Sexmale"
>
> [56] "Alevel7:Sexmale" "Alevel8:Sexmale" "Alevel9:Sexmale" "Alevel10:Sexmale" "Alevel11:Sexmale"
>
> [61] "Alevel12:Sexmale" "Alevel13:Sexmale" "Alevel14:Sexmale" "Alevel15:Sexmale" "Blevel1:Sexmale"
>
> [66] "Blevel2:Sexmale" "Alevel1:Blevel1:Sexmale" "Alevel2:Blevel1:Sexmale" "Alevel3:Blevel1:Sexmale" "Alevel4:Blevel1:Sexmale"
>
> [71] "Alevel5:Blevel1:Sexmale" "Alevel6:Blevel1:Sexmale" "Alevel7:Blevel1:Sexmale" "Alevel8:Blevel1:Sexmale" "Alevel9:Blevel1:Sexmale"
>
> [76] "Alevel10:Blevel1:Sexmale" "Alevel11:Blevel1:Sexmale" "Alevel12:Blevel1:Sexmale" "Alevel13:Blevel1:Sexmale" "Alevel14:Blevel1:Sexmale"
>
> [81] "Alevel15:Blevel1:Sexmale" "Alevel1:Blevel2:Sexmale" "Alevel2:Blevel2:Sexmale" "Alevel3:Blevel2:Sexmale" "Alevel4:Blevel2:Sexmale"
>
> [86] "Alevel5:Blevel2:Sexmale" "Alevel6:Blevel2:Sexmale" "Alevel7:Blevel2:Sexmale" "Alevel8:Blevel2:Sexmale" "Alevel9:Blevel2:Sexmale"
>
> [91] "Alevel10:Blevel2:Sexmale" "Alevel11:Blevel2:Sexmale" "Alevel12:Blevel2:Sexmale" "Alevel13:Blevel2:Sexmale" "Alevel14:Blevel2:Sexmale"
> [96] "Alevel15:Blevel2:Sexmale"
>
> (1) To test the main effect, I use the following code:
> lrt_A<-glmLRT(fit,coef=c(2:16))
> lrt_B<-glmLRT(fit,coef=c(17:18))
> lrt_Sex<-glmLRT(fit,coef=19)
>
> (2) To test the two-way interaction terms, I use the following code:
> lrt_AB<-glmLRT(fit,coef=c(20:49))
>
> lrt_ASex<-glmLRT(fit,coef=c(50:64))
>
> lrt_BSex<-glmLRT(fit,coef=c(65:66))
>
> (3) To test the three-way interaction terms, I use the following code:
> lrt_ABSex<-glmLRT(fit,coef=c(67:96))
>
> My question: is my code correct for each of the testing above?
>
>
>
> Thank you!
>
>
>
>
>
>
>
>
>
>
> -- output of sessionInfo():
>
>>  sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.2.4  limma_3.16.8
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.1
>
>
> --
> Sent via the guest posting facility at bioconductor.org.

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list