[BioC] EdgeR multi-factor testing questions
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Jan 7 23:57:04 CET 2014
Dear Yanzhu,
Your analysis is fine from a code point of view. From a statistical point
of view however your analysis is too simple because you are neglecting the
principle of marginality:
http://en.wikipedia.org/wiki/Principle_of_marginality
For the model you have fitted, it makes sense to test for the three-way
interaction as you do. However it does not make statistical sense to test
for the main effects or two-interactions until you have established that
the three-way interaction is non-significant.
For count data, the tests for the lower-level interactions need to be
computed by successively removing each level of interactions from the
model. See for example:
https://stat.ethz.ch/pipermail/bioconductor/2013-December/056584.html
This is the same as the anova() function does in R for unbalanced linear
factorial models.
Furthermore, testing the two-way interations is only sensible for genes
with non-signicant 3-way interactions. Similarly, testing the main effect
is only sensible for genes with non-significant 2-way and 3-way
interactions. Otherwise these tests have no useful scientific meaning.
This is a basic drawback of the factorial anova approach. You might
consider the alternative approach described in Section 3.3.1 of the edgeR
User's Guide.
Best wishes
Gordon
> Date: Mon, 6 Jan 2014 11:21:37 -0800 (PST)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR multi-factor testing questions
>
>
> I'm currently using EdgeR to analyze RNASeq data. I have one question of
> whether I have chosen the correct method for analyzing my
> multi-factorial experiment.
> I have 3 factors: factor A with 16 levels (from level0 to level15),
> factor B with three levels (from level0 to leve2), and factor Sex with
> two levels (male and female). I'm interested in testing all main
> effects: A, B and Sex, all two-way interaction terms: A:B, A:Sex and
> B:Sex; and the three-way interaction term: A:B:Sex.
>
> My code:
>
> group<-paste(A,B,Sex,sep=".")
> design<-model.matrix(~A+B+Sex+A:B+A:Sex+B:Sex+A:B:Sex)
> y<-DGEList(counts=readcounts,group=group)
> y<-calcNormFactors(y,method="TMM")
>
>
> y<-estimateGLMCommonDisp(y,design)
> y<-estimateGLMTagwiseDisp(y,design)
>
>
> fit<-glmFit(y,design)
> colnames(fit)
>
> colnames(fit)
> [1] "(Intercept)" "Alevel1" "Alevel2" "Alevel3" "Alevel4" [6] "Alevel5" "Alevel6" "Alevel7" "Alevel8" "Alevel9"
> [11]"Alevel10" "Alevel11" "Alevel12" "Alevel13" "Alevel14"
> [16] "Alevel15" "Blevel1" "Blevel2" "Sexmale" "Alevel1:Blevel1"
>
> [21] "Alevel2:Blevel1" "Alevel3:Blevel1" "Alevel4:Blevel1" "Alevel5:Blevel1" "Alevel6:Blevel1"
>
> [26] "Alevel7:Blevel1" "Alevel8:Blevel1" "Alevel9:Blevel1" "Alevel10:Blevel1" "Alevel11:Blevel1"
>
> [31] "Alevel12:Blevel1" "Alevel13:Blevel1" "Alevel14:Blevel1" "Alevel15:Blevel1" "Alevel1:Blevel2"
>
> [36] "Alevel2:Blevel2" "Alevel3:Blevel2" "Alevel4:Blevel2" "Alevel5:Blevel2" "Alevel6:Blevel2"
>
> [41] "Alevel7:Blevel2" "Alevel8:Blevel2" "Alevel9:Blevel2" "Alevel10:Blevel2" "Alevel11:Blevel2"
>
> [46] "Alevel12:Blevel2" "Alevel13:Blevel2" "Alevel14:Blevel2" "Alevel15:Blevel2" "Alevel1:Sexmale"
>
> [51] "Alevel2:Sexmale" "Alevel3:Sexmale" "Alevel4:Sexmale" "Alevel5:Sexmale" "Alevel6:Sexmale"
>
> [56] "Alevel7:Sexmale" "Alevel8:Sexmale" "Alevel9:Sexmale" "Alevel10:Sexmale" "Alevel11:Sexmale"
>
> [61] "Alevel12:Sexmale" "Alevel13:Sexmale" "Alevel14:Sexmale" "Alevel15:Sexmale" "Blevel1:Sexmale"
>
> [66] "Blevel2:Sexmale" "Alevel1:Blevel1:Sexmale" "Alevel2:Blevel1:Sexmale" "Alevel3:Blevel1:Sexmale" "Alevel4:Blevel1:Sexmale"
>
> [71] "Alevel5:Blevel1:Sexmale" "Alevel6:Blevel1:Sexmale" "Alevel7:Blevel1:Sexmale" "Alevel8:Blevel1:Sexmale" "Alevel9:Blevel1:Sexmale"
>
> [76] "Alevel10:Blevel1:Sexmale" "Alevel11:Blevel1:Sexmale" "Alevel12:Blevel1:Sexmale" "Alevel13:Blevel1:Sexmale" "Alevel14:Blevel1:Sexmale"
>
> [81] "Alevel15:Blevel1:Sexmale" "Alevel1:Blevel2:Sexmale" "Alevel2:Blevel2:Sexmale" "Alevel3:Blevel2:Sexmale" "Alevel4:Blevel2:Sexmale"
>
> [86] "Alevel5:Blevel2:Sexmale" "Alevel6:Blevel2:Sexmale" "Alevel7:Blevel2:Sexmale" "Alevel8:Blevel2:Sexmale" "Alevel9:Blevel2:Sexmale"
>
> [91] "Alevel10:Blevel2:Sexmale" "Alevel11:Blevel2:Sexmale" "Alevel12:Blevel2:Sexmale" "Alevel13:Blevel2:Sexmale" "Alevel14:Blevel2:Sexmale"
> [96] "Alevel15:Blevel2:Sexmale"
>
> (1) To test the main effect, I use the following code:
> lrt_A<-glmLRT(fit,coef=c(2:16))
> lrt_B<-glmLRT(fit,coef=c(17:18))
> lrt_Sex<-glmLRT(fit,coef=19)
>
> (2) To test the two-way interaction terms, I use the following code:
> lrt_AB<-glmLRT(fit,coef=c(20:49))
>
> lrt_ASex<-glmLRT(fit,coef=c(50:64))
>
> lrt_BSex<-glmLRT(fit,coef=c(65:66))
>
> (3) To test the three-way interaction terms, I use the following code:
> lrt_ABSex<-glmLRT(fit,coef=c(67:96))
>
> My question: is my code correct for each of the testing above?
>
>
>
> Thank you!
>
>
>
>
>
>
>
>
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.2.4 limma_3.16.8
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.1
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list