[BioC] edgeR glmFit -- how to define and test reduced, additive and interaction fitted models
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Dec 18 23:41:15 CET 2011
Dear Miguel,
You want to do two tests. You want to test for Adapation:Temperature
interaction (fit2 vs fit1), and you want to test for Temperature adjusted
for Adaption (fit1 vs fit0). In edgeR, you conduct these test by fitting
the alternative model, the specifying the term to be removed.
design2 <- model.matrix(~Adaption*Temperature)
design1 <- model.matrix(~Adaption+Temperature)
To do genewise tests for the interaction:
fit2 <- glmFit(d.GLM,design2)
lrt2 <- glmLRT(d.GLM,fit2)
topTags(lrt2)
To test for Temperature adjusted for Adaption:
fit1 <- glmFit(d.GLM,design1)
lrt1 <- glmLRT(d.GLM,fit1)
topTags(lrt1)
In both cases, the last term in the model is the one removed unless you
specify otherwise. You do not have to explicitly fit reduced models,
because edgeR does this for you automatically.
Best wishes
Gordon
> Date: Fri, 16 Dec 2011 14:48:39 +0100
> From: Miguel Gallach <miguel.gallach at vetmeduni.ac.at>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR glmFit -- how to define and test reduced,
> additive and interaction fitted models
>
> Dear list,
>
> I am using edgeR to analyze my RNA-Seq data. My data consist in two factors
> (Adaptation and Temperature) and two levels per factor, and I want fit full
> and reduced glm like the next ones:
>
> Reduced model
> fit0 = counts ~ Adaptation
>
> Aditive model
> fit1 = counts ~ Adaptation + Temperature
>
> Interaction model
> fit2 = counts ~ block + treatment + block:treatment
>
>
> That is what I did until now:
> d = raw.data[,1:8]
> head (d)
> R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold
> FBgn0000003 0 0 0 0 2 8 3 1
> FBgn0000008 92 123 80 41 76 135 188 150
> FBgn0000014 0 1 2 4 5 8 10 7
> FBgn0000015 2 1 0 0 2 3 9 5
> FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665
> FBgn0000018 36 43 10 17 6 16 20 30
>
>
> Adaptation =
> factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","HotAdapted","HotAdapted","ColdAdapted","ColdAdapted"))
> Temperature =
> factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdCage","ColdCage","ColdCage"))
> design = model.matrix(~Adaptation + Temperature + Adaptation:Temperature)
> design
> (Intercept) AdaptationHotAdapted TemperatureHot
> 1 1 1 1
> 2 1 1 1
> 3 1 0 1
> 4 1 0 1
> 5 1 1 0
> 6 1 1 0
> 7 1 0 0
> 8 1 0 0
> AdaptationHotAdapted:TemperatureHot
> 1 1
> 2 1
> 3 0
> 4 0
> 5 0
> 6 0
> 7 0
> 8 0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Adaptation
> [1] "contr.treatment"
>
> attr(,"contrasts")$Temperature
> [1] "contr.treatment"
>
>
> d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage",
> "ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage",
> "HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage"))
> Calculating library sizes from column totals.
> d.GLM = calcNormFactors(d.GLM)
> d.GLM
> An object of class "DGEList"
> $samples
> group lib.size norm.factors
> R4.Hot HotAdaptedHot 17409289 0.9881635
> R5.Hot HotAdaptedHot 17642552 1.0818144
> R9.Hot ColdAdaptedHot 20010974 0.8621807
> R10.Hot ColdAdaptedHot 14064143 0.8932791
> R4.Cold HotAdaptedCold 11968317 1.0061084
> R5.Cold HotAdaptedCold 11072832 1.0523857
> R9.Cold ColdAdaptedCold 22386103 1.0520949
> R10.Cold ColdAdaptedCold 17408532 1.0903311
>
> $counts
> R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold
> FBgn0000003 0 0 0 0 2 8 3 1
> FBgn0000008 92 123 80 41 76 135 188 150
> FBgn0000014 0 1 2 4 5 8 10 7
> FBgn0000015 2 1 0 0 2 3 9 5
> FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665
> 14864 more rows ...
>
> $all.zeros
> FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017
> FALSE FALSE FALSE FALSE FALSE
> 14864 more elements ...
>
>
> #According to the manual, DGEList is now ready to be passed to the
> functions that do the calculations to determine differential expression
> levels for the genes. However is not possible to achieve statistical
> significance with fewer than ten counts in total for a tag/gene and we also
> do not want to waste effort finding spurious DE (such as when a gene is
> only expressed in one library), so we filter out tags/genes with fewer than
> 1 count per million in SIX or more libraries.
>
> cpm.d.GLM = cpm(d.GLM)
> d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ]
> nrow(d.GLM)
> [1] 9418
>
>
> #Now the dataset is ready to be analysed for differential expression
> #The design matrix
> design
> (Intercept) AdaptationHotAdapted TemperatureHotCage
> 1 1 1 1
> 2 1 1 1
> 3 1 0 1
> 4 1 0 1
> 5 1 1 0
> 6 1 1 0
> 7 1 0 0
> 8 1 0 0
> AdaptationHotAdapted:TemperatureHotCage
> 1 1
> 2 1
> 3 0
> 4 0
> 5 0
> 6 0
> 7 0
> 8 0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Adaptation
> [1] "contr.treatment"
>
> attr(,"contrasts")$Temperature
> [1] "contr.treatment"
>
>
> #Estimate CR common/trended/tagwise dispersion
> d.GLM = estimateGLMCommonDisp(d.GLM, design)
> names(d.GLM)
> [1] "samples" "counts" "all.zeros"
> [4] "common.dispersion"
>
> d.GLM$common.dispersion
> [1] 0.04634741
>
> sqrt(d.GLM$common.dispersion)
> [1] 0.2152845
>
> d.GLM = estimateGLMTrendedDisp(d.GLM, design)
> Loading required package: splines
>
> summary(d.GLM$trended.dispersion)
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> 0.03672 0.03843 0.04326 0.05267 0.06134 0.11870
>
> d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
>
> d.GLM$prior.n
> NULL
>
> d$prior.n
> NULL
>
> ls()
> [1] "Adaptation" "cpm.d" "cpm.d.GLM" "d" "d.GLM"
> [6] "design" "raw.data" "Temperature"
>
>
> d.GLM
> An object of class "DGEList"
> $samples
> group lib.size norm.factors
> R4.Hot HotAdaptedHotCage 17409289 0.9881635
> R5.Hot HotAdaptedHotCage 17642552 1.0818144
> R9.Hot ColdAdaptedHotCage 20010974 0.8621807
> R10.Hot ColdAdaptedHotCage 14064143 0.8932791
> R4.Cold HotAdaptedColdCage 11968317 1.0061084
> R5.Cold HotAdaptedColdCage 11072832 1.0523857
> R9.Cold ColdAdaptedColdCage 22386103 1.0520949
> R10.Cold ColdAdaptedColdCage 17408532 1.0903311
>
> $counts
> R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold
> FBgn0000008 92 123 80 41 76 135 188 150
> FBgn0000017 2000 2393 1799 1290 1029 973 2441 1665
> FBgn0000018 36 43 10 17 6 16 20 30
> FBgn0000024 51 68 73 42 118 118 242 129
> FBgn0000032 1640 1942 2328 1342 755 674 2110 1307
> 9413 more rows ...
>
> $all.zeros
> FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
> FALSE FALSE FALSE FALSE FALSE
> 9413 more elements ...
>
> $common.dispersion
> [1] 0.04634741
>
> $trended.dispersion
> [1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067
> 9413 more elements ...
>
> $abundance
> FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
> -11.915246 -9.183744 -13.519050 -11.966242 -9.300038
> 9413 more elements ...
>
> $bin.dispersion
> [1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388
> [7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138
> [13] 0.03724316 0.03980808 0.04726670
>
> $bin.abundance
> [1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957
> [7] -10.769579 -10.430591 -10.119638 -9.821485 -9.532052 -9.228034
> [13] -8.853866 -8.376654 -7.322181
>
> $tagwise.dispersion
> [1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193
> 9413 more elements ...
>
> getPriorN(d.GLM, design=design)
> [1] 5
>
> #Fit model with tagwise dispersion
> glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion)
> lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
> topTags(lrt.tgw)
> Coefficient: AdaptationHotAdapted:TemperatureHotCage
> logConc logFC LR P.Value FDR
> FBgn0032285 -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11
> FBgn0026089 -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07
> FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06
> FBgn0051641 -10.291549 2.976952 37.46208 9.320768e-10 2.194575e-06
> FBgn0058042 -9.665534 2.267317 33.71276 6.388032e-09 1.203250e-05
> FBgn0010333 -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05
> FBgn0030094 -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05
> FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05
> FBgn0028863 -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05
> FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05
>
>
> If I understand, glmfit.tgw would be the interaction model (f2) I was
> talking about, wright? My problem is that I do not know now how to define
> the additive and reduced models to test Additive vs. Reduced and
> Interaction vs. Additive.
>
> Any suggestion?
>
> Many thanks,
>
> Miguel Gallach
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list