[BioC] edgeR glmFit -- how to define and test reduced, additive and interaction fitted models

Gordon K Smyth smyth at wehi.EDU.AU
Sun Dec 18 23:41:15 CET 2011


Dear Miguel,

You want to do two tests.  You want to test for Adapation:Temperature 
interaction (fit2 vs fit1), and you want to test for Temperature adjusted 
for Adaption (fit1 vs fit0).  In edgeR, you conduct these test by fitting 
the alternative model, the specifying the term to be removed.

   design2 <- model.matrix(~Adaption*Temperature)
   design1 <- model.matrix(~Adaption+Temperature)

To do genewise tests for the interaction:

   fit2 <- glmFit(d.GLM,design2)
   lrt2 <- glmLRT(d.GLM,fit2)
   topTags(lrt2)

To test for Temperature adjusted for Adaption:

   fit1 <- glmFit(d.GLM,design1)
   lrt1 <- glmLRT(d.GLM,fit1)
   topTags(lrt1)

In both cases, the last term in the model is the one removed unless you 
specify otherwise.  You do not have to explicitly fit reduced models, 
because edgeR does this for you automatically.

Best wishes
Gordon

> Date: Fri, 16 Dec 2011 14:48:39 +0100
> From: Miguel Gallach <miguel.gallach at vetmeduni.ac.at>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR glmFit -- how to define and test reduced,
> 	additive and interaction fitted models
>
> Dear list,
>
> I am using edgeR to analyze my RNA-Seq data. My data consist in two factors
> (Adaptation and Temperature) and two levels per factor, and I want fit full
> and reduced glm like the next ones:
>
> Reduced model
> fit0 = counts ~ Adaptation
>
> Aditive model
> fit1 = counts ~ Adaptation + Temperature
>
> Interaction model
> fit2 = counts ~ block + treatment + block:treatment
>
>
> That is what I did until now:
> d = raw.data[,1:8]
> head (d)
>            R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold
> FBgn0000003      0      0      0       0       2       8       3        1
> FBgn0000008     92    123     80      41      76     135     188      150
> FBgn0000014      0      1      2       4       5       8      10        7
> FBgn0000015      2      1      0       0       2       3       9        5
> FBgn0000017   2000   2393   1799    1290    1029     973    2441     1665
> FBgn0000018     36     43     10      17       6      16      20       30
>
>
> Adaptation =
> factor(c("HotAdapted","HotAdapted","ColdAdapted","ColdAdapted","HotAdapted","HotAdapted","ColdAdapted","ColdAdapted"))
> Temperature =
> factor(c("HotCage","HotCage","HotCage","HotCage","ColdCage","ColdCage","ColdCage","ColdCage"))
> design = model.matrix(~Adaptation + Temperature + Adaptation:Temperature)
> design
>  (Intercept) AdaptationHotAdapted TemperatureHot
> 1           1                    1              1
> 2           1                    1              1
> 3           1                    0              1
> 4           1                    0              1
> 5           1                    1              0
> 6           1                    1              0
> 7           1                    0              0
> 8           1                    0              0
>  AdaptationHotAdapted:TemperatureHot
> 1                                   1
> 2                                   1
> 3                                   0
> 4                                   0
> 5                                   0
> 6                                   0
> 7                                   0
> 8                                   0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Adaptation
> [1] "contr.treatment"
>
> attr(,"contrasts")$Temperature
> [1] "contr.treatment"
>
>
> d.GLM = DGEList(d, group = c("HotAdaptedHotCage", "HotAdaptedHotCage",
> "ColdAdaptedHotCage", "ColdAdaptedHotCage","HotAdaptedColdCage",
> "HotAdaptedColdCage", "ColdAdaptedColdCage", "ColdAdaptedColdCage"))
> Calculating library sizes from column totals.
> d.GLM = calcNormFactors(d.GLM)
> d.GLM
> An object of class "DGEList"
> $samples
>                   group lib.size norm.factors
> R4.Hot     HotAdaptedHot 17409289    0.9881635
> R5.Hot     HotAdaptedHot 17642552    1.0818144
> R9.Hot    ColdAdaptedHot 20010974    0.8621807
> R10.Hot   ColdAdaptedHot 14064143    0.8932791
> R4.Cold   HotAdaptedCold 11968317    1.0061084
> R5.Cold   HotAdaptedCold 11072832    1.0523857
> R9.Cold  ColdAdaptedCold 22386103    1.0520949
> R10.Cold ColdAdaptedCold 17408532    1.0903311
>
> $counts
>            R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold
> FBgn0000003      0      0      0       0       2       8       3        1
> FBgn0000008     92    123     80      41      76     135     188      150
> FBgn0000014      0      1      2       4       5       8      10        7
> FBgn0000015      2      1      0       0       2       3       9        5
> FBgn0000017   2000   2393   1799    1290    1029     973    2441     1665
> 14864 more rows ...
>
> $all.zeros
> FBgn0000003 FBgn0000008 FBgn0000014 FBgn0000015 FBgn0000017
>      FALSE       FALSE       FALSE       FALSE       FALSE
> 14864 more elements ...
>
>
> #According to the manual, DGEList is now ready to be passed to the
> functions that do the calculations to determine differential expression
> levels for the genes. However is not possible to achieve statistical
> significance with fewer than ten counts in total for a tag/gene and we also
> do not want to waste effort finding spurious DE (such as when a gene is
> only expressed in one library), so we filter out tags/genes with fewer than
> 1 count per million in SIX or more libraries.
>
> cpm.d.GLM = cpm(d.GLM)
> d.GLM = d.GLM[rowSums(cpm.d.GLM > 1) >= 2, ]
> nrow(d.GLM)
> [1] 9418
>
>
> #Now the dataset is ready to be analysed for differential expression
> #The design matrix
> design
>  (Intercept) AdaptationHotAdapted TemperatureHotCage
> 1           1                    1                  1
> 2           1                    1                  1
> 3           1                    0                  1
> 4           1                    0                  1
> 5           1                    1                  0
> 6           1                    1                  0
> 7           1                    0                  0
> 8           1                    0                  0
>  AdaptationHotAdapted:TemperatureHotCage
> 1                                       1
> 2                                       1
> 3                                       0
> 4                                       0
> 5                                       0
> 6                                       0
> 7                                       0
> 8                                       0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Adaptation
> [1] "contr.treatment"
>
> attr(,"contrasts")$Temperature
> [1] "contr.treatment"
>
>
> #Estimate CR common/trended/tagwise dispersion
> d.GLM = estimateGLMCommonDisp(d.GLM, design)
> names(d.GLM)
> [1] "samples"           "counts"            "all.zeros"
> [4] "common.dispersion"
>
> d.GLM$common.dispersion
> [1] 0.04634741
>
> sqrt(d.GLM$common.dispersion)
> [1] 0.2152845
>
> d.GLM = estimateGLMTrendedDisp(d.GLM, design)
> Loading required package: splines
>
> summary(d.GLM$trended.dispersion)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.03672 0.03843 0.04326 0.05267 0.06134 0.11870
>
> d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
>
> d.GLM$prior.n
> NULL
>
> d$prior.n
> NULL
>
> ls()
> [1] "Adaptation"  "cpm.d"       "cpm.d.GLM"   "d"           "d.GLM"
> [6] "design"      "raw.data"    "Temperature"
>
>
> d.GLM
> An object of class "DGEList"
> $samples
>                       group lib.size norm.factors
> R4.Hot     HotAdaptedHotCage 17409289    0.9881635
> R5.Hot     HotAdaptedHotCage 17642552    1.0818144
> R9.Hot    ColdAdaptedHotCage 20010974    0.8621807
> R10.Hot   ColdAdaptedHotCage 14064143    0.8932791
> R4.Cold   HotAdaptedColdCage 11968317    1.0061084
> R5.Cold   HotAdaptedColdCage 11072832    1.0523857
> R9.Cold  ColdAdaptedColdCage 22386103    1.0520949
> R10.Cold ColdAdaptedColdCage 17408532    1.0903311
>
> $counts
>            R4.Hot R5.Hot R9.Hot R10.Hot R4.Cold R5.Cold R9.Cold R10.Cold
> FBgn0000008     92    123     80      41      76     135     188      150
> FBgn0000017   2000   2393   1799    1290    1029     973    2441     1665
> FBgn0000018     36     43     10      17       6      16      20       30
> FBgn0000024     51     68     73      42     118     118     242      129
> FBgn0000032   1640   1942   2328    1342     755     674    2110     1307
> 9413 more rows ...
>
> $all.zeros
> FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
>      FALSE       FALSE       FALSE       FALSE       FALSE
> 9413 more elements ...
>
> $common.dispersion
> [1] 0.04634741
>
> $trended.dispersion
> [1] 0.06147752 0.03682037 0.09393240 0.06250704 0.03701067
> 9413 more elements ...
>
> $abundance
> FBgn0000008 FBgn0000017 FBgn0000018 FBgn0000024 FBgn0000032
> -11.915246   -9.183744  -13.519050  -11.966242   -9.300038
> 9413 more elements ...
>
> $bin.dispersion
> [1] 0.10242476 0.08298402 0.07844827 0.06269934 0.05173652 0.04872388
> [7] 0.04284990 0.04557811 0.04009431 0.03637836 0.03769710 0.03569138
> [13] 0.03724316 0.03980808 0.04726670
>
> $bin.abundance
> [1] -13.904034 -13.147003 -12.527444 -12.012132 -11.522218 -11.127957
> [7] -10.769579 -10.430591 -10.119638  -9.821485  -9.532052  -9.228034
> [13]  -8.853866  -8.376654  -7.322181
>
> $tagwise.dispersion
> [1] 0.05760539 0.03015829 0.10190732 0.05315469 0.03199193
> 9413 more elements ...
>
> getPriorN(d.GLM, design=design)
> [1] 5
>
> #Fit model with tagwise dispersion
> glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion)
> lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
> topTags(lrt.tgw)
> Coefficient:  AdaptationHotAdapted:TemperatureHotCage
>               logConc     logFC       LR      P.Value          FDR
> FBgn0032285  -8.215481 -5.121465 60.90457 5.990958e-15 5.642284e-11
> FBgn0026089  -8.406297 -3.019672 40.81214 1.675890e-10 7.891766e-07
> FBgn0039475 -10.039331 -2.757261 39.04361 4.144435e-10 1.301076e-06
> FBgn0051641 -10.291549  2.976952 37.46208 9.320768e-10 2.194575e-06
> FBgn0058042  -9.665534  2.267317 33.71276 6.388032e-09 1.203250e-05
> FBgn0010333  -8.364017 -2.431286 31.28393 2.229167e-08 3.163945e-05
> FBgn0030094  -9.930521 -4.244210 31.18012 2.351626e-08 3.163945e-05
> FBgn0053653 -11.223515 -2.506126 30.73483 2.958070e-08 3.482388e-05
> FBgn0028863  -9.954041 -2.419844 30.13375 4.032520e-08 4.219808e-05
> FBgn0037979 -11.777803 -3.211380 29.61620 5.266308e-08 4.959809e-05
>
>
> If I understand, glmfit.tgw would be the interaction model (f2) I was
> talking about, wright? My problem is that I do not know now how to define
> the additive and reduced models to test Additive vs. Reduced and
> Interaction vs. Additive.
>
> Any suggestion?
>
> Many thanks,
>
> Miguel Gallach

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list