[BioC] creating contrast matrix (limma) for two factorial in R

Fri Mar 14 00:31:21 CET 2014

Hi Ryan,

Your design matrix is fine. However, your mistake is trying to stuff 
each one of your tests into a single contrast. As an example, for the 
interaction test, you are testing the interaction between a 5-level 
factor and a 2-level factor, so the test has 4 degrees of freedom. This 
means that you'll need 4 contrasts to represent this test:

interaction.contrast.exprs <- sprintf("(g%s.s-g%s.c) - (g1.s-g1.c)", 
2:5, 2:5)
interaction.contrast.matrix <- 
makeContrasts(contrasts=interaction.contrast.exprs, levels=design)

The resulting contrast matrix should have 4 columns, one for each 
contrast in your test.

For your other two tests, you should be aware that the expression 
"(g1.s + g1.c)/2" means "an equal mix of control and stressed", which 
is probably not what you want. I would say that it makes more sense to 
just compare the genotypes within a single condition. To test for 
genotype differences in the control condition, this is again a 
4-degree-of-freedom test, so you'll need a similar approach to the 
above:

control.genotype.contrast.exprs <- sprintf("g%s.c-g1.c", 2:5, 2:5)
control.genotype.contrast.matrix <- 
makeContrasts(contrasts=control.genotype.contrast.exprs, levels=design)

You could also to the same with the stressed condition as a separate 
test. The same caveat applies to your treatment contrast, which is only 
correct if you want to know the expression differences for control vs 
stressed in an equal mix of all 5 genotypes (i.e. a hypothetical 
population with 20% of each genotype). Again, the best approach is 
probably to test control vs stressed in each genotype separately, i.e. 
5 tests of one contrast each.

Lastly, since you said you are working with spectral count data, you 
might want to consider using the edgeR or DESeq2 packages, which are 
specifically designed for analyzing count data, or else to use limma's 
voom function to perform the log2-transform while accounting for 
heteroskedasticity.

Anyway, that's my understanding. I hope it helps you. Anyone else can 
feel free to correct me if I've made any errors in my explanation.

-Ryan Thompson

On Thu Mar 13 15:14:20 2014, Ryan M Ghan wrote:
> Hello limma community-
>
> I am attempting to construct a contrast matrix that I can run in R, using the limma bioconductor package, but I am not sure that I have coded the contrast matrix correctly. A previous post<https://stats.stackexchange.com/questions/64249/creating-contrast-matrix-for-linear-regression-in-r?newreg=add2674ca9d04b7eb85fad255b45b7f5> on stats.stackexchange.com, the bioconductor mailing list, and the limma guide were helpful, but my two factorial design is more complicated than what is illustrated there.
>
> The first factor is the treatment, with two levels (control=c and stress=s), and the second factor is the genotype, with five levels (g1, g2, g3, g4, g5). Each genotype/treatment consists of 3-biological replicates (30xsamples total). My dataset has already been normalized and log2 transformed (Should I even start with log transformed data?). It consists of 1208 proteins (based upon spectral counting for those that care) that measures protein abundance differences in the five genotypes and two treatments. The dataset is complete, meaning each sample/condition has a datapoint, and appears to be normally distributed (histogram and qqplots).
>
> ## Session information
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.18.13
>
> loaded via a namespace (and not attached):
>   [1] colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4       ggplot2_0.9.3.1    grid_3.0.2         gtable_0.1.2
>   [7] labeling_0.2       MASS_7.3-30        munsell_0.4.2      plyr_1.8.1         proto_0.3-10       RColorBrewer_1.0-5
> [13] Rcpp_0.11.0        reshape2_1.2.2     scales_0.2.3       stringr_0.6.2      tools_3.0.2
>
> ## Subset of the data:
>      proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3 g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2 g4.s3 g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3
>      prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821 -9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585 -10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896 -10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795 -10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373 -9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101 -9.957684691
>      prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294 -9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795 -10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166 -10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978 -9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592 -9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886 -9.95430439
>      prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998 -11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486 -12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031 -12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243 -11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512 -12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369 -12.437377
>      prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257 -11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186 -11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753 -11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929 -10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598 -11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703 -11.09493115
>      prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973 -9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651 -9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382 -10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376 -9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894 -9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792 -9.65324573
>      prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023 -10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798 -10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025 -10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785 -10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634 -10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552 -10.13061368
>      prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129 -9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462 -10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285 -10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449 -10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166 -10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766 -10.0695915
>      prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092 -10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007 -10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728 -10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937 -10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115 -10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989 -10.62675183
>      prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675 -8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958 -8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294 -8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159 -8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278 -8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701 -8.925717389
>      prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791 -10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196 -11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369 -10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945 -10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216 -11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792 -10.86048168
>
> ## Code that I am attempting to utilize:
>      proteins.mat <- as.matrix(proteins.df)
>      treat = c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4.c","g5.s","g5.c")
>      factors = gl(10,3,labels=treat)
>      design <- model.matrix(~0+factors)
>      colnames(design) <- treat
>
> ## Here is the design for my model:
>      > design
>         g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c
>      1     1    0    0    0    0    0    0    0    0    0
>      2     1    0    0    0    0    0    0    0    0    0
>      3     1    0    0    0    0    0    0    0    0    0
>      4     0    1    0    0    0    0    0    0    0    0
>      5     0    1    0    0    0    0    0    0    0    0
>      6     0    1    0    0    0    0    0    0    0    0
>      7     0    0    1    0    0    0    0    0    0    0
>      8     0    0    1    0    0    0    0    0    0    0
>      9     0    0    1    0    0    0    0    0    0    0
>      10    0    0    0    1    0    0    0    0    0    0
>      11    0    0    0    1    0    0    0    0    0    0
>      12    0    0    0    1    0    0    0    0    0    0
>      13    0    0    0    0    1    0    0    0    0    0
>      14    0    0    0    0    1    0    0    0    0    0
>      15    0    0    0    0    1    0    0    0    0    0
>      16    0    0    0    0    0    1    0    0    0    0
>      17    0    0    0    0    0    1    0    0    0    0
>      18    0    0    0    0    0    1    0    0    0    0
>      19    0    0    0    0    0    0    1    0    0    0
>      20    0    0    0    0    0    0    1    0    0    0
>      21    0    0    0    0    0    0    1    0    0    0
>      22    0    0    0    0    0    0    0    1    0    0
>      23    0    0    0    0    0    0    0    1    0    0
>      24    0    0    0    0    0    0    0    1    0    0
>      25    0    0    0    0    0    0    0    0    1    0
>      26    0    0    0    0    0    0    0    0    1    0
>      27    0    0    0    0    0    0    0    0    1    0
>      28    0    0    0    0    0    0    0    0    0    1
>      29    0    0    0    0    0    0    0    0    0    1
>      30    0    0    0    0    0    0    0    0    0    1
>      attr(,"assign")
>      [1] 1 1 1 1 1 1 1 1 1 1
>      attr(,"contrasts")
>      attr(,"contrasts")$factors
>      [1] "contr.treatment"
>
> ## My contrast model. I want to test for interaction, differences between genotypes, and to see if specific genotypes respond differently to the treatment from one another:
>      cmtx <- makeContrasts(
>        GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s-g4.c)-(g5.s-g5.c),
>        genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c),
>        Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c),
>        levels=design)
>
> ## What my contrast model looks like, but I don't think this is correct:
>      > cmtx
>            Contrasts
>      Levels GenotypevsTreatment Genotype Treatment
>        g1.s                   1        1         1
>        g1.c                  -1        1        -1
>        g2.s                  -1       -1         1
>        g2.c                   1       -1        -1
>        g3.s                  -1       -1         1
>        g3.c                   1       -1        -1
>        g4.s                  -1       -1         1
>        g4.c                   1       -1        -1
>        g5.s                  -1       -1         1
>        g5.c                   1       -1        -1
>
> ## Fitting the linear model by empirical bayes statistics for differential expression:
>      fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx))
>      topTable(fit, adjust.method="BH")
>
> ## The below topTable proteins are the same as the subset of data from above:
>      > topTable(fit, adjust.method="BH")
>             GenotypevsTreatment Genotype    Treatment    AveExpr        F      P.Value    adj.P.Val
>      prot1        -0.40786338 60.30918  0.073054723  -9.918822 17308.55 1.124646e-39 1.232079e-36
>      prot2        -0.09255219 59.60864  0.061701713  -9.897968 15801.43 3.304533e-39 1.232079e-36
>      prot3        -0.23880357 73.48557  0.536672827 -12.090016 15650.65 3.701463e-39 1.232079e-36
>      prot4        -0.11834000 66.76931  0.305471823 -11.122034 15522.46 4.079731e-39 1.232079e-36
>      prot5        -0.15210172 59.21509 -0.183849274  -9.876144 14734.51 7.556112e-39 1.423908e-36
>      prot6        -0.15761118 62.87467  0.155340561 -10.389362 14565.87 8.658504e-39 1.423908e-36
>      prot7        -0.03886438 61.15652 -0.166795475 -10.182834 14551.88 8.757515e-39 1.423908e-36
>      prot8        -0.10425341 64.63523 -0.186904167 -10.780359 14461.18 9.429854e-39 1.423908e-36
>      prot9        -0.03426380 53.48057  0.007403722  -8.854471 13713.49 1.767090e-38 2.021378e-36
>      prot10       -0.75250251 66.62646  0.327497120 -10.894506 13480.51 2.164184e-38 2.021378e-36
>
> I think I am naively constructing the contrast matrix. The result for Genotype (above) looks incorrect to me. Ideally, I would like to figure this out because I wish to apply similar modeling techniques to an RNAseq dataset from the same experimental samples that I used for my protein work.
>
> Also, in order to make the correct comparisons, do I also need to employ the ‘duplicateCorrelation’ function (pg. 50, lima guide 9.7 Multi-level Experiments)? Something like this:
>
> corfit <- duplicateCorrelation(proteins.mat, design)
>> corfit$consensus
> [1] -0.7888584
> fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design, corrleation=corfit$consensus), cmtx))
>
> Any insight/corrections would be greatly appreciated, and if I have forgotten some information that would aid in helping me with this problem I am happy to cooperate.
>
> Cheers,
> Ryan
>
> Ryan M. Ghan
> Graduate Research Assistant
> Department of Biochemistry and Molecular Biology, MS 330
> University of Nevada, Reno
> Reno, NV 89557
> Howard 205
> Lab: (775) 784-4225
> rghan at unr.edu
>
>
>
> 	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor