[BioC] Modelling interaction using limma package

Tue Sep 21 18:38:53 CEST 2004

Dear Ramon;
   Thank you very much for the detailed answer of my question, it works
for me!!!

One more question, for one factor, if different individual (gene) has
different number of levels (thus different design and constrast matrix).
Theoretically, this is still doable using the proposed linear model with
Emprical Bayes method according to the paper, but in practice, do you or
anybdy on this list know how to do it?

what I am thinking is to divide the whole data set into several groups,
all individuals in one group has same number of levels, then apply limma
package (lmFit and eBayes) as usual. Although this changes the original
assumption that variance from all individuals are from the same
distribution with the one that variance from individuals in each group are
from the same distribution, it should still be valid analysis as each
group will still have large amount of genes. Does anyone have better idea
beside writing my own code of doing eBayes.

Thanks.

Fangxin

> Dear Fangxin,
>
> On Tuesday 14 September 2004 19:41, Fangxin Hong wrote:
>> Does anyone know how to model interaction (two-way ANOVA) using limma
>> package (lmFit)?
>
> You need to set up the design matrix and decide what contrasts you are
> interested in. At the bottom there is a somewhat long example (you'll have
> to
> supply your own data; this is from a script I have); I have added a few
> checks to make sure I was testing what I wanted to test. Please note that,
> for some reason I don't recall now, I made some minor modification to
> classifyTestsF because it was more convenient for my purposes.
> [Disclaimer:
> my example is probably overly complicated].
>
>
>> Also, it seems that eBayes can be applied to other model fit object,
>> like
>> 'lm.series', 'gls.series' and  'rlm.series' beside 'lmFit'. Does anyone
>> use this before?
>
>
> Yes. I've used it with lm and rlm, and maybe also with gls. No major
> differences. But of course, you might want to make sure that using gls or
> rlm
> (or lm) makes sense in your case; for instance, the MASS book (by Venables
> &
> Ripley) I think contains some cautionary points about using rlm and
> related
> methods with too small sample sizes.
>
>
> Hope this helps.
>
>
> ******************************
>
> The example; I can email you privately this as an attachement.
>
> ### treat is a factor, and age:centered is continuous.
>
>
> options(contrasts = c("contr.sum", "contr.poly"))
>
> d.trt.BY.age <- model.matrix(~ -1 + treat + treat:age.centered)
> colnames(d.trt.BY.age) <- c("Colon", "Mama", "Normal",
>                          "Colon.by.age", "Mama.by.age", "Normal.by.age")
> contrasts.d.trt.BY.age <-
>     makeContrasts(Colon - Normal,
>                   Mama - Normal,
>                   Mama - Colon,
>                   0.5*Colon + 0.5* Mama - Normal, ## testing "cancer" vs.
> normal
>                   Colon.by.age - Normal.by.age,
>                   Mama.by.age - Normal.by.age,
>                   Mama.by.age - Colon.by.age,
>                   levels = d.trt.BY.age)
>
>
> lm.trt.BY.age <- lm.series(d.clean, d.trt.BY.age)
> lm.trt.BY.age.eb <- ebayes(lm.trt.BY.age)
> lm.trt.BY.age.contrasts <- contrasts.fit(lm.trt.BY.age,
>                                          contrasts.d.trt.BY.age)
> lm.trt.BY.age.contrasts.eb <- ebayes(lm.trt.BY.age.contrasts)
>
>
>
>
> d.trt.BY.age2 <- model.matrix(~ -1 + treat + age.centered +
>                               treat*age.centered)
> colnames(d.trt.BY.age2) <- c("Colon", "Mama", "Normal",
>                              "age", "Colon.by.age", "Mama.by.age")
> contrasts.d.trt.BY.age2 <- makeContrasts(Colon.by.age,
>                                         Mama.by.age,
>                                         levels = d.trt.BY.age2)
>
>
>
>
> #### Get the interaction F-statistic
> tmp <-
>     classifyTestsF(lm.trt.BY.age2.contrasts.eb$t,
>                    contrasts = contrasts.d.trt.BY.age2,
>                    design = d.trt.BY.age2,
>                    df = lm.trt.BY.age2$df +
> lm.trt.BY.age2.contrasts.eb$df,
>                    fstat.only = TRUE)
> lm.interaction.Fstat <- cbind(F.stat =tmp$Fstatistic,
>                               df1 = rep(2, length(tmp$Fstatistic)),
>                               df2 = tmp$df2)
> lm.interaction.Fstat <- cbind(lm.interaction.Fstat,
>                               p.value = pf(lm.interaction.Fstat[,1],
>                               lm.interaction.Fstat[,2],
>                               lm.interaction.Fstat[,3],
>                               lower.tail = FALSE))
> lm.interaction.Fstat <- cbind(lm.interaction.Fstat,
>                               fdr.p.value
> p.adjust(lm.interaction.Fstat[, 4],
>                                        method = "fdr"))
> summary(lm.interaction.Fstat)
>
>
>
> #############    Checks
>
> ### check appropriate contrasts testing for testing interaction via
> calssifyTestsF
> l0 <- lm(d.clean[2, ] ~ -1 + treat + age.centered)
> l1 <- lm(d.clean[2, ] ~ -1 + treat + age.centered + age.centered*treat)
> l2 <- lm(d.clean[2, ] ~ -1 + treat + treat:age.centered)
>
> anova(l0)
> anova(l1)
> anova(l2)
>
> anova(l0, l1) ## this and the following are the same
> anova(l0, l2)
> anova(l1, l2) ## nothing, as expected
>
> ## just for checking!!! Note I modify ts, etc.
> lm.trt.BY.age2.contrasts.eb$t[2,] <- summary(l1)$coefficients[5:6, 3]
>
> classifyTestsF(lm.trt.BY.age2.contrasts.eb$t[2, ],
>                contrasts = contrasts.d.trt.BY.age2[2, ],
>                design = d.trt.BY.age2,
>                df = lm.trt.BY.age2$df[2, ] +
>                lm.trt.BY.age2.contrasts.eb$df,
>                fstat.only = TRUE)
> ### OK, works as it should
>
>
>
>
> ### changes to classifyTestsF
>
> ## I change the return value when fstat.only = TRUE
> classifyTestsF <-
> function (object, cor.matrix = NULL, design = NULL,
>           contrasts = diag(ncol(design)),
>     df = Inf, p.value = 0.01, fstat.only = FALSE)
> {
>     if (is.list(object)) {
>         if (is.null(object$t))
>             stop("tstat cannot be extracted from object")
>         if (missing(design) && !is.null(object$design))
>             design <- object$design
>         if (missing(contrasts) && !is.null(object$contrasts))
>             contrasts <- object$contrasts
>         if (missing(df) && !is.null(object$df.prior) &&
>             !is.null(object$df.residual))
>             df <- object$df.prior + object$df.residual
>         tstat <- as.matrix(object$t)
>     }
>     else {
>         tstat <- as.matrix(object)
>     }
>     ngenes <- nrow(tstat)
>     ntests <- ncol(tstat)
>     if (ntests == 1) {
>         qT <- qt(p.value/2, df, lower.tail = FALSE)
>         return(sign(tstat) * (abs(tstat) > qT))
>     }
>     if (!is.null(cor.matrix) && !is.null(design))
>         stop("Cannot specify both cor.matrix and design")
>     if (!is.null(design)) {
>         design <- as.matrix(design)
>         R <- chol(crossprod(design))
>         cor.matrix <- crossprod(backsolve(R, contrasts, transpose = TRUE))
>         d <- sqrt(diag(cor.matrix))
>         cor.matrix <- cor.matrix/(d %*% t(d))
>     }
>     if (is.null(cor.matrix)) {
>         r <- ntests
>         Q <- diag(r)/sqrt(r)
>     }
>     else {
>         E <- eigen(cor.matrix, symmetric = TRUE)
>         r <- sum(E$values/E$values[1] > 1e-08)
>         Q <- matvec(E$vectors[, 1:r], 1/sqrt(E$values[1:r]))/sqrt(r)
>     }
>     if (fstat.only) {
>         fstat <- list()
>         fstat$Fstatistic <- drop((tstat %*% Q)^2 %*% array(1, c(r, 1)))
>         fstat$df1 <- r
>         fstat$df2 <- df
>         return(fstat)
>     }
>     qF <- qf(p.value, r, df, lower.tail = FALSE)
>     if (length(qF) == 1)
>         qF <- rep(qF, ngenes)
>     result <- matrix(0, ngenes, ntests, dimnames = dimnames(tstat))
>     if (is.null(colnames(tstat)) && !is.null(colnames(contrasts)))
>         colnames(result) <- colnames(contrasts)
>     for (i in 1:ngenes) {
>         x <- tstat[i, ]
>         if (any(is.na(x)))
>             result[i, ] <- NA
>         else if (crossprod(crossprod(Q, x)) > qF[i]) {
>             ord <- order(abs(x), decreasing = TRUE)
>             result[i, ord[1]] <- sign(x[ord[1]])
>             for (j in 2:ntests) {
>                 bigger <- ord[1:(j - 1)]
>                 x[bigger] <- sign(x[bigger]) * abs(x[ord[j]])
>                 if (crossprod(crossprod(Q, x)) > qF[i])
>                   result[i, ord[j]] <- sign(x[ord[j]])
>                 else break
>             }
>         }
>     }
>     new("TestResults", result)
> }
>
>
>
>
>
>
>
> --
> Ramón Díaz-Uriarte
> Bioinformatics Unit
> Centro Nacional de Investigaciones Oncológicas (CNIO)
> (Spanish National Cancer Center)
> Melchor Fernández Almagro, 3
> 28029 Madrid (Spain)
> Fax: +-34-91-224-6972
> Phone: +-34-91-224-6900
>
> http://ligarto.org/rdiaz
> PGP KeyID: 0xE89B3462
> (http://ligarto.org/rdiaz/0xE89B3462.asc)
>
>
>

-- 
Fangxin Hong, Ph.D.
Bioinformatics Specialist
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong at salk.edu