[BioC] Re: Designing matrix with limma

Mon Aug 25 20:23:51 MEST 2003

At 04:46 AM 23/08/2003, Sek Won Kong, M.D wrote:
>Dear Gordon
>
>I am sorry for any inconvenience. I have a question about the design matrix
>in limma.
>We have designed and completed an experiment, but it is pretty tough to analyse.
>The design is a 2 x 2 x 2 factorial, with one additional factor: 2 different
>scanner settings were used at random. The actual experiment looks like this.
>
>         |       Experiment A       |       Experiment B
>---------+--------------------------+--------------------------
>         |  Control   |  Treatment  |  Control   |  Treatment
>---------+------------+-------------+------------+-------------
>Time A   |            |             |            |
>Time B   |            |             |            |
>
>Each cell has 5 biological replicates on Affy arrays. It's also possible to
>use just a 2 x 2 factorial ANOVA and then compare results, but I think it's
>better to start with a single model in terms of parsimony, and also because
>the two experiments are closely related in a biological sense.

This raises a lot of issues of which probably the easiest is how to create 
a design matrix in limma. Let's consider the design matrix first.

You have 4 factors each with 2 levels, i.e., a 2^4 design, including the 
scanner settings. Do you know how to analyse ordinary factorial experiments 
with univariate data using R? If you do, then the extension to microarrays 
is straightforward in principle although the interpretation of the 
parameters may be difficult. You might analyse an ordinary experiment in R 
using a call to 'lm' such as

    lm( y ~ (facA+facB+facC+facD)^4 )

where facA, facB, facC and facD are your factors. (I will assume for this 
email that you know how to create factors in R.) To use limma with 
microarray data, you can simply set

    design <- model.matrix( ~(facA+facB+facC+facD)^4 )
    fit <- lmFit( eset, design)

i.e., you can use function 'model.matrix' to extract the design matrix from 
the linear model formula. (I have assumed you have the development version 
of limma so that you can use lmFit.)
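
As a rough sketch of the step above, the following base-R code builds the four factors for a hypothetical layout of 40 arrays and extracts the full 2^4 design matrix. The names 'targets', 'experiment', 'treatment', 'time' and 'scanner' are illustrative assumptions, and so is the scanner assignment (alternating within each cell rather than truly random), which is chosen only so that every factor combination occurs.

```r
# Hypothetical array layout: 2 experiments x 2 treatments x 2 times,
# with 5 biological replicates per cell = 40 arrays.
targets <- expand.grid(
  replicate  = 1:5,
  treatment  = c("Control", "Treatment"),
  experiment = c("A", "B"),
  time       = c("A", "B")
)

# Assumed scanner assignment (illustration only): alternate the two
# settings within each cell so all 16 factor combinations are observed.
targets$scanner <- factor(ifelse(targets$replicate %% 2 == 0, "S2", "S1"))

# Full factorial design matrix with all interactions up to four-way.
design <- model.matrix(~ (experiment + treatment + time + scanner)^4,
                       data = targets)

dim(design)        # one row per array, one column per coefficient
qr(design)$rank    # equals ncol(design) when all coefficients are estimable

# With limma loaded and an expression set 'eset' whose columns are in the
# same order as the rows of 'targets', the fit would then be:
#   fit <- lmFit(eset, design)
```

If the scanner settings really were assigned at random, some factor combinations may be missing, in which case the rank will be smaller than the number of columns and not every coefficient can be estimated.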

The difficulty is in interpreting the estimated coefficients from your 
model fit. How will you interpret three- or four-way interaction terms? 
Perhaps you would be better off testing for a difference between the 
scanners and then analysing the other three factors separately. Perhaps it 
is the control vs treatment and time A vs time B comparisons, i.e., the 2x2 
factorial with treatment and time, which are really of interest to you. In 
that case you have a real chance of attaching meaningful biological 
interpretations to the estimated coefficients. You need to think carefully 
about what questions you want to answer from your experiment and then 
tailor the analysis accordingly.
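
To illustrate why the 2x2 factorial is easier to interpret, here is a minimal sketch of the reduced design for treatment and time only. The data frame 'targets' and its column names are hypothetical, and the sketch simply ignores the experiment and scanner factors, which a real analysis would still need to account for in some way.

```r
# Hypothetical layout: 2 treatments x 2 times, 10 arrays per cell
# (the 5 replicates from each of the two experiments pooled).
targets <- expand.grid(
  replicate = 1:10,
  treatment = c("Control", "Treatment"),
  time      = c("A", "B")
)

# 2x2 factorial with interaction, using R's default treatment contrasts.
design2 <- model.matrix(~ treatment * time, data = targets)
colnames(design2)

# Under this default parametrisation the four coefficients are:
#   (Intercept)               mean of Control at time A
#   treatmentTreatment        Treatment minus Control at time A
#   timeB                     time B minus time A within Control
#   treatmentTreatment:timeB  how much the treatment effect differs
#                             between time B and time A
```

Each of the four coefficients corresponds to a comparison that can be stated in biological terms, which is not the case for most of the 16 coefficients in the full 2^4 model.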

It would probably be a good idea to consult a statistician at Harvard to 
help work out an analysis strategy.

Regards
Gordon

>Thank you for the help in advance.
>
>Sek Won Kong with Bests.