[BioC] Confusion over limma documentation and design/contrast matrix

Mon Nov 15 12:25:27 CET 2004

Perfect :-)
Thank you.

-----Original Message-----
From: Gordon Smyth [mailto:smyth at wehi.edu.au] 
Sent: 14 November 2004 02:36
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Confusion over limma documentation and design/contrast
matrix

>Date: Fri, 12 Nov 2004 13:42:22 -0000
>From: "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
>Subject: [BioC] Confusion over limma documentation and design/contrast
>To: "Gordon Smyth" <smyth at wehi.edu.au>
>Cc: bioconductor at stat.math.ethz.ch
>
>Hi Gordon
>
>Thanks for the response.  I worked through my two examples and found 
>out they are equivalent.
>
> >Do you know how to multiply a vector by a matrix? If you do, then I
>think
> >the best way to figure out what the design matrix is doing for you is
>just
> >to sit down with a piece of paper and a pencil for a few minutes, and

> >multiply the design matrix by the coefficient vector.
>
>I did actually do this previous to sending the mail but it didn't easy 
>my confusion.

No you haven't actually, see below. The design matrix multiples the
vector 
of regression coefficients. Whenever the word "coefficients" is used in
the 
limma documentation, it refers to the regression coefficients. You can
read 
about this in any text book which covers multiple regression.
Unfortunately 
there are not lot a lot of books which include a matrix treatment of 
multiple regression or linear models.

> > apoai.design %*% apoai.contrasts

Multiplying the design matrix by a contrast matrix is nonsense. No 
suggestion has been made that you should do that.

 > I have read the updated limma documentation and it is clearer now,
but
>it is still unclear why the above design matrix is used for the ApoAI 
>data - there is no explanation as to how it relates to the design 
>matrix and contrasts matrix I calculated above using modelMatrix() and 
>makeContrasts().

It seems to me that this is exactly what is explained in Section 10.5 of

the User's Guide. For example, the paragraph:

"Here the first coefficient estimates the difference between wild type
and 
the reference for each
probe while the second coefficient estimates the difference between
mutant 
and wild type.
For those not familiar with model matrices in linear regression, it can
be 
understood in the
following way. The matrix indicates which coefficients apply to each
array. 
For the first
two arrays the fitted values will be just the WTvsREF coefficient, which
is 
correct. For the
remaining arrays the fitted values will be WTvsREF + MUvsWT, which is 
equivalent to mutant vs
reference, also correct."

explains in words why the second coefficient defined why the ApoAI
design 
matrix measures the comparison of interest: mutant vs wildtype
comparison. 
Perhaps this still isn't clear, but it isn't right to say that there is
no 
explanation.

Here is one more go at trying to explain:

You have two groups, wildtype and mutant. Two arrays for the first group

and three for the second. Suppose the log-fold-change for the first
group 
is A and for the second is B.  What we want is B-A.

The first design matrix

1 0
1 0
1 1
1 1
1 1

represents the five arrays by

c1
c1
c1+c2
c1+c2
c1+c2

where c1 and c2 are the regression coefficients. In other words, we have

c1=A and c1+c2=B.  This means that c2 must represent c2=B-A.

The second design matrix

1 0
1 0
0 1
0 1
0 1

represents the five arrays by

c1
c1
c2
c2
c2

where again c1 and c2 are the regression coefficients. In this case we
have 
c1=A and c2=B.  So to get B-A we have to go one extra step, to compute
the 
difference between c2 and c1 to get B-A and its standard error.  This 
difference is represented by the contrast (-1,1) because matrix 
multiplication of (-1,1) by (c1 c2) gives c2-c1 = B-A.

In summary, the first design matrix represents A and B by the
coefficients 
c1=A and c2=B-A.  The second design matrix represents A and B by the 
coefficients c1=A and c2=B. The first representation includes our
contrast 
of interest, the second does not. Hence we need to take the difference 
between the coefficients in the second case but not in the first.  You
can, 
of course, do it either way.

Finally, as explained in Section 10 of the limma User's Guide, this is
no 
different from what happens (implicitly) when you use the lm() or aov() 
functions in R.

Gordon

 > Regards
 >
 > Michael Watson

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor