[BioC] questions of using Limma: should I include all the samples?
Fangxin Hong
fhong at salk.edu
Tue Feb 8 01:57:49 CET 2005
> I am trying to use Limma with design matrix of
>
> 1 0 0 0
> 1 0 0 0
> 1 0 0 0
> 0 1 0 0
> 0 1 0 0
> 0 1 0 0
> 0 0 1 0
> 0 0 1 0
> 0 0 1 0
> 0 0 0 1
> 0 0 0 1
> 0 0 0 1
>
> to estimate the four coefficients C, C+A, C+B and C+A+B+AB (of course, I
> can estimate A, B, and AB directly using a different design matrix).
>
> Since the contrasts of interest are A and AB, the contrast matrix should
> be:
> -1 1 0 0
> -1 -1 -1 1
>
> My question is:
> 1) Are the design and contrast matrix correct?
If your design matrix is right, then your contrast matrix is not, because
the row (-1,-1,-1,1) gives you an estimate of AB-2C, not AB.
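The arithmetic behind this can be checked by writing each group mean in terms of the effects (C, A, B, AB) and applying the contrast weights. The following is only a sketch in plain Python (not limma code); the helper name contrast_in_effects is made up for illustration. It also shows that, under the group-means design, the weights (1,-1,-1,1) would isolate AB.

```python
# Each group mean written as coefficients on the effects (C, A, B, AB).
# Group order follows the quoted design matrix: C, C+A, C+B, C+A+B+AB.
GROUP_MEANS = [
    (1, 0, 0, 0),  # untreated: C
    (1, 1, 0, 0),  # C + A
    (1, 0, 1, 0),  # C + B
    (1, 1, 1, 1),  # C + A + B + AB
]


def contrast_in_effects(weights):
    """Express a contrast of group means as coefficients on (C, A, B, AB).

    Hypothetical helper for illustration; not part of any library.
    """
    return tuple(
        sum(w * row[k] for w, row in zip(weights, GROUP_MEANS))
        for k in range(4)
    )


print(contrast_in_effects((-1, 1, 0, 0)))    # (0, 1, 0, 0)  -> A
print(contrast_in_effects((-1, -1, -1, 1)))  # (-2, 0, 0, 1) -> AB - 2C
print(contrast_in_effects((1, -1, -1, 1)))   # (0, 0, 0, 1)  -> AB
```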
I would suggest estimating C, A, B, and AB directly,
using the design matrix
1 0 0 0 (C only)
1 1 0 0 (C+A)
1 0 1 0 (C+B)
1 1 1 1 (C+A+B+AB)
and constructing your contrasts as
0 1 0 0 (test A)
0 0 0 1 (test AB)
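To make the parametrization concrete, here is a small sketch in plain Python (again, not limma code). It assumes three replicate arrays per group, as in the 12-row design quoted above, so each of the four rows is repeated three times. The effect values are hypothetical numbers chosen only for illustration.

```python
# One design row per treatment group; columns are (C, A, B, AB).
GROUP_ROWS = [
    (1, 0, 0, 0),  # C only
    (1, 1, 0, 0),  # C + A
    (1, 0, 1, 0),  # C + B
    (1, 1, 1, 1),  # C + A + B + AB
]
REPLICATES = 3  # assumption: three arrays per group, 12 arrays in total

# Full design matrix: each group row repeated once per replicate.
design = [row for row in GROUP_ROWS for _ in range(REPLICATES)]

# Contrast rows that pick out single coefficients.
CONTRASTS = {"A": (0, 1, 0, 0), "AB": (0, 0, 0, 1)}

# Hypothetical true effects (C, A, B, AB), for illustration only.
effects = (5.0, 1.5, -0.5, 2.0)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


expected = [dot(row, effects) for row in design]
tested = {name: dot(w, effects) for name, w in CONTRASTS.items()}

print(len(design))             # 12
print(expected[::REPLICATES])  # [5.0, 6.5, 4.5, 8.0]
print(tested)                  # {'A': 1.5, 'AB': 2.0}
```

With this design the fitted coefficients are C, A, B, and AB themselves, so each contrast row simply selects one coefficient.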
> 2) I know this is a very naive question, but if I am only interested in the
> hormone-only effect, can I just use the untreated and hormone-alone treated
> samples as the input (so instead of the 12 CEL files, only use the first 6
> CEL files)? Will the analysis result be the same or different, not
> counting the normalization-produced difference? If there is a difference, is
> that due to the difference in df?
Well, this will only affect your error variance estimation: with fewer
samples there are fewer degrees of freedom for it, so you lose power.
Usually fewer genes will be identified using a subset of the data, provided
you can indeed assume one model for all 12 data sets.
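A rough illustration for a single gene, using ordinary least squares on a group-means model in plain Python: the estimated hormone effect is identical with 6 or 12 arrays, while the residual degrees of freedom drop from 8 to 4. The expression values and the function name fit are hypothetical. limma's moderated statistics additionally borrow information across genes, which this sketch does not model.

```python
# Hypothetical log-expression values for one gene, three arrays per group.
DATA = {
    "C": [7.0, 7.2, 6.8],
    "C+A": [8.1, 7.9, 8.0],
    "C+B": [7.4, 7.0, 7.2],
    "C+A+B+AB": [9.0, 9.3, 8.7],
}


def fit(groups):
    """Fit a group-means model to the chosen groups.

    Returns (group means, pooled residual variance, residual df).
    """
    means = {g: sum(DATA[g]) / len(DATA[g]) for g in groups}
    rss = sum((y - means[g]) ** 2 for g in groups for y in DATA[g])
    df = sum(len(DATA[g]) for g in groups) - len(groups)
    return means, rss / df, df


full_means, full_var, full_df = fit(list(DATA))  # all 12 arrays
sub_means, sub_var, sub_df = fit(["C", "C+A"])   # first 6 arrays only

effect_full = full_means["C+A"] - full_means["C"]
effect_sub = sub_means["C+A"] - sub_means["C"]

print(effect_full == effect_sub)  # True: same estimate of A
print(full_df, sub_df)            # 8 4
print(full_var, sub_var)          # the variance estimates generally differ
```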
Hope this helps.
Fangxin
--
Fangxin Hong, Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong at salk.edu