[BioC] Differential Gene Expression using limma
James W. MacDonald
jmacdon at med.umich.edu
Fri Jan 27 20:01:57 CET 2006
Hi Sonia,
Sonia SHAH wrote:
> Hi,
>
> It would be greatly appreciated if I could get some advice on how to go
> about looking at differential expression on my data.
>
> I have Affy data from 3 different cell types: type1, type2, type3
> and 3 biological reps for each type
>
>
> I want to get 2 gene lists using limma:
> 1. genes that are expressed in type1 and type3 but not in type2
> 2. genes that are expressed in type2 and type3 but not in type1
Just a technical point here; you cannot find genes that are 'expressed'
in one sample and not in another. The best you can do is find genes that
are expressed at a different level between samples.
>
> There seem to be lots of different ways of doing this. I tried 2 design
> matrices:
>
> DESIGN1
>              type1  type2  type3
> type1rep1        1      0      0
> type1rep2        1      0      0
> type1rep3        1      0      0
> type2rep1        0      1      0
> type2rep2        0      1      0
> type2rep3        0      1      0
> type3rep1        0      0      1
> type3rep2        0      0      1
> type3rep3        0      0      1
>
> contrasts: (type1+type3)-type2
> (type2+type3)-type1
These are not contrasts. To be a contrast, the coefficients have to sum
to zero (yours sum to 1, e.g. 1 + 1 - 1), so you would need
(type1 + type3)/2 - type2
(type2 + type3)/2 - type1
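One way to see why the coefficients must sum to zero: write each combination as weights on the three group means, in DESIGN1's column order, and evaluate it for a hypothetical gene whose mean is the same in every cell type. A true contrast gives 0 for such a gene, while the originally posted combinations give back the common mean itself, so their null hypothesis is not "no difference between groups". This is a plain-Python sketch for illustration only (the gene and its mean of 8 are made up); in limma the corrected expressions would be passed to makeContrasts().

```python
from fractions import Fraction as F

# Weights on the group means, in DESIGN1's column order: (type1, type2, type3).
combos = {
    "(type1+type3)-type2":   (F(1), F(-1), F(1)),        # as first posted
    "(type2+type3)-type1":   (F(-1), F(1), F(1)),        # as first posted
    "(type1+type3)/2-type2": (F(1, 2), F(-1), F(1, 2)),  # corrected
    "(type2+type3)/2-type1": (F(-1), F(1, 2), F(1, 2)),  # corrected
}

def is_contrast(coefs):
    """A linear combination of group means is a contrast when its weights sum to 0."""
    return sum(coefs) == 0

def evaluate(coefs, means):
    """Value of the linear combination for the given group means."""
    return sum(c * m for c, m in zip(coefs, means))

# Hypothetical gene with identical mean expression (8) in all three cell types.
flat_gene = (F(8), F(8), F(8))

for name, coefs in combos.items():
    print(f"{name:22s} contrast={is_contrast(coefs)!s:5s} "
          f"value at equal means={evaluate(coefs, flat_gene)}")
```

Only the two corrected combinations are zero for the flat gene, which is what a test of "no differential expression" needs.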
>
>
>
> DESIGN2
> I would use 2 design matrices to get each gene list
>
> The first matrix below will give genes that are in type1+3 but not in
> type2:
>
>              A  B
> type1rep1    1  0
> type1rep2    1  0
> type1rep3    1  0
> type2rep1    0  1
> type2rep2    0  1
> type2rep3    0  1
> type3rep1    1  0
> type3rep2    1  0
> type3rep3    1  0
>
> contrast A-B
>
>
> The second matrix below will give genes that are in type2+3 but not in
> type1:
>
>              A  B
> type1rep1    0  1
> type1rep2    0  1
> type1rep3    0  1
> type2rep1    1  0
> type2rep2    1  0
> type2rep3    1  0
> type3rep1    1  0
> type3rep2    1  0
> type3rep3    1  0
>
> contrast A-B
>
>
> I would have thought that the two different approaches would give me the
> same number of differentially expressed genes. But it doesn't. It gives
> me very different numbers.
>
> Are the two approaches the same or am I doing something completely
> wrong?
Well, if you used the contrasts as I outline above, they will be very
similar but still not the same. The difference is a technical point
about how the variance in the denominator of each test statistic is
estimated. Note: To make this explanation easier to understand, I am
omitting the empirical Bayes moderation step.
In the first case, the contrast you are using is very similar to a
t-statistic, in which you are computing the difference in mean
expression in the numerator, and an estimate of how accurately you are
computing those means in the denominator. Since you have three groups,
the denominator tells you how well you are estimating the mean of those
three groups (based on the variance within each group - this is the
important point).
In the second case, the contrast is identical to a two-sample
t-statistic, because you are comparing two groups (Type2 and Type3
pooled into one group, versus Type1) and the denominator estimates how
well you are estimating the means of those two groups.
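Leaving out the empirical Bayes moderation, as above, the two statistics can be written with the usual least-squares formulas. The notation is introduced only for this illustration: $y_{ij}$ is the log expression of replicate $i$ in group $j$, $\bar y_j$ the group mean, $n_j$ the number of arrays in group $j$, $N = 9$ the total number of arrays, and $c_j$ the contrast weights.

```latex
% Case 1: three-group model (DESIGN1); c = (-1, 1/2, 1/2) over (type1, type2, type3)
t_1 = \frac{\sum_{j=1}^{3} c_j \bar{y}_j}{s_3 \sqrt{\sum_{j=1}^{3} c_j^2 / n_j}},
\qquad
s_3^2 = \frac{1}{N-3} \sum_{j=1}^{3} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2

% Case 2: two-group model (DESIGN2); A = type2 and type3 pooled, B = type1
t_2 = \frac{\bar{y}_A - \bar{y}_B}{s_2 \sqrt{1/n_A + 1/n_B}},
\qquad
s_2^2 = \frac{1}{N-2} \Big[ \sum_{i \in A} \left(y_i - \bar{y}_A\right)^2
      + \sum_{i \in B} \left(y_i - \bar{y}_B\right)^2 \Big]
```

With three arrays per type, $\bar y_A = (\bar y_2 + \bar y_3)/2$ and $\sum_j c_j^2/n_j = 1/n_A + 1/n_B = 1/2$, so the numerators and the square-root factors coincide and only $s_3$ and $s_2$ differ. Splitting the sum over $A$ gives the within-type2 and within-type3 sums of squares plus $n_2(\bar y_2 - \bar y_A)^2 + n_3(\bar y_3 - \bar y_A)^2$, so $s_2$ grows with the gap between the Type2 and Type3 means, while $s_3$ does not see that gap at all.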
To illustrate this difference, here is an example.
Let's say that the expression values for a particular gene look like this:
Type1 = 5.6, 5.8, 5.4
Type2 = 8.5, 8.6, 8.3
Type3 = 14.1, 14.2, 14.5
Now in the first case, if you compute the contrast
(type2 + type3)/2 - type1
you will get a difference of ~5.8 and a very significant p-value because
the variability *within* each sample type is very small.
On the other hand, if you did the comparisons as in your second case,
this would probably not be significant because the variability within
the pooled Type2 and Type3 samples would now be quite high. This will
result in a much larger denominator for your t-statistic (but with the
same numerator), so the resulting p-value will be much larger.
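The arithmetic for the example gene can be checked with a short plain-Python sketch (not limma, and still without the moderation step). It computes the ordinary t-statistic for (type2 + type3)/2 - type1 both ways, using only the values given above.

```python
import math
from statistics import mean

# Example log-expression values for one gene, as given above.
type1 = [5.6, 5.8, 5.4]
type2 = [8.5, 8.6, 8.3]
type3 = [14.1, 14.2, 14.5]

def ss(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def t_three_group(g1, g2, g3):
    """Case 1: contrast (type2 + type3)/2 - type1 in a three-group model.
    Residual variance is pooled *within* each of the three groups."""
    groups = [g1, g2, g3]
    coefs = [-1.0, 0.5, 0.5]
    n_total = sum(len(g) for g in groups)
    s2 = sum(ss(g) for g in groups) / (n_total - 3)        # residual df = N - 3
    estimate = sum(c * mean(g) for c, g in zip(coefs, groups))
    se = math.sqrt(s2 * sum(c * c / len(g) for c, g in zip(coefs, groups)))
    return estimate, estimate / se

def t_two_group(g1, g2, g3):
    """Case 2: type2 and type3 pooled into group A, compared with B = type1.
    A's spread now includes the gap between the type2 and type3 means."""
    a, b = g2 + g3, g1
    s2 = (ss(a) + ss(b)) / (len(a) + len(b) - 2)            # residual df = N - 2
    estimate = mean(a) - mean(b)
    se = math.sqrt(s2 * (1 / len(a) + 1 / len(b)))
    return estimate, estimate / se

for label, fn in [("three-group", t_three_group), ("two-group", t_two_group)]:
    est, t = fn(type1, type2, type3)
    print(f"{label:12s} difference = {est:.2f}  t = {t:.1f}")
```

Both versions give the same difference of about 5.77, but the t-statistic falls from roughly 43 to roughly 3. Taken alone, a t of 3 on 7 residual degrees of freedom is still nominally below 0.05, so whether this gene ends up on the list also depends on the moderation step, the cutoff and the multiple-testing adjustment across all the probesets.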
So how you do things depends on what exactly you are looking to show. If
you want to find those genes where e.g., Type1 is different from the
mean expression of Type2 and Type3 then you want to use your first
method. If you want to find those genes where the expression values for
Type1 are different from Type2 and Type3 _and_ there is very little
difference between Type2 and Type3, then you should use your second method.
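The trade-off in the last paragraph can be illustrated with hypothetical genes (made-up values, plain Python, no moderation). The sketch holds Type1 and the average of Type2 and Type3 fixed, gives every group the same small within-group scatter, and widens the gap between Type2 and Type3. The first method's statistic does not change, while the second method's statistic collapses as the gap grows.

```python
import math
from statistics import mean

def ss(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def both_t(g1, g2, g3):
    """Unmoderated t for type1 versus the type2/type3 average.

    Returns (t_case1, t_case2): case 1 pools variance within three groups,
    case 2 first pools type2 and type3 into a single group."""
    n1, n2, n3 = len(g1), len(g2), len(g3)
    est = (mean(g2) + mean(g3)) / 2 - mean(g1)
    s2_three = (ss(g1) + ss(g2) + ss(g3)) / (n1 + n2 + n3 - 3)
    se_three = math.sqrt(s2_three * (1 / n1 + 0.25 / n2 + 0.25 / n3))
    a = g2 + g3
    est_two = mean(a) - mean(g1)          # equals est when n2 == n3
    s2_two = (ss(a) + ss(g1)) / (len(a) + n1 - 2)
    se_two = math.sqrt(s2_two * (1 / len(a) + 1 / n1))
    return est / se_three, est_two / se_two

# Hypothetical genes: type1 and the type2/type3 average (11.4) stay fixed.
type1 = [5.6, 5.8, 5.4]
noise = [0.0, 0.1, -0.1]                  # same scatter in type2 and type3
centre = 11.4
for gap in [0.0, 1.0, 2.0, 4.0, 6.0]:
    type2 = [centre - gap / 2 + e for e in noise]
    type3 = [centre + gap / 2 + e for e in noise]
    t1, t2 = both_t(type1, type2, type3)
    print(f"gap = {gap:3.1f}   case 1 t = {t1:6.1f}   case 2 t = {t2:6.1f}")
```

When Type2 and Type3 agree, the second method does at least as well, since pooling them costs nothing and gains a residual degree of freedom; once they differ, it only finds genes where that difference is small.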
HTH,
Jim
--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623