[BioC] How to pool subgroups for makeContrasts() and subsequent limma analysis?
James W. MacDonald
jmacdon at uw.edu
Wed Feb 6 18:14:56 CET 2013
Hi René,
On 2/6/2013 11:29 AM, René wrote:
>>> Hi René,
>>>
>>>
>>> You are almost there. Note that you want the mean of the three groups,
>>> not the sum. So
>>>
>>> makeContrasts((B1 + B2 + B3)/3 - A)
>>>
>>> will e.g., do the comparison of B vs A.
>>>
>>> Best,
>>>
>>> Jim
> Dear James,
>
> I performed the pooled analysis as you suggested and compared the results to a
> pure B - A comparison (no subgroups specified). Interestingly, both analyses
> give different results (497 vs 15 genes with log2FC >= 1 and p < 0.05).
> Could you explain this huge difference?
I am assuming that by a 'pure' B-A comparison you mean that you
redefined your design matrix so it has only three columns (A, B, C) and
then did the B-A comparison. If so, the difference is simple to explain.
I would also guess that the C-A comparison gives different results as
well, depending on how you define your design matrix.
Note that the t-statistic for a contrast has the difference between the
means of the two groups in the numerator and a measure of intra-group
variability in the denominator. So in heuristic terms, the numerator
says how different the groups are, and the denominator tells you whether
that difference is 'large' or not, by comparing it to the within-group
variability. If the groups are really 'tight', then a small difference
in means might result in a significant test, but if the groups are
really variable, then the mean difference has to be pretty big as well
to achieve significance.
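For reference, the heuristic above corresponds to the ordinary
(unmoderated) t-statistic for a contrast in a linear model, sketched
below. The symbols are not from the original message and are introduced
here only for illustration: c is the contrast vector, \hat{\beta} the
fitted group means, X the design matrix, n the number of samples, p the
number of design columns, and RSS the residual sum of squares. limma's
moderated t-statistic replaces s with a shrunken estimate, but the
numerator/denominator logic is the same.

```latex
t = \frac{c^{\top}\hat{\beta}}{s\,\sqrt{c^{\top}(X^{\top}X)^{-1}c}},
\qquad
s^{2} = \frac{\mathrm{RSS}}{n - p}
```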
How you define your groups has no bearing on the numerator, because the
difference is the same whether you do B-A or (B1+B2+B3)/3-A (at least
when B1, B2, and B3 contain the same number of samples). However, the
denominator may well be quite different, depending on how variable the
B1, B2, and B3 groups are and how much they differ from each other.
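The claim about the numerator can be checked with a few made-up numbers.
The following is a minimal sketch in plain Python (not limma); the
expression values and the helper names subgroup_contrast and
pooled_contrast are hypothetical. It also shows that the two numerators
only agree when the B subgroups are the same size.

```python
from statistics import mean

# Hypothetical log2 expression values for a single gene (made-up numbers).
A = [5.0, 5.2, 4.8]
B1 = [6.0, 6.1, 5.9]
B2 = [7.0, 7.2, 6.8]
B3 = [8.0, 7.9, 8.1]


def subgroup_contrast(a, subgroups):
    """(B1 + B2 + B3)/3 - A: average of the subgroup means minus the mean of A."""
    return mean([mean(g) for g in subgroups]) - mean(a)


def pooled_contrast(a, subgroups):
    """B - A after merging all B samples into one group."""
    merged = [x for g in subgroups for x in g]
    return mean(merged) - mean(a)


# Balanced case: every B subgroup has three samples.
balanced = [B1, B2, B3]
print(subgroup_contrast(A, balanced), pooled_contrast(A, balanced))

# Unbalanced case: B3 keeps only one sample, so the merged mean is pulled
# toward B1 and B2 and the two numerators no longer agree.
unbalanced = [B1, B2, B3[:1]]
print(subgroup_contrast(A, unbalanced), pooled_contrast(A, unbalanced))
```

With equal subgroup sizes the two printed values coincide; in the
unbalanced case they differ, which is one more reason the two analyses
need not match exactly.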
In the instance where you did (B1+B2+B3)/3-A, the intra-group
variability for the denominator is based on the variability within the
A, B1, B2, B3, and C groups. So if all the B-type groups are pretty
tight, then you will likely get more differentially expressed genes.
If you do the 'pure' B-A comparison, then the denominator is based on
the intra-group variability of the A, B, and C groups. If the B1, B2,
and B3 groups are each pretty tight, but not really similar to each
other, then the combined B group will be highly variable, so your
denominator will tend to be larger, resulting in fewer differentially
expressed genes. Since the same variance estimate is used for all
contrasts, I would imagine the C-A comparison has fewer genes as well.
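The effect on the denominator can be illustrated the same way. This is a
rough sketch in plain Python using an ordinary pooled-variance
t-statistic, not limma's moderated statistic; the data values and the
helpers pooled_sd and t_stat are hypothetical. The B subgroups are tight
internally but have different means, so merging them inflates the
residual standard deviation.

```python
from math import sqrt
from statistics import mean

# Hypothetical log2 expression values for a single gene (made-up numbers).
A = [5.0, 5.2, 4.8]
B1 = [6.0, 6.1, 5.9]
B2 = [7.0, 7.2, 6.8]
B3 = [8.0, 7.9, 8.1]
C = [5.5, 5.4, 5.6]


def pooled_sd(groups):
    """Residual SD of a one-way layout: sqrt(RSS / (n - number of groups))."""
    rss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    n = sum(len(g) for g in groups)
    return sqrt(rss / (n - len(groups)))


def t_stat(diff, sd, weights_and_sizes):
    """Ordinary t-statistic for a contrast of group means.

    weights_and_sizes holds (contrast weight, group size) pairs for the
    groups that enter the contrast.
    """
    return diff / (sd * sqrt(sum(w ** 2 / n for w, n in weights_and_sizes)))


# Five-group design: residuals are taken around each subgroup mean.
sd_five = pooled_sd([A, B1, B2, B3, C])
diff_five = mean([mean(B1), mean(B2), mean(B3)]) - mean(A)
t_five = t_stat(diff_five, sd_five, [(1 / 3, 3), (1 / 3, 3), (1 / 3, 3), (-1, 3)])

# Three-group design: B1, B2, B3 are merged, so the between-subgroup
# differences are counted as within-group noise.
B = B1 + B2 + B3
sd_three = pooled_sd([A, B, C])
diff_three = mean(B) - mean(A)
t_three = t_stat(diff_three, sd_three, [(1, len(B)), (-1, len(A))])

print("five groups:  sd =", sd_five, " t =", t_five)
print("three groups: sd =", sd_three, " t =", t_three)
```

In this toy example the numerator is identical for both designs, while
the merged design has the larger residual SD and therefore the smaller
t-statistic, which matches the explanation above.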
Does that help?
Best,
Jim
>
> Best regards,
> René
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099