[BioC] Analyzing mulitple tissues
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Jun 7 13:37:22 CEST 2005
> Date: Tue, 7 Jun 2005 08:50:46 +0100
> From: David Kipling <KiplingD at cardiff.ac.uk>
> Subject: Re: [BioC] Analyzing mulitple tissues
> To: Naomi Altman <naomi at stat.psu.edu>
> Cc: bioconductor at stat.math.ethz.ch
> Hi Naomi,
> I am glad you raised this issue of lack of replication. I fully take
> on board your point. Something is niggling me, however, so hopefully
> someone with a better stats background than me can help.
> What puzzles me is that limma will allow you to run experiments with no
limma does not compute t-statistics for p-values when there is no replication. In the absence of
replication, limma simply computes fold changes. What is puzzling about that?
> *and* will return t-statistics for something like a 3x1
> comparison. I don't generally run experiments without replication (he
> says quickly, defending himself!) but I've just tried a mock experiment
> that is made up of 3 states with 4, 3, and 1 degrees of replication
> (see snippet below). Ordering by absolute(M) gives a ranking that is
> related to, but clearly distinct, from ranking by absolute(t statistic)
> or p-value.
> Out of curiosity, what is limma doing here and how should one interpret
> these t stats/p-values (if indeed one should!)? Are they any use over
> simple M values?
Yes, they are almost always better than simply using fold changes. Using M-values alone would
make no use of replication while the t-statistics make use of whatever replication is available.
Put very simply, some replication is better than none.
You seem to be concerned in your mock experiment that one of the states has no replication. The
limma analysis estimates the variance for each gene from the replicates available for states 1 and
2 and applies that estimate to state 3 as well. This analysis is perfectly valid provided that
the variability of the expression values is similar in state 3 to that in states 1 and 2.
Even when the variability is different in state 3, the limma analysis still gives a better ranking
than fold change, even for comparisons involving state 3, in most cases. The basic assumption is
that, across genes, the variance in state 3 is positively associated with the variance in states 1
and 2. This is a very weak assumption which is almost always true in practice, as genewise
differences in variability tend to dominate state-wise differences.
> # Based on an example in the limmaUsersGuide
> data1 <- ReadAffy()
> eset <- rma(data1)
> # Note the single chip in group 2
> design <- model.matrix(~ -1+factor(c(1,1,1,1,3,3,3,2)))
> colnames(design) <- c("group1", "group2", "group3")
> fit <- lmFit(eset, design)
> # Contrast 2 is a 1 v 3-chip comparison
> contrast.matrix <- makeContrasts(group2-group1, group3-group2,
> group3-group1, levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, coef=2, adjust="none")
> Prof David Kipling
> Department of Pathology
> School of Medicine
> Cardiff University
> Heath Park
> Cardiff CF14 4XN
> Tel: 029 2074 4847
> Email: KiplingD at cardiff.ac.uk
> On 6 Jun 2005, at 15:22, Naomi Altman wrote:
More information about the Bioconductor