[BioC] Limma: correct calculation of B statistics (log odds)
Gordon Smyth
smyth at wehi.EDU.AU
Thu Apr 27 06:18:03 CEST 2006
Hi Ben,
Well, all models are approximate, the real question is, how sensitive
are they to deviations from model assumptions which are likely to
occur in practice. The two-sample t-test holds up remarkably well
under moderate deviations from normality, equal variances etc. On the
other hand, the variance test you have used is known to be
exceptionally sensitive to its assumptions. You might be amused by
George Box's comment, on the the subject of testing equality of
variances before doing a t-test:
"To make the preliminary test on variances is rather like putting
to sea in a rowing boat to find out whether conditions are
sufficiently calm for an ocean liner to leave port."
See http://www.garfield.library.upenn.edu/classics1982/A1982MX29400001.pdf
In the microarray context, you could well explore inequality of
variances if you have large samples in both groups, but I would
suggest you do this more robustly and descriptively than var.test().
If you have small sample sizes, my experience is that there are
usually many other more important factors to worry about than this.
Best wishes
Gordon
At 07:57 AM 27/04/2006, Wittner, Ben, Ph.D. wrote:
>Dear Gordon,
>
>I apologize for not thanking you more quickly for your detailed and thoughtful
>response. I think I agree with everything you've said below, but now I have
>another concern on which I would like your opinion.
>
>For many of the data sets I've dealt with, for many genes, the
>variances of the
>two classes do not seem to be equal. For example, the code below uses R's
>var.test() to produce a p-value for each gene and then plots a
>histogram of the
>p-values. The histogram can be viewed at
>http://tinyurl.com/epdn7
>
>The model implemented in limma seems to assume a single variance for
>each gene.
>Do you think this is a problem?
>
>Thanks again,
>-Ben
>
>library('ALL')
>data(ALL)
>pdat <- pData(ALL)
>subset <- intersect(grep('^B', as.character(pdat$BT)),
> which(pdat$mol %in% c('BCR/ABL', 'NEG')))
>eset <- ALL[, subset]
>i1 <- which(eset$mol == 'BCR/ABL')
>i2 <- which(eset$mol == 'NEG')
>pvals <- apply(exprs(eset), 1, function(v) (var.test(v[i1], v[i2])$p.value))
>
>jpeg(filename='ALL.jpeg', width=240, height=240)
>hist(pvals, col='green',
> main='Histogram of var.test() pvals for ALL BCR/ABL vs NEG')
>dev.off()
More information about the Bioconductor
mailing list