[BioC] Limma: correct calculation of B statistics (log odds)

Gordon Smyth smyth at wehi.EDU.AU
Thu Apr 27 06:18:03 CEST 2006


Hi Ben,

Well, all models are approximate, the real question is, how sensitive 
are they to deviations from model assumptions which are likely to 
occur in practice. The two-sample t-test holds up remarkably well 
under moderate deviations from normality, equal variances etc. On the 
other hand, the variance test you have used is known to be 
exceptionally sensitive to its assumptions. You might be amused by 
George Box's comment, on the the subject of testing equality of 
variances before doing a t-test:

    "To make the preliminary test on variances is rather like putting 
to sea in a rowing boat to find out whether conditions are 
sufficiently calm for an ocean liner to leave port."

    See http://www.garfield.library.upenn.edu/classics1982/A1982MX29400001.pdf

In the microarray context, you could well explore inequality of 
variances if you have large samples in both groups, but I would 
suggest you do this more robustly and descriptively than var.test(). 
If you have small sample sizes, my experience is that there are 
usually many other more important factors to worry about than this.

Best wishes
Gordon

At 07:57 AM 27/04/2006, Wittner, Ben, Ph.D. wrote:
>Dear Gordon,
>
>I apologize for not thanking you more quickly for your detailed and thoughtful
>response. I think I agree with everything you've said below, but now I have
>another concern on which I would like your opinion.
>
>For many of the data sets I've dealt with, for many genes, the 
>variances of the
>two classes do not seem to be equal. For example, the code below uses R's
>var.test() to produce a p-value for each gene and then plots a 
>histogram of the
>p-values. The histogram can be viewed at
>http://tinyurl.com/epdn7
>
>The model implemented in limma seems to assume a single variance for 
>each gene.
>Do you think this is a problem?
>
>Thanks again,
>-Ben
>
>library('ALL')
>data(ALL)
>pdat <- pData(ALL)
>subset <- intersect(grep('^B', as.character(pdat$BT)),
>                     which(pdat$mol %in% c('BCR/ABL', 'NEG')))
>eset <- ALL[, subset]
>i1 <- which(eset$mol == 'BCR/ABL')
>i2 <- which(eset$mol == 'NEG')
>pvals <- apply(exprs(eset), 1, function(v) (var.test(v[i1], v[i2])$p.value))
>
>jpeg(filename='ALL.jpeg', width=240, height=240)
>hist(pvals, col='green',
>      main='Histogram of var.test() pvals for ALL BCR/ABL vs NEG')
>dev.off()



More information about the Bioconductor mailing list