[BioC] R/maanova

Gary Churchill gary.churchill at jax.org
Wed Dec 14 20:03:47 CET 2005

Regarding the thread on limma versus maanova.  Clearly I can only offer a biased point of view, but I would strongly encourage people to try the methods that are implemented in both software packages. A major motivator for us to contribute R/maanova to the Bioconductor project was to facilitate this kind of comparison. 

We made an effort to create a version of R/maanova that integrates easily with other Bioconductor packages.  Feedback on how well this does or does not work is most welcome.  It can always be improved.

I'd like to clarify the current state of R/maanova.  We've come a long way since Kerr, Martin and Churchill (2000, JCompBiol).

The core of R/maanova is a mixed model ANOVA program.  We benchmarked and validated it against SAS/Proc Mixed and it compares well for speed and accuracy.  The syntax is deceptively simple.  R/maanova is capable of fitting essentially any mixed model to any experimental design.  R/maanova is specialized for microarray data.  This restriction provides some efficiency in the implementation. 

R/maanova includes an EB test statistic.  Indeed we recommend and exclusively use the "Fs" test statistic in applications.  This is an empirical Bayes test based on shrinkage estimates of variance components. We formulated it as F-like statistic rather than an odds ratio, but it is still an "information borrowing" statistic. There is now ample evidence in the literature that EB statistics are the way to go. In our simulation studies both B (limma) and Fs gave comparable performance.  Both of these statistics outperform, the simple gene-by-gene F test (F1 in R/maanova) and the pooled variance method (F3 in R/maavova) and most other methods to which we made comparisons (e.g., SAM).  Some of these results can be found in Cui et al 2005 Biostatistics. The bottom line here is that B and Fs are not fundamentally different in settings where both apply. However Fs is more general in allowing for multiple variance components and in the ability to test a wider range of hypotheses. 

I am puzzled by the reference to gene-by-gene analysis as a feature of R/maanova.   In R/maanova you can restrict information to only the gene tested (F1), you can pool variance estimates across all genes (F3) or you can use an EB approach (Fs). The latter is recommended.

As noted in the thread, the current implementation of limma does not allow for multiple sources of variance.  In some cases, this is not an issue.  Treating array effects as fixed or as random does not have much impact (except in the speed).  However if your design is nested with multiple assays of the same biological sample, failure to treat the biological replicates as a random effect can be seriously misleading.  The tests will be liberal, often very liberal.  One work-around is to average all of the data from a single bio rep before doing the analysis.  R/maanova will allow you to fit the full mixed as well as the (inappropriate) fixed effect model - so you can do the wrong analysis and compare the two sets of results.

R/maanova offers many options for permutation testing.  Hao Wu worked out a scheme to ensure that permutations are done properly in the presence of stratification variables (covariates).  However we are concerned by the recent paper of Xie et al (Bioinformatics 2005).  R/maanova will pool p-values across genes by default (not pooling is an option).  Although pooling was recommended by Storey et al (in the Parmagianni volume) I am now feeling uneasy about it. To pool or not pool p-values?  It is an open question.  We are noticing that the FDR corrections can be very fragile.  This is also worth a closer look, but it is a methodology question, not a software issue. 

Missing/flagged data can now be handled in R/maanova using a single imputation method. Not ideal but better than throwing out data.  

It is a good idea to check your data for dye-intensity effects and for spatial effects. R/maanova includes multiple data transformation options.  We use the intensity loess routinely and spatial loess only in dire situations. I would be interested to hear from users who have used other bioconductor tools for pre-processing in conjunction with R/maanova.  It was our intent to make this possible and easy. 

R/maanova works with Affy data.  This seems to be a widespread misunderstanding. We routinely do the pre-processing of Affy data using other bioconductor functions.  The output from rma, for example, can be piped right into R/maanova.  R/maanova will also analyze three (four, five, ...) color arrays if you happen to be using that technology. (We can provide example scripts for Affy analysis on request.)

Regarding the analysis of intensity versus ratios:  Reduction of data to ratios destroys information.  If you have readings of 100, 200 for gene1 and 10000, 20000 for gene2, the ratio is 2 in both cases. You cannot recover the lost information. Perhaps it is not a big deal in some settings because there is little interest in the 'interblock information'.  Treating array as fixed will have much the same effect as taking ratios when ratios make sense.  But there are many designs for which there is no obvious reduction of the data to ratios.  For example, a three way comparison without a reference sample.  (You could use R/maanova to analyze ratio data, it may require some data manipulation in native R to get it set up.  I am not recommending it.) 

Both R/maanova and R/limma are powerful analysis tools.  Flexibility is a strong point for both but perhaps more so for R/maanova.  As with any good tool, these can be misapplied.  The discussion that prompted my comment indicates that people are thinking about these issues.  I am not as well informed about R/limma as I should be, so I apologize if I have made any misrepresentations. I hope that this contribution is helpful and that people will continue to use both tools!

Gary A. Churchill
Senior Staff Scientist
The Jackson Laboratory
600 Main StreetBar Harbor, ME 04609 
ph: (207) 288-6189
fax: (207) 288 6077
email: gary.churchill at jax.org
Now he needed a great variety of models whoseelements could be combined in order to arriveat the one that would best fit reality.
-Italo Calvino.

More information about the Bioconductor mailing list