[R-sig-ME] factor-specific variances in lmer
Paul Johnson
Paul.Johnson at glasgow.ac.uk
Tue Dec 13 00:46:01 CET 2011
Dear Misha,
Why not just fit both models (heteroscedastic and homoscedastic) and compare them? You can do this with lme, but not lmer afaik.
E.g.
update(homoscedastic.model,weights=varIdent(form=~1|gene))
will refit the homoscedastic model with a separate variance for each level of the factor "gene", followed by model comparison via
anova(heteroscedastic.model,homoscedastic.model).
A drawback is that it's trickier and slower to fit crossed random effects in lme, which I guess you have - genes crossed with samples? Pinheiro & Bates (Mixed Effects Models in S and S-Plus, 2000, p163-167) show how to do this using pdBlocked.
Best wishes,
Paul
-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of mikhail matz
Sent: 12 December 2011 23:07
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] factor-specific variances in lmer
Dear colleagues,
A reviewer of my manuscript keeps bringing up this issue, and I ran out ideas and expertise trying to address his concern.
I am looking at application of lmer to analyze gene expression. Typical gene expression problem involves many samples across different conditions, each sample containing measurements for the same collection of genes. Mixed modeling is primarily needed because of the random sample effects (different amount of biological material in each sample, the effect therefore is the same for all genes in the sample). The issue is that both the gene-specific effects of conditions and the residuals would tend to have their own variances. The reviewer requires that the model should explicitly include such gene-specific variances as parameters. He cites a paper that uses SAS to formulate such models. Here is what he says exactly:
"Residual and/or sample*gene effects
should have gene-specific variances. This has been observed for a number of dataset (Steibel et al., 2009) and can be proved to be the case for the dataset used in this model. Ignoring heterogeneous variances can potentially lead to wrong inferences. Authors should made readers aware of this and demonstrate why in this particular dataset homogeneous variance could be safely assumed. If that can't be proved, then the methods presented here should be modified to account for this."
Now, lmer does not even include such an option, and honestly I never before came across such a concern in linear modeling. I wonder why, though. As far as I could gather from Doug's lmer algorithm paper with my rudimentary math training was that this issue is taken care of by the fitting algorithm. Here is what I wrote in rebuttal:
"It is true that a straightforward linear fitting based on least-squares would be affected by unequal variances depending on the values of factors (in our case, genes): the most highly variable gene will have the most leverage in determining the resulting parameter estimates. It is also true that, for a general case, it is unrealistic to assume that the variances of expression of all genes would be equal. Fortunately, there are efficient algorithmic solutions to this problem.
In particular, the lmer function used here relies on "penalized, iteratively reweighted, least squares" (PIRLS) algorithm, which is specifically designed to counter the artifacts induced by this type of heteroscedasticity. In other words, lmer assumes by default that the variances are factor-specific, and does not require specification and estimation of their exact parameters to deal with it."
As you might guess from the fact that I am still bugging you with all this, the reviewer was not satisfied by this reply:
"I insist that the use of a homoskedastic model under these circumstances (heterogeneous
variances) is incorrect. The author says that PIRLS algorithm is robust to heteroskedasticity and it cites a technical document from Dr. Douglas Bates (http://cran.rproject.
org/web/packages/lme4/vignettes/Theory.pdf). The problem is that the cited document does not support the claim made by the author. Maybe the confusion comed from section 2.5 of the cited document that says: "(It may look as if we have missed the dependence on sigma on the left-hand side but it turns out that the scale parameter does not affect the location of the optimal values of quantities in the linear predictor.)". The fact that "the scale parameter does not directly affect the location of the optimal values of quantities in the linear predictor" does not mean the "algorithm is robust to heteroskedasticity". Actually, if you look at equation 36-39, section 4.2 of the technical document, it is clear that the updating equations do not depend on the absolute value of the variances, but on variance ratios. If the variances change freely across genes, so will their ratios. This is a very well known fact. For example, borrowing from the microarray literature, when multiple genes are analyzed, heterogeneous variances should be explicitly modeled [2]. In summary, a fitting algorithm can not make up for a wrongly specified model in general, and this is particularly true for the linear mixed model and Fisher scoring (PIRLS) algorithms. Consequently, the answer provided and the corresponding statement in the paper is originated in a clear misinterpretation of the companion documentation to lmer. In conclusion, the statement in lines 112-115 is incorrect."
Now, this already goes way beyond my level of expertise. I would tremendously appreciate your comment and advice as to how I am to deal with this.
best regards,
Misha
Mikhail V. Matz
Assistant Professor
University of Texas at Austin
Integrative Biology Section
1 University station C0930
Austin, TX 78712
phone 512-992-8086 cell, 512-475-6424 lab fax 512-471-3878 web http://www.bio.utexas.edu/research/matz_lab
_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Glasgow, charity number SC004401
More information about the R-sig-mixed-models
mailing list