[R-sig-ME] valid estimates using lme4?

Doran, Harold HDoran at air.org
Fri Oct 28 20:17:02 CEST 2011


It is impossible to determine whether SAS, Stata, or SPSS implement the steps they claim to implement, since the source code is not available. It is one thing to write out the algebraic expression for solving mixed models, whether using Henderson's mixed model equations (SAS) or any other approach; it is another to verify that the code actually computes it.

Part of unit testing software does involve simulation, to check that one recovers the anticipated parameters. However, "validation" is NOT comparing output from one program to another.

Differences in how rules are implemented (such as when to stop iterating) can alter parameter estimates between programs.
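
To illustrate, here is a minimal Python sketch (not code from any of these packages). It fits the intercept of a logistic model with a deliberately slow gradient ascent, so that the effect of the stopping tolerance is easy to see. The function name fit_intercept, the step size, and the toy data (7 successes in 10 trials) are all made up for the example.

```python
import math

def fit_intercept(successes, trials, tol, step=0.1, max_iter=100000):
    """Gradient ascent on the binomial log-likelihood for a logit intercept.

    Stops when the change in the estimate falls below `tol`.
    Returns (estimate, iterations used). Hypothetical helper for illustration.
    """
    beta = 0.0
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + math.exp(-beta))
        # Score (gradient of the log-likelihood), scaled by the number of trials.
        gradient = (successes - trials * p) / trials
        new_beta = beta + step * gradient
        if abs(new_beta - beta) < tol:
            return new_beta, iteration
        beta = new_beta
    return beta, max_iter

# Same data, same algorithm, two stopping rules.
loose, n_loose = fit_intercept(7, 10, tol=1e-3)
tight, n_tight = fit_intercept(7, 10, tol=1e-10)
mle = math.log(0.7 / 0.3)  # closed-form maximum likelihood estimate

print("loose tolerance:", loose, "after", n_loose, "iterations")
print("tight tolerance:", tight, "after", n_tight, "iterations")
print("closed-form MLE:", mle)
```

With the loose tolerance the iteration stops short of the closed-form estimate; with the tight one it essentially reaches it. Real packages use far better optimizers, so I would expect the gaps to be much smaller, but the mechanism is the same.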

I would propose that validation requires the ability to review:

1) The mathematical model proposed to implement the mixed model solution
2) A review of source code to ensure that code aligns with the mathematical model
3) Unit testing with some simulation
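
Step (3) might look something like the following Python sketch. It simulates a balanced one-way random-effects model with known parameters and checks that the classical ANOVA (method of moments) estimators recover them. This is a stand-in for a real fitting routine, not a description of how lme4 is tested; the function names, sample sizes, and seed are my own assumptions.

```python
import random
import statistics

def simulate(n_groups, n_per_group, mu, sd_group, sd_resid, rng):
    """Simulate y_ij = mu + u_i + e_ij with u_i ~ N(0, sd_group^2), e_ij ~ N(0, sd_resid^2)."""
    data = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, sd_group)
        data.append([mu + u + rng.gauss(0.0, sd_resid) for _ in range(n_per_group)])
    return data

def anova_estimates(data):
    """Method-of-moments estimates (mean, group variance, residual variance), balanced data only."""
    m = len(data)
    n = len(data[0])
    group_means = [statistics.fmean(group) for group in data]
    grand_mean = statistics.fmean(group_means)
    # Between-group and within-group mean squares.
    ms_between = n * sum((gm - grand_mean) ** 2 for gm in group_means) / (m - 1)
    ms_within = sum(
        (x - gm) ** 2 for group, gm in zip(data, group_means) for x in group
    ) / (m * (n - 1))
    var_resid = ms_within
    var_group = max((ms_between - ms_within) / n, 0.0)  # truncate at zero
    return grand_mean, var_group, var_resid

rng = random.Random(1)  # fixed seed so the check is repeatable
data = simulate(2000, 10, mu=5.0, sd_group=2.0, sd_resid=1.0, rng=rng)
mu_hat, var_group_hat, var_resid_hat = anova_estimates(data)

# The estimates should land near the generating values 5, 4 and 1.
print(mu_hat, var_group_hat, var_resid_hat)
```

A real test would repeat this over many simulated data sets and look at bias and coverage, but the idea is the same: the truth is known, so the estimator can be checked against it.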

Since step (2) is impossible for SAS, Stata, and SPSS, how can they be validated? The source code and mathematical model are available for lme4 functions.

I suspect the reviewer assumes they are valid because they are sold. In that case, I'm sure Doug Bates would be happy to collect a donation to bring him up to "validation" standards, if that is what is required.




> -----Original Message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-
> bounces at r-project.org] On Behalf Of Douglas Bates
> Sent: Friday, October 28, 2011 11:43 AM
> To: Vernooij, J.C.M. (Hans)
> Cc: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] valid estimates using lme4?
> 
> On Fri, Oct 28, 2011 at 9:04 AM, Vernooij, J.C.M. (Hans)
> <J.C.M.Vernooij at uu.nl> wrote:
> > Dear list members,
> >
> > For a concept article we used package lme4 for a logistic regression. A
> reviewer doubts the validity of the outcomes:
> > "I strongly urge you to compare the outcomes of lme4 in R with a validated
> statistical package (SAS, STATA, SPSS) as lme4 is known not to be the best,
> especially when the Laplace approximation is being used as the default is only
> one (!) integration point". (quoted)
> 
> Rather strongly worded I would say.  I wouldn't suggest arguing with a
> referee but I wonder in what sense he/she believes that SAS, STATA and
> SPSS are "validated".  If the referee believes that the vendors
> guarantee correct answers he/she hasn't read the software licenses.
> 
> The issue here is the method of evaluating an approximation to the
> likelihood in a GLMM.  I have been involved in such discussions for
> over 20 years, although initially for nonlinear mixed-effects models
> rather than GLMMs.  When the reviewer is bemoaning the use of one
> integration point they are not taking into account the fact that the
> approximation is being evaluated at the conditional mode of the random
> effects.  That is, every evaluation of the Laplace approximation (and
> the adaptive Gauss-Hermite quadrature evaluation, when it is done
> properly) itself involves an optimization, using penalized iteratively
> reweighted least squares, to determine the conditional mode of the
> random effects.  Once that is determined, the conditional distribution
> of each random effect is approximated as a Gaussian distribution
> centered at the mode and with the standard deviation matching the
> second-order approximation.
> 
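
To make the point about the conditional mode concrete, here is a rough Python sketch for a single cluster of a random-intercept logistic model. It is not lme4's code: a plain Newton iteration stands in for penalized iteratively reweighted least squares, and the data, the fixed linear predictor eta, sigma, and all function names are assumptions for illustration. The one-point adaptive Gauss-Hermite rule centered at the mode reproduces the Laplace approximation, and the three-point rule can be compared against a brute-force integral.

```python
import math

def log_integrand(b, y, eta, sigma):
    """Log of (Bernoulli likelihood of cluster y given b) times the N(0, sigma^2) density of b."""
    lin = eta + b
    loglik = sum(yj * lin - math.log1p(math.exp(lin)) for yj in y)
    logprior = -0.5 * (b / sigma) ** 2 - 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return loglik + logprior

def conditional_mode(y, eta, sigma, tol=1e-12, max_iter=100):
    """Newton's method for the mode of the integrand; returns (mode, scale s = 1/sqrt(-g''))."""
    n, b = len(y), 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(eta + b)))
        grad = sum(y) - n * p - b / sigma ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + math.exp(-(eta + b)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
    return b, 1.0 / math.sqrt(-hess)

# Gauss-Hermite nodes and weights (weight function exp(-x^2)) for 1 and 3 points.
ROOT_PI = math.sqrt(math.pi)
GH_RULES = {
    1: ([0.0], [ROOT_PI]),
    3: ([-math.sqrt(1.5), 0.0, math.sqrt(1.5)],
        [ROOT_PI / 6.0, 2.0 * ROOT_PI / 3.0, ROOT_PI / 6.0]),
}

def laplace(y, eta, sigma):
    """Laplace approximation: Gaussian matched to the integrand at its mode."""
    b_hat, s = conditional_mode(y, eta, sigma)
    return math.sqrt(2.0 * math.pi) * s * math.exp(log_integrand(b_hat, y, eta, sigma))

def adaptive_gh(y, eta, sigma, k):
    """Adaptive Gauss-Hermite: nodes shifted to the mode and scaled by s."""
    b_hat, s = conditional_mode(y, eta, sigma)
    nodes, weights = GH_RULES[k]
    total = 0.0
    for x, w in zip(nodes, weights):
        b = b_hat + math.sqrt(2.0) * s * x
        total += w * math.exp(x * x + log_integrand(b, y, eta, sigma))
    return math.sqrt(2.0) * s * total

def brute_force(y, eta, sigma, lo=-12.0, hi=12.0, n=24000):
    """Trapezoid rule on a fine grid, used only as a reference value."""
    h = (hi - lo) / n
    f = lambda b: math.exp(log_integrand(b, y, eta, sigma))
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

y = [1, 1, 0, 1]
print("Laplace      :", laplace(y, 0.0, 1.0))
print("AGHQ, 1 point:", adaptive_gh(y, 0.0, 1.0, 1))
print("AGHQ, 3 point:", adaptive_gh(y, 0.0, 1.0, 3))
print("brute force  :", brute_force(y, 0.0, 1.0))
```

The "one integration point" is therefore not placed arbitrarily: it sits at the mode found by an inner optimization, which is why the single-point rule already tracks the integral reasonably well in a small example like this.
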
> SAS provides for the approximation to use additional evaluations at
> the Gauss-Hermite quadrature points.  Interestingly they cite a paper
> that Jose Pinheiro and I wrote as the reference on which they based
> this method.  This will certainly provide a better approximation but
> exactly how much better is not clear.  I don't know of papers in which
> the approximation of the integral itself was compared to see how much
> is gained.
> 
> It may seem that this issue could be put to rest by incorporating an
> adaptive Gauss-Hermite method in glmer and, as Dimitris points out,
> there has been such a method in versions of glmer but only for very
> specific models. We will add it but right now we are concentrating on
> other issues in the development.  We should point out that Laplace
> versus adaptive Gauss-Hermite is related to the approximation of the
> log-likelihood.  In some ways I think that reliable optimization of
> the approximate log-likelihood is more important and that is an area
> where R is not strong.  Far too much optimization code is covered by
> licenses that are not compatible with R's license and the pickings for
> Open Source optimization code are somewhat slim.  I wish I had access
> to some of the optimizers that SAS uses but we don't so we make use of
> what we do have.
> 
> 
> > How should we respond to this? In http://glmm.wikidot.com/faq the Laplace estimation
> is said to be less accurate than Gauss-Hermite quadrature or MCMC methods but
> is the difference in estimates such that the results are not valid? Should we
> validate the results by running different packages? Undoubtedly we will find
> differences, so which results should we report?
> > What answer might convince the reviewer?
> >
> > Thanks,
> > Hans
> >
> > _______________________________________________
> > R-sig-mixed-models at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
> 
