[R-sig-ME] Julia vs nlme vs lme4 implementation of fitting linear mixed models
W Robert Long
longrob604 at gmail.com
Fri Oct 17 14:34:10 CEST 2014
Dear Professor Bates
This is a great help - thank you. I may have some follow-on questions,
in which case I shall reply to the list, but your response below gives
me plenty to occupy myself with for now.
Regards
Robert Long
On 16/10/2014 17:57, Douglas Bates wrote:
> Thanks for the question. It is a good exercise for me to answer it so
> that I can get things straight in my mind. I am doing this off the top
> of my head, relying on my all-too-faulty memory. Corrections and
> clarifications are welcome. Ben, should we archive such information
> somewhere?
>
> The basics:
> nlme:
> - developed in the mid-1990s by Pinheiro and Bates
> - documented in P & B (2000, Springer) and several papers
> - implemented in R and C using the .Call interface and R's C API.
> - fits linear mixed-effects models and nonlinear mixed-effects models
> - allows for additional variance and/or correlation structures in the
> conditional distribution of the response, given the random effects.
> - multiple-formula model specification with pdMat, varFunc, corStruct,
> ... classes (a short R sketch follows this list).
> - allows for multiple nested grouping factors. There is a kludgy method
> for fitting models with crossed grouping factors, but I wouldn't push
> it too hard.
> - uses S3 methods and class tags (IMO "S3 classes" don't exist)
> - optimization of parameter estimates via EM and Newton-Raphson
> algorithms, profiled w.r.t. residual variance only
> - internal representation uses the relative precision factor and the
> log-Cholesky parameterization. Singular covariance matrices correspond
> to infinite parameter values.
> - no longer actively developed. Maintained by R-Core primarily to
> adjust to changing requirements on R packages
> - no explicit methods for fitting GLMMs.
> - lme with iterative reweighting is used in the glmmPQL function in the
> MASS package.
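>
> As a concrete (if minimal) R sketch of that specification style, using
> the Orthodont and Oats datasets that ship with nlme; the varPower and
> corAR1 choices below are purely illustrative, not recommendations:
>
> library(nlme)
> ## single grouping factor: random intercept and slope for each Subject
> fm1 <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)
> ## nested grouping factors are written with `/`
> fm2 <- lme(yield ~ nitro, random = ~ 1 | Block/Variety, data = Oats)
> ## separate arguments attach variance and correlation structures to the
> ## conditional distribution of the response
> fm3 <- update(fm1, weights = varPower(),
>               correlation = corAR1(form = ~ 1 | Subject))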
>
> lme4:
> - development began in the late 1990s by Bates, Maechler, DebRoy,
> Sarkar and others. Current development is primarily by Bolker and
> Walker.
> - documented in papers and vignettes. Bates started writing a book but
> didn't complete the project - some draft chapters are still online.
> - implemented in R, C and C++ using facilities provided by Rcpp and Eigen
> - fits linear mixed-effects models and GLMMs. Nominally has NLMM
> capabilities, but the implementation is not solid
> - single-formula model specification (a short R sketch follows this
> list). Grouping factors can be nested, partially crossed, or fully
> crossed.
> - uses S4 classes and methods, S3 methods, reference classes, and the
> Matrix package.
> - internal representation uses the relative covariance factor and
> sparse Cholesky factorization. For LMMs, optimization of the profiled
> log-likelihood uses general nonlinear optimizers that allow for box
> constraints (in practice only non-negativity of some of the parameters
> is required). GLMMs use PIRLS (penalized iteratively reweighted least
> squares) to determine the conditional modes of the random effects, with
> a Laplace or adaptive Gauss-Hermite approximation to the
> log-likelihood. Settling on a single robust and reliable optimizer has
> been difficult.
> - all models use a sparse matrix representation to solve the penalized
> least squares problem
> - actively maintained and developed
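>
> A minimal R sketch of the single-formula style, using the Penicillin
> and cbpp datasets that ship with lme4 (the particular models are
> illustrative):
>
> library(lme4)
> ## fully crossed grouping factors, plate and sample, in one formula
> fm1 <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample), data = Penicillin)
> ## binomial GLMM; the default nAGQ = 1 is the Laplace approximation,
> ## and nAGQ > 1 requests adaptive Gauss-Hermite quadrature
> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>              data = cbpp, family = binomial)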
>
> MixedModels:
> - development started in 2012 by Bates
> - little or no documentation outside of examples
> - implemented exclusively in Julia (about 1600 lines of code)
> - fits LMMs. Development of GLMM capabilities is planned.
> - single-formula specification similar to lme4 (a Julia sketch follows
> this list). Because Julia is still a young language, not all lme4
> formulas will work with MixedModels. (Building the formula-language
> interpretation code from scratch is not easy.) Grouping factors can be
> nested or crossed (partially or completely).
> - uses Julia methods and user-defined types. Julia methods provide for
> multiple dispatch like S4.
> - internal representation (all at the Julia level) provides for several
> different PLS (penalized least squares) solvers according to the
> configuration of the grouping factors.
> - Models with a nested sequence of grouping factors, including a single
> grouping factor as a trivial special case, use dense matrix methods and
> provide an analytic gradient of the profiled log-likelihood. The same
> applies to models with two crossed or nearly crossed grouping factors
> (think "subject" and "item").
> - Optimizers come from Steven Johnson's NLopt Julia package, which
> interfaces to his C library (also used in the R packages nloptr and
> nloptwrap). At present LN_BOBYQA is used for derivative-free
> optimization and LD_MMA for models that provide a gradient, but
> switching optimizers within the NLopt library is trivial.
> - Considerable effort has been made to reuse storage and cut down on
> allocation/garbage collection.
> - Actively developed. Maintenance hasn't been too much of an issue
> because I don't think there are many users at present.
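>
> For comparison with the lme4 sketch above, a minimal Julia sketch of
> the single-formula style. The MixedModels API has changed over time,
> so treat the exact calls as illustrative rather than definitive:
>
> using MixedModels
> # sleepstudy ships with the package; (1 + days | subj) requests
> # correlated random intercepts and slopes for each subject
> df = MixedModels.dataset(:sleepstudy)
> m1 = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days | subj)), df)
> # crossed grouping factors use the same syntax
> m2 = fit(MixedModel, @formula(diameter ~ 1 + (1 | plate) + (1 | sample)),
>          MixedModels.dataset(:penicillin))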
>
> Examples of MixedModels fits are at
>
> http://nbviewer.ipython.org/github/dmbates/slides/blob/master/2014-10-01-Ithaca/lmm%20Examples.ipynb
>
> Because I was involved in the development of all three packages, I will
> take the liberty of commenting on the philosophy.
>
> The purpose of developing nlme was to fit nonlinear mixed-effects
> models. Linear mixed-effects models were incorporated mainly because
> fitting one is an iterative step in the algorithm for fitting an NLMM.
> The numerical methods used are not nearly as sophisticated as those in
> lme4 and MixedModels.
>
> lme4 was developed to provide a use-case for S4 classes and methods. The
> Matrix package was initially part of lme4 and later split off into a
> package. The numerical methods implemented in lme4 are, in my opinion,
> superior to those in nlme, mainly through the use of the relative
> covariance factor and the profiled log-likelihood. These may seem like
> details but to me they are very important. The motivation for
> incorporating sparse matrix classes in the Matrix package and accessing
> the CHOLMOD code was to provide a general method for fitting such
> models. Using C++, Rcpp and RcppEigen was motivated by trying to
> provide generality and speed. The end result is confusing (my fault
> entirely) and fragile.
>
> MixedModels was developed because I became interested in the Julia
> language at the same time that I was disillusioned with at least some
> aspects of R. As a new language Julia doesn't have the infrastructure
> of R (dataframes, formula language, graphics, ...) but does have a clean
> implementation of the parts that are available. The most important
> aspect of Julia is "one language". You develop in the same language in
> which you optimize. The type system in Julia allows me to incorporate
> the different kinds of penalized least squares solvers in what to me is
> a clean way, thereby taking advantage of structural simplifications in
> simple, but common, cases. It is possible to do this in
> R/C++/Rcpp/Eigen, but it would be a massive headache and perhaps beyond
> my abilities to do it well.
>
> Optimizing Julia code is often done at the expense of transparency. It
> is obvious what
>
> C = C - A*B
>
> means when C, A and B are matrices (* means matrix multiplication in
> Julia). It is less obvious what
>
> BLAS.gemm!('N','N',-1.,A,B,1.,C)  # C := -1*A*B + 1*C, updating C in place
>
> means, but the second form avoids taking two unnecessary copies of the
> matrix C and running a couple of extra loops. This isn't important if
> you only do it once. If you do it several dozen times in each function
> evaluation in an optimization that requires tens of thousands of
> function evaluations, it is important.
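>
> A quick way to see the difference in a REPL (a sketch; in current Julia
> the BLAS wrappers live in the LinearAlgebra standard library, while in
> 2014 they were in Base):
>
> using LinearAlgebra
> A = rand(100, 100); B = rand(100, 100); C = rand(100, 100);
> @allocated C - A*B                            # allocates two temporary matrices
> @allocated BLAS.gemm!('N', 'N', -1.0, A, B, 1.0, C)  # updates C in place, no allocation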
>
> Please let me know if this answers your question.