[R-sig-ME] Julia vs nlme vs lme4 implementation of fitting linear mixed models

Thu Oct 16 18:57:16 CEST 2014

Thanks for the question.  It is a good exercise for me to answer it so that
I can get things straight in my mind.  I am doing this off the top of my
head, relying on my all-too-faulty memory.  Corrections and clarifications
are welcome.  Ben, should we archive such information somewhere?

The basics:
nlme:
- developed in mid-1990's by Pinheiro and Bates
- documented in P & B (2000, Springer) and several papers
- implemented in R and C using the .Call interface and R's C API.
- fits linear mixed-effects models and nonlinear mixed-effects models
- allows for additional variance and or correlation structures in the
conditional     distribution of the response, given the random effects.
- multiple formula model specification with PDMat, VarCorr, ... classes.
- allows for multiple nested grouping factors.  There is a kludgy method
for fitting models with crossed grouping factors but I wouldn't push it too
hard.
- uses S3 methods and class tags (IMO "S3 classes" don't exist)
- optimization of parameter estimates via EM and Newton-Raphson
 algorithms, profiled w.r.t. residual variance only
- Internal representation uses relative precision factor and log-Cholesky
formulation.  Singular covariance matrices correspond to infinite parameter
values.
- no longer actively developed.  Maintained by R-Core primarily to adjust
to changing requirements on R packages
- no explicit methods for fitting GLMMs.
- lme with iterative reweighting is used in the glmmPQL function in the
MASS package.

lme4:
- development began in late 90's by Bates, Maechler, DebRoy, Sarkar and
others.  Current development is primarily by Bolker and Walker.
- documented in papers, vignettes.  Bates started writing a book but didn't
complete the project - some draft chapters still online.
- implemented in R, C and C++ using facilities provided by Rcpp and Eigen
- fits linear mixed-effects models and GLMMs.  Nominally has NLMM
capabilities but the implementation is not solid
- single formula model specification.  Grouping factors can be nested,
partially or fully crossed.
- uses S4 classes and methods, S3 methods, reference classes, Matrix
package.
- internal representation uses relative covariance factor and sparse
Cholesky factorization.  Optimization of profiled log-likelihood uses
general nonlinear optimizers that allow for box constraints (in practice
only require non-negativity of some of the parameters) for LMMs.  GLMMs use
PIRLS (penalized iteratively reweighted least squares) to determine the
conditional modes of the random effects with Laplace or adaptive
Gauss-Hermite approximation to the log-likelihood.  Settling on a single
robust and reliable optimizer has been difficult.
- all models use a sparse matrix representation to solve the penalized
least squares problem
- actively maintained and developed

MixedModels
- development started in 2012 by Bates
- little or no documentation outside of examples
- implemented exclusively in Julia (about 1600 lines of code)
- fits LMMs.  Development of GLMM capabilities is planned.
- single formula specification similar to lme4.  Because Julia is still a
young language not all lme4 formulas will work with MixedModels.  (Building
the formula language interpretation code from scratch is not easy.)
 Grouping factors can be nested or crossed (partially or completely).
- uses Julia methods and user-defined types.  Julia methods provide for
multiple dispatch like S4.
- internal representation (all at the Julia level) provides for several
different PLS (penalized least squares) solvers according to the
configuration of the grouping factors.
- Models with a nested sequence of grouping factors, including a single
grouping factor as a trivial special case, use dense matrix methods and
provide an analytic gradient of the profiled log-likelihood.  Similarly for
models with two crossed or nearly crossed grouping factors (think "subject"
and "item").
- Optimizers from Steven Johnson's NLopt Julia package that interfaces to
his C library, which is also used in the R packages nloptr and nloptwrap.
At present  LN_BOBYQA is used for derivative-free optimization and LD_MMA
for models that provide a gradient but switching optimizers within the
NLopt library is trivial.
- Considerable effort made to reuse storage and cut down on
allocation/garbage collection.
- Actively developed.  Maintenance hasn't been too much of an issue because
I don't think there are many users at present.

Examples of MixedModels fits are at

http://nbviewer.ipython.org/github/dmbates/slides/blob/master/2014-10-01-Ithaca/lmm%20Examples.ipynb

Because I was involved in the development of all three packages I will take
the liberty of commenting on the philosophy.

The purpose of developing nlme was for fitting nonlinear mixed-effects
models.  Linear mixed-effects models were incorporated mainly as an
iterative step in nlme.  Numerical methods used are not nearly as
sophisticated as those in lme4 and MixedModels.

lme4 was developed to provide a use-case for S4 classes and methods. The
Matrix package was initially part of lme4 then split off into a separate
package.  The numerical methods implemented in lme4 are, in my opinion,
superior to those in nlme, mainly through the use of the relative
covariance factor and the profiled log-likelihood.  These may seem like
details but to me they are very important.  The motiviation for
incorporating sparse matrix classes in the Matrix package and accessing the
CHOLMOD code was to provide a general method for fitting such models.
Using C++, Rcpp and RcppEigen was motivated by trying to provide generality
and speed.  The end result is confusing (my fault entirely) and fragile.

MixedModels was developed because I became interested in the Julia language
at the same time that I was disillusioned with at least some aspects of R.
As a new language Julia doesn't have the infrastructure of R (dataframes,
formula language, graphics, ...) but does have a clean implementation of
the parts that are available.  The most important aspect of Julia is "one
language".  You develop in the same language in which you optimize.  The
type system in Julia allows me to incorporate the different kinds of
penalized least squares solvers in what to me is a clean way, thereby
taking advantage of structural simplifications in simple, but common,
cases.  It is possible to do this in R/C++/Rcpp/EIgen but it would be a
massive headache and perhaps beyond my abilities to do it well.

Optimizing Julia code is often done at the expense of transparency.  It is
obvious what

 C = C - A*B

means when C, A and B are matrices (* means matrix multiplication in
Julia).  It is less obvious what

  BLAS.gemm!('N','N',-1.,A,B,1.,C)

means but the second form avoids taking two unnecessary copies of the
matrix C and running a couple of extra loops.  This isn't important if you
only do it once.  If you do it several dozen times in each function
evaluation in an optimization that requires tens of thousands function
evaluations, it is important.

Please let me know if this answers your question.

	[[alternative HTML version deleted]]