[R-sig-ME] Suggestions on how to correct a misapprehension?

Douglas Bates dmbates at gmail.com
Wed Dec 14 18:45:52 CET 2022


Thank you to all who commented and especially to JP van Paridon for editing
the Wikipedia article.

I think the new version is a great improvement in accuracy.

There is a subtle, but important, difference between lme4/MixedModels and
other implementations of mixed-effects models in terms of the scale on
which the parameters being directly optimized are defined.  I don't know
how to phrase this in an easily understood way.  I'll try to explain it and
perhaps someone can suggest a more coherent description.

I mentioned that lme4/MixedModels profile the log-likelihood with respect
to elements of a "relative covariance factor".  The easiest way to
understand what that phrase means is to consider a case with scalar random
effects. Let σ₁² be the variance of the random effects.  The factor of this
variance is the standard deviation, σ₁. (In the case of vector-valued
random effects we use the Cholesky factor of the positive semi-definite
variance-covariance matrix.)  The relative covariance factor is σ₁ / σ
where σ is the standard deviation of the per-observation or "residual"
noise.
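
To make this concrete, here is a minimal sketch in R (assuming lme4 and
its bundled sleepstudy data are available) showing that, for a model with
a single scalar random effect, the parameter vector theta reported by
`getME()` is exactly this ratio:

    library(lme4)

    ## a model with one scalar random effect per Subject
    fit <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)

    theta <- getME(fit, "theta")                   # relative covariance factor
    sd1   <- as.data.frame(VarCorr(fit))$sdcor[1]  # sigma_1, random-effect SD
    sdres <- sigma(fit)                            # sigma, residual SD

    c(theta = unname(theta), ratio = sd1 / sdres)  # the two values coincide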

The "profiling" of the objective means that, given a value of σ₁ / σ, we
can determine the optimal value of the log-likelihood over all possible
values of β, the fixed-effects coefficients, and σ², the residual variance,
with a direct (i.e. non-iterative) calculation of solving a penalized least
squares problem.
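
lme4 exposes this profiled objective directly; the sketch below (same
model and caveats as above) evaluates the profiled deviance as a function
of θ = σ₁ / σ alone:

    library(lme4)

    ## devFunOnly = TRUE returns the profiled deviance as a function of
    ## theta alone; beta and sigma^2 are solved for internally via PLS
    dev <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy,
                devFunOnly = TRUE)

    dev(1)  # profiled deviance at sigma_1 / sigma = 1
    dev(0)  # boundary value corresponding to a singular covariance matrix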

In contrast, most other implementations optimize with respect to the
elements of the precision matrix, G⁻¹, of the random effects because that
is the way that Henderson's mixed-model equations were originally written.
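
For reference, Henderson's equations in their usual textbook form (with R
the residual covariance and G the random-effects covariance) show why G⁻¹,
rather than a factor of G, is the natural quantity on that scale:

    \left[\begin{array}{cc}
    X^\top R^{-1} X & X^\top R^{-1} Z \\
    Z^\top R^{-1} X & Z^\top R^{-1} Z + G^{-1}
    \end{array}\right]
    \left[\begin{array}{c} \hat\beta \\ \hat u \end{array}\right]
    =
    \left[\begin{array}{c} X^\top R^{-1} y \\ Z^\top R^{-1} y \end{array}\right]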

The choice of whether to optimize on the "standard deviation scale" or the
"precision scale" determines the nature of the boundary cases.  It can be
shown that on the standard deviation scale it is possible, and not
uncommon, to converge to a singular covariance matrix (σ₁ / σ → 0 for scalar
random effects) but not to an infinite standard deviation.  On the
precision scale the situation is reversed - a precision of 0 is not
possible but infinite precision is possible.  As a result the optimization
on the standard deviation scale is more stable than on the precision
scale.  It is much easier to set a lower bound of 0 on a parameter like σ₁
/ σ than to try to converge on a precision parameter that, technically,
will diverge to infinity.
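
In lme4 this shows up as a simple box constraint on θ; a small sketch,
again with the sleepstudy model:

    library(lme4)

    fit <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)

    getME(fit, "lower")  # box constraint on theta: lower bound 0, no upper
    isSingular(fit)      # TRUE when some theta element is at (or near) 0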

This is a subtle and technical distinction and thus probably not suitable
for a Wikipedia explanation.

I was asked why I'm not a fan of numpy and scipy.  I find them unattractive
from the software-engineering point of view.  As far as I know, and I am
not a Python expert, Python offers only a methods-within-class-definitions
style of object-oriented programming.  R and Julia allow for defining
methods for generic functions with single (S3 methods in R) or multiple
(Julia and S4 methods in R) dispatch from a function call to an
implementation method.  For fitting statistical models or for numerical
linear algebra the "generics + classes = methods" style is a much better
fit, as John Chambers was wise enough to see when he created the S3 system
for S.
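
A minimal S3 sketch of that style (the `area` generic and the
circle/square classes are invented here purely for illustration):

    ## a generic function plus methods; R selects the method from the
    ## class of the first argument (single dispatch)
    area <- function(shape) UseMethod("area")
    area.circle <- function(shape) pi * shape$r^2
    area.square <- function(shape) shape$side^2

    area(structure(list(r = 2), class = "circle"))     # 12.566...
    area(structure(list(side = 3), class = "square"))  # 9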

Consider the Matrix package in R or the LinearAlgebra package in Julia.
They each define many, many different types of matrices and methods for
operations on combinations of these types.  For example, there are 141
different methods for `mul!` (matrix multiplication overwriting one of its
arguments with the product) in Julia's LinearAlgebra package, and that is
without counting the methods for `lmul!`, `rmul!`, `*` (like R's `%*%`),
etc.
That is, there are several hundred specializations of the general concept
of "multiplication of linear algebra objects"  and this is how you make
linear algebra fast - by specializing the generic concept on the types of
all of the operands.
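
You can see the same pattern from an R session; a quick sketch with the
Matrix package:

    library(Matrix)

    ## S4 methods for "%*%" are specialized on the classes of *both*
    ## operands (multiple dispatch)
    showMethods("%*%")  # lists the (x, y) class combinations covered

    S <- Matrix(c(1, 0, 0, 2), 2, 2, sparse = TRUE)
    S %*% diag(2)       # dispatches to a sparse-by-dense method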

Now look at
https://docs.scipy.org/doc//scipy/reference/linalg.html#module-scipy.linalg
to see how it is done in SciPy.  There is no dispatch - just a collection
of awkwardly named, stand-alone functions that cover only a handful of the
possible cases.  We might as well be back writing Fortran subroutines - at
least LAPACK did some forms of dispatch by encoding the types of arguments
in the subroutine name.

It just seems to me that numpy and scipy are pretty weak tea for scientific
programming in what is apparently the world's most popular programming
language.



On Wed, Dec 14, 2022 at 3:17 AM Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

> >>>>> Jeroen van Paridon via R-sig-mixed-models
> >>>>>     on Wed, 14 Dec 2022 02:11:46 +0000 writes:
>
>     > Hi Ben, It seems like Python's statsmodels might not be
>     > using EM either, which would mean that exactly none of the
>     > four stats packages listed in the original Wikipedia
>     > article actually use EM.
>
> That's not correct, either:
>
> nlme clearly uses *BOTH* EM (initial steps) *and*
> Newton-Raphson (==> nonsense in the SAS documentation you cited).
>
> Note the reference on lme's help page,
>
>   Lindstrom, M.J. and Bates, D.M. (1988) "Newton-Raphson and EM
>   Algorithms for Linear Mixed-Effects Models for Repeated-Measures
>   Data", Journal of the American Statistical Association, 83, 1014--1022.
>
> which even has both algorithms in the title.
>
>
>     > My edits were a mix of Phillip Alday's thoughts on the
>     > matter and mine; if I can work out exactly what
>     > statsmodels is doing I'll make further changes. (And I'm
>     > open to suggestions about anything else that might be good
>     > to add!
>
> I've also amended Wikipedia pages (in my name) since 2006, as I
> now looked up...
>
> The current entry about lme4 / Mixed Models is still "grossly" misleading,
> in my view.  Notably as it basically relates everything to
> Henderson's MM equations -- which were fantastic when stated
> 63 years ago.
>
> The main idea of Doug Bates' approach (in lme4, MixedModels) --
> which Doug explained nicely -- namely that you can profile out much of the
> parameter space (beta, sigma) via PLS while only needing to optimize
> the resulting profiled likelihood over the var-cov
> parameters (theta) -- is still not mentioned at all.
>
> Martin
>
>
>     > Cheers,
>
>     > JP
>
>     > From: R-sig-mixed-models
>     > <r-sig-mixed-models-bounces at r-project.org> on behalf of
>     > Ben Bolker <bbolker at gmail.com>
>     > Sent: Tuesday, December 13, 2022 7:38:16 PM
>     > To: r-sig-mixed-models at r-project.org
>     > Subject: Re: [R-sig-ME] Suggestions on how to correct a misapprehension?
>
>     >    FWIW I can see that JP van Paridon has already gotten
>     > us most of the way there.
>
>     > FWIW I don't think SAS uses EM either, at least as far as
>     > I can tell:
>
>     >
> https://documentation.sas.com/doc/en/statcdc/14.2/statug/statug_mixed_details58.htm
>
>     > PROC MIXED uses a ridge-stabilized Newton-Raphson
>     > algorithm to optimize either a full (ML) or residual
>     > (REML) likelihood function. The Newton-Raphson algorithm
>     > is preferred to the EM algorithm (Lindstrom and Bates
>     > 1988). PROC MIXED profiles the likelihood with respect to
>     > the fixed effects and also with respect to the residual
>     > variance whenever it appears reasonable to do so. The
>     > residual profiling can be avoided by using the NOPROFILE
>     > option of the PROC MIXED statement. PROC MIXED uses the
>     > MIVQUE0 method (Rao 1972; Giesbrecht 1989) to compute
>     > initial values.
>
>     > On 2022-12-13 8:25 p.m., Stuart Luppescu wrote:
>     >> On 12/14/22 02:39, Douglas Bates wrote:
>     >>> I have never gone through the process of proposing an
>     >>> edit in a Wikipedia article.  As I understand it I would
>     >>> need to create a login etc.  Would anyone who does have
>     >>> such a login be willing to propose an edit and save me
>     >>> the steps?
>     >>
>     >> I was in this situation recently. I found AN ERROR in
>     >> Wikipedia and resolved to fix it, but the Wikipedia
>     >> admins wouldn't let me. I had been logged into my VPN on
>     >> DigitalOcean, and they disable all attempts to edit
>     >> articles from DigitalOcean. I had to log out of the VPN,
>     >> and erase my Wikipedia cookies before I could correct the
>     >> entry. In any case, I think I have a Wikipedia login, but
>     >> I couldn't remember it so I just did the edit as
>     >> "guest". If you want to be able to talk about your edit
>     >> with other people post hoc, I think you might need a
>     >> login. Otherwise, just do it without a login.
>     >>
>
