[R-sig-ME] lmer() vs. lme() gave different variance component estimates

Peter Dalgaard pdalgd at gmail.com
Mon Sep 20 08:35:05 CEST 2010


On 09/18/2010 12:09 PM, Jarrod Hadfield wrote:
> Both converge to the same answer on Fedora and Mac (Mac sessionInfo() below)
[snip]
> lmer(score~trt+(1|id/eye),dat, verbose=T, start=list(matrix(.81),matrix(.57)))
> 
>   1:     1304.7519:  2.42844  1.75398
>    2:     776.66857:  6.30805  0.00000
>    3:     657.56008:  7.73875  0.00000
>    4:     514.89384:  10.7783 0.000123881
>    5:     460.61372:  13.2310 0.00140974
>    6:     434.88979:  15.6779 0.00908817
>    7:     426.68762:  17.5388 0.0316599
>    8:     424.89288:  18.6843 0.0743135
>    9:     424.71437:  19.1269 0.132248
>   10:     424.70495:  19.2146 0.203365
>   11:     424.68492:  19.3175 0.420732
[snip]
>>> l <- lmer(score~trt+(1|id/eye),dat,
>> start=list(matrix(.81),matrix(.57)), verbose=T)
>>   0:     2226.5319: 0.810000 0.570000
>>   1:     1304.7507:  2.42844  1.75398
>>   2:     776.66839:  6.30805  0.00000
>>   3:     657.56008:  7.73875  0.00000
>>   4:     514.89378:  10.7783  0.00000
>>   5:     460.61371:  13.2310  0.00000
>>   6:     434.88972:  15.6779 1.52026e-08
>>   7:     426.68822:  17.5387 6.57868e-08
>>   8:     424.89428:  18.6843 1.41900e-07
>>   9:     424.71823:  19.1258 2.47916e-07
>>  10:     424.71356:  19.2075 3.75766e-07
>>  11:     424.71354:  19.2122 5.37851e-07
>>  12:     424.71354:  19.2122 7.55161e-07

This would appear to be the crux. Notice how in both cases the 2nd
parameter gets thrown to (near-)zero, but on the 64-bit machine it
manages to claw itself back. In 32 bits it is struggling back as well,
but not fast enough, so convergence is declared prematurely.
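
(One quick sanity check, which I have not actually tried on these
data: feed the 32-bit "optimum" back in as starting values and see
whether the deviance drops below 424.71354 again,

l2 <- lmer(score ~ trt + (1 | id/eye), dat,
           start = list(matrix(19.2122), matrix(7.55e-07)),
           verbose = TRUE)

If the trace then walks off towards the 64-bit solution, with the 2nd
parameter climbing back up, the 32-bit convergence really was
premature.)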

I'm not quite up to speed on the current implementation of lmer, but I
guess that it is still using the log-Cholesky representation, and I
suspect that the above displays a general weakness of
log-parametrizations: by introducing singularities at the boundary of
the parameter space, they turn an otherwise perfectly well-behaved
likelihood into one with extensive, very flat regions.
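
To see the effect in isolation, here is a toy sketch (my own made-up
numbers, and a balanced one-way layout rather than the id/eye design):
the REML log-likelihood written via its two mean squares, evaluated
once against sigma_b^2 and once against log(sigma_b).

m <- 10; n <- 5          # groups, replicates per group
MSB <- 0.9; MSW <- 1.0   # between/within mean squares; MSB < MSW
reml.ll <- function(sb2, s2 = 1) {
  l1 <- s2 + n * sb2     # E(MSB)
  l2 <- s2               # E(MSW)
  -0.5 * ((m - 1) * (log(l1) + MSB / l1) +
          m * (n - 1) * (log(l2) + MSW / l2))
}
sb2 <- seq(0, 2, length = 200)
plot(sb2, reml.ll(sb2), type = "l")          # maximum sits at the boundary
lsb <- seq(-10, 0.5, length = 200)
plot(lsb, reml.ll(exp(2 * lsb)), type = "l") # long flat plateau to the left

On the sigma_b^2 scale the function is perfectly well behaved, with its
maximum on the boundary; on the log scale the boundary is pushed out to
-Inf and the entire left half of the plot is flat, which is exactly
where a gradient-based stopping rule fires early.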

In this case, the REML likelihood can be written explicitly as a product
of three chi-square terms with a linear parametrization of their scale
parameters, profiled over one of the parameters, and with an explicit
formula for the maximum to boot. Someone with a large piece of paper and
sufficient time on their hands should be able to map the situation out
in extensive detail.
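
Here is the skeleton of that calculation, as a sketch only: a balanced
design with m subjects, two eyes per subject and r replicate scores per
eye is assumed (made-up counts and mean squares; the treatment term and
the profiled fixed effects are swept under the rug). The REML deviance
is then a sum of df * (log(lambda) + MS/lambda) terms, one per stratum,
with each lambda linear in the variance components:

reml.dev <- function(s2, se2, si2, m = 20, r = 4,
                     MSid = 50, MSeye = 8, MSres = 1) {
  l.id  <- s2 + r * se2 + 2 * r * si2  # between subjects
  l.eye <- s2 + r * se2                # between eyes within subjects
  l.res <- s2                          # residual
  (m - 1)           * (log(l.id)  + MSid  / l.id) +
    m               * (log(l.eye) + MSeye / l.eye) +
    2 * m * (r - 1) * (log(l.res) + MSres / l.res)
}
## Equating each MS to its expectation gives the explicit maximum
## (ANOVA = REML in the balanced case):
##   s2  = MSres
##   se2 = (MSeye - MSres) / r
##   si2 = (MSid  - MSeye) / (2 * r)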

(I'm also continually fascinated by the fact that such large differences
come up between 32-bit and 64-bit platforms, and I can't quite escape
the suspicion that somewhere in our code we have an unintended platform
dependence. However, I don't think that is the main point here.)



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



