[R-sig-ME] Speed estimation for lmer?

Adam D. I. Kramer adik-rhelp at ilovebacon.org
Thu Sep 11 23:12:03 CEST 2008


On Thu, 11 Sep 2008, Douglas Bates wrote:

> On Thu, Sep 11, 2008 at 10:09 AM, Ben Bolker <bolker at ufl.edu> wrote:
>> Adam D. I. Kramer wrote:
>>> Hi,
>>>
>>>     I'm about to estimate what I expect to be a fairly involved model,
>>> like this one:
>>>
>>> l <- lmer(y ~ x1*x2*x3 + (x1*x2*x3|grp) )
>>>
>>> ...the data set has 3,232,255 rows, for about 18000 grps, each of which
>>> has around 700 observations;
>
> Are you sure?
>
>> 3232255/18000
> [1] 179.5697

Touché. By "around" I really meant "up to," and by 700 I meant 200 (I'm
doing some preliminary analysis before all of the cases are in). I apologize
for the mistake.

>>>  x1, x2, x3 are continuous variables.
>>>
>>> Is there any way I can estimate how long this run will take? Obviously
>>> this depends on things like memory, processor, etc....but perhaps I
>>> could run it on 5 groups and then multiply the amount of time it takes,
>>> or something like that?
>
> I would start with a sample of groups and all observations for each of the
> groups in the sample.  Something like 1000 groups should not take a huge
> amount of time (easily under an hour) if you simplify the random effects
> specification.  I would start with something like:
>
> gsamp <- with(mydata, sample(levels(grp), 1000))
> datasamp <- subset(mydata, grp %in% gsamp)
> system.time(fm1 <- lmer(y ~ x1 * x2 * x3 + (1|grp), datasamp, verbose = TRUE))
> system.time(fm2 <- lmer(y ~ x1 * x2 * x3 + (x1 + x2 + x3|grp),
>                         datasamp, verbose = TRUE))
> system.time(fm3 <- lmer(y ~ x1 * x2 * x3 + (x1 * x2 * x3|grp),
>                         datasamp, verbose = TRUE))
>
> The reason that I suggest using verbose = TRUE is that you will be able
> to see the progress of the iterations and get a feeling for the amount
> of time per iteration.

This alone is a fantastic thing to know! It's almost like a progress bar.

And you were right about it not taking long. For fm2:
    user  system elapsed 
289.868  22.335 312.860 
...for the 1000-person subsample on the additive random-effects model.

...I then started estimating fm3, went to lunch, came back,
and it is still running.

> Having a large number of observations and of groups will increase the
> amount of time per iteration but my guess is that it would be more-or-less
> linear in the number of observations.  I wouldn't expect that it would be
> worse than quadratic in the number of observations.

Once fm3 is fit, I'll see how increasing the number of groups to 2k, 3k
increases the user time for fitting fm1 and fm2.
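Something along these lines, reusing the names from your example (an
untested sketch; the formulas are as above):

library(lme4)
elapsed <- sapply(c(1000, 2000, 3000), function(n) {
    gsamp <- with(mydata, sample(levels(grp), n))
    datasamp <- subset(mydata, grp %in% gsamp)
    # time the additive random-effects fit on an n-group subsample
    system.time(lmer(y ~ x1 * x2 * x3 + (x1 + x2 + x3 | grp),
                     data = datasamp))["elapsed"]
})
elapsed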

> However, increasing the number of random effects per group could cost you
> substantially because that increases the number of parameters to be
> estimated as part of the nonlinear optimization. 
> The iteration output for fm1 will show one parameter being optimized. 
> Many other parameters are involved in the model but their estimates are
> calculated directly, given the one parameter which happens to be the ratio
> of the standard deviation of the random effects to the standard deviation
> of the residuals.  The iteration output for fm2 will show 10 parameters in
> the nonlinear optimization, corresponding to four variances and 6
> covariances of the random effects.  I'm not sure exactly how many
> parameters will be in the nonlinear optimization for fm3 but it will be a
> large number.  I would certainly do some graphical exploration before I
> embarked on trying to fit a model of that complexity.

For fm3, each line looks like this:

   0:     516824.68: 0.120888 0.0364642 0.0300891 0.0361707 0.00780154
0.0103620 0.00786704 0.00195518  0.00000  0.00000  0.00000  0.00000  0.00000
0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000

...I assume that this means that for iteration 0, those are the estimated
parameters, and there are 36 of them. It appears to be completing an
iteration every 15 to 20 seconds. The full interaction should estimate 8
variances plus sum(1:7) = 28 covariances, for 36 parameters total, so that
checks out. 36 parameters is 3.6 times as many as fm2's 10, so a linear
scaling would predict 312.860 * 3.6 ≈ 1126 seconds, or about 19 minutes, to
fit fm3...far less time than it has already taken.

If the cost instead scales with the square of the number of parameters,
36^2 / 10^2 ≈ 13 times longer, or about 68 minutes to fit fm3...we'll see
when we get there.
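Spelling out that back-of-the-envelope arithmetic in R:

k <- 8                          # random effects per group in fm3
k * (k + 1) / 2                 # 36 variance-covariance parameters
312.860 * 36 / 10               # ~1126 s if cost is linear in parameters
312.860 * 36^2 / 10^2 / 60      # ~68 minutes if cost is quadratic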

>>  Roughly speaking: lmer and lmer2 aren't (I think) different any more,
>> they were different branches of the same software.  They should both be
>> much faster than lme.  glmer (from lme4) and glmmPQL (from nlme) should
>> not be necessary unless you have binomial, Poisson, etc. data rather than
>> normally distributed responses.
>
> Ben is correct.  There are functions lmer and lmer2 in the current lme4
> package but lmer2 is just a stub that turns around and calls lmer.  If
> your response y is on a continuous scale then it should be lmer that you
> use.

Indeed.

> It will help to have a machine with a lot of memory so you don't end up
> swapping.  I would recommend using a machine running 64-bit Linux (we
> don't have a 64-bit Windows version of R because we are waiting on freely
> available compilers).

R is running in 64-bit mode on a 64-bit Linux machine with 16GB of memory
total. fm3 is taking about 11% of memory, and the full model (which I'm
loath to abort) about 50% (yikes). R was compiled with -O3 and -mtune
flags.

> You may also want to try with an accelerated BLAS.

This version of R is linked to a version of ATLAS (3.8.2) that I compiled
myself before building R on this machine.

> However, I would check both with and without accelerated BLAS if you have
> a multi-core processor.  Sometimes a multithreaded accelerated BLAS can
> make lmer run slower, not faster. This is because the calls to the BLAS
> are predominantly for small matrices and the communications overhead for
> multithreaded versions more than offsets the performance gain from using
> multiple cores.

Interesting...I hadn't thought of that. The processor is 4-core, and R was
linked against the multithreaded BLAS. This may be covered in an FAQ
somewhere, but is there an easy way to swap out the BLAS that R is
accessing? Perhaps renaming my libraries is sufficient?
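
In the meantime, a crude check I can run (my own sketch, not anything from
the lme4 docs) is to time a loop of small-matrix crossproducts under each
BLAS build; if the threaded ATLAS loses on matrices this small, the
communication overhead you describe is probably in play:

m <- matrix(rnorm(64), 8, 8)
# repeat a tiny BLAS call many times; compare timings across BLAS builds
system.time(for (i in 1:200000) crossprod(m))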

Thanks very much for your help!

--Adam



