[R-sig-ME] Speed estimation for lmer?

Ben Bolker bolker at ufl.edu
Thu Sep 11 17:09:19 CEST 2008


Adam D. I. Kramer wrote:
> Hi,
> 
>     I'm about to estimate what I expect to be a fairly involved model,
> like this one:
> 
> l <- lmer(y ~ x1*x2*x3 + (x1*x2*x3|grp) )
> 
> ...the data set has 3,232,255 rows, for about 18,000 grps, each of which
> has around 700 observations; x1, x2, and x3 are continuous variables.
> 
> Is there any way I can estimate how long this run will take? Obviously
> this depends on things like memory, processor, etc., but perhaps I could
> run it on 5 groups and then multiply the amount of time it takes, or
> something like that?

  I don't know exactly how the fit will scale, but I would guess offhand
that the time is not much worse than linear in the number of observations
and in the number of groups (that is only a guess).  I would suggest you
try it for 5, 10, 20, and 100 groups and extrapolate from there, e.g.
with something like the sketch below.  Note that (x1*x2*x3|grp) implies
8 correlated random effects per group (an intercept plus seven slopes),
i.e. 36 variance-covariance parameters, so it does seem frighteningly
big to me.
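  For example, a minimal sketch of that timing experiment (untested, and
assuming the data live in a data frame 'd' with columns y, x1, x2, x3,
and grp; all names here are placeholders):

library(lme4)
ngrps <- c(5, 10, 20, 100)
times <- sapply(ngrps, function(n) {
    ## fit the full model to a random subset of n groups, record elapsed time
    sub <- d[d$grp %in% sample(unique(d$grp), n), ]
    system.time(lmer(y ~ x1*x2*x3 + (x1*x2*x3 | grp), data = sub))["elapsed"]
})
## see how the timings grow with the number of groups, then extrapolate
plot(ngrps, times, xlab = "number of groups", ylab = "elapsed seconds")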

> Also, given this information, is there some faster way to run the model?
> In theory, I'd be interested in systematically checking which random
> effects I could drop, but not if it would take weeks. Some prior posts to
> this list (which I have only been actively reading since yesterday)
> suggest that lmer is likely faster than lmer2, but there doesn't seem to
> be much discussion of the speed of the various modeling functions (lme,
> lmer, lmer2, glmer, glmmPQL, etc.).

  Roughly speaking: lmer and lmer2 are (I think) no longer different;
they were two branches of the same software.  They should both be much
faster than lme.  glmer (from lme4) and glmmPQL (from MASS) should not
be necessary unless you have binomial, Poisson, etc. data rather than
normally distributed responses; the distinction is sketched below.
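  For instance, a toy sketch of the distinction (not your data; 'd',
'succ', and 'fail' are hypothetical):

library(lme4)
## normally distributed response: lmer is the appropriate tool
m1 <- lmer(y ~ x1 + (1 | grp), data = d)
## binomial response (successes/failures): glmer with a family argument
m2 <- glmer(cbind(succ, fail) ~ x1 + (1 | grp), family = binomial, data = d)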

  good luck

    Ben Bolker



