[R-sig-ME] lmer() computational performance

Wed Jun 29 23:12:29 CEST 2011

On 06/29/2011 05:00 PM, zubin wrote:
> Hello, running a mixed model in the package LME4, lmer()
> 
> Panel data, have about 322 time periods and 50 states, total data set is
> approx 15K records and about 20  explanatory variables.  Not a very
> large data set. 
> 
> We run random intercepts as well as random coefficients for about 10 of
> the variables, the rest come in as fixed effects.   We are running into
> a wall of time to execute these models.  As we add more random coefficients, it seems to slow down pretty dramatically. 
> 
> A sample specification of all random effects:
> 
> lmer(Y ~ 1 + (x_078 + x_079 + growth_st_index  +
>             retail_st_index + Natl + econ_home_st_index +
>             econ_bankruptcy +  index2_HO  + GPND_ST  | state),
>             data = newData, doFit = TRUE)
> 
> Computation time is near 15 minutes.
> System        ELAPSED        User
> 21.4            888.63        701.74
> 
> 
> Does anyone have any ideas on ways to speed up lmer(), as well as any
> parallel implementations, or approaches/options to reduce computation time?
> 
> Running on a Linux 8 core machine, 16GB of ram, R 2.12
> 

  It would be helpful to mention your previous post on r-help and quote
my answer, so that any further discussion can comment on it/not repeat
points that I have already made ...  (lightly edited below)

  [1 post to r-sig-mixed-models]
 (2) Although it's certainly not large for a standard regression model,
I'm not really sure whether this counts as "large" in the mixed/
multilevel model world.    For comparison, the 'Chem97' dataset in
the mlmRev package is 31022 observations x 8 variables x 2280 blocks and
is described as "relatively large" -- so the raw data matrix is about
the same size (twice as long, half as wide) but there are many more
blocks.  (And as you point out above, the number of REs is what's
killing you -- the example in ?Chem97 includes only 2 independent REs,
i.e. 2 intercept-only terms.)
  (3) Fitting 10 random effects (including the intercept)
is very ambitious: it means estimating a 10x10 correlation matrix
(45 correlations on top of the 10 variances) ... I don't know whether
you know that's what you're doing, or
whether you need the full correlation matrix.  You can split it up into
independent blocks (in the extreme, 10 uncorrelated random effects) by
specifying the REs as separate chunks, e.g.

(1|state) + (0+x_078|state) + (0+x_079|state) + ...

(see some of the examples in the lmer documentation).
(lme, in the nlme package,
offers more flexibility in specifying structured correlation matrices
of different types, but will in general be slower than lme4 -- but
perhaps it would be faster to fit a structured (simpler) model you're
happy with using lme than the full unstructured model using lmer)
  (4) the development version of lme4, lme4a, *might* be faster (but
is less well tested/less stable).
  (5) do you have alternatives?  I haven't worked with data sets this
size myself, but anecdotes on the r-sig-mixed-models list suggest that
lmer is faster than most alternatives ... ?
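
To make point (3) concrete, here is a rough sketch (not something I have
run on your data) that counts the variance-covariance parameters in the
two specifications and builds the "separate chunks" formula from the
variable names in your model. The helper n_vcov_params() and the object
names are made up for illustration; the sketch assumes all the slope
variables are numeric, and the lmer() call is skipped unless lme4 is
installed and a data frame called newData exists.

```r
# Random-slope variables taken from the model quoted above.
slopes <- c("x_078", "x_079", "growth_st_index", "retail_st_index", "Natl",
            "econ_home_st_index", "econ_bankruptcy", "index2_HO", "GPND_ST")

# Hypothetical helper: number of variance-covariance parameters for q
# correlated random effects (q variances plus q*(q-1)/2 covariances).
n_vcov_params <- function(q) q * (q + 1) / 2

q <- length(slopes) + 1L           # 9 slopes + the random intercept
full_params  <- n_vcov_params(q)   # one unstructured 10x10 block
indep_params <- q                  # 10 separate 1x1 blocks (variances only)

# Build the uncorrelated specification:
# (1 | state) + (0 + x_078 | state) + (0 + x_079 | state) + ...
re_terms <- c("(1 | state)", sprintf("(0 + %s | state)", slopes))
indep_formula <- as.formula(
  paste("Y ~ 1 +", paste(re_terms, collapse = " + "))
)

# Only attempt the fit if lme4 is installed and the data are present;
# 'newData' is the original poster's data frame, not defined here.
if (requireNamespace("lme4", quietly = TRUE) && exists("newData")) {
  fit_indep <- lme4::lmer(indep_formula, data = newData)
}
```

Comparing full_params with indep_params shows how much smaller the
optimization problem becomes when the correlations are dropped; whether
dropping them is scientifically reasonable is a separate question.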
