[R-sig-ME] Making lme4 faster for specific case of sparse x

Douglas Bates bates at stat.wisc.edu
Tue Aug 9 18:33:27 CEST 2016

On Tue, Aug 9, 2016 at 8:36 AM Patrick Miller <pmille13 at nd.edu> wrote:

> Thanks for that clarification.  In my situation, the effect of each
> predictor in X was allowed to vary by a single grouping variable. The lmer
> formula is something like the following:
> y ~ 1 + X1 + X2 + X3 + ... + ( 1 + X1 + X2 + X3 + ... | id)

Okay - that's not the same as X == Z, but we'll let that slide.

It is extremely unlikely that you will be able to fit such a model and
get a meaningful result.  Suppose that you have p columns in the
fixed-effects model matrix, X, and k levels of the id factor.  The
covariance matrix of the random effects will be p by p, with
p*(p + 1)/2 distinct elements to estimate.  It is difficult to estimate
large covariance matrices with any accuracy.  You would need k to be
very, very large to have any hope of doing so.

To make a sparse representation of X worthwhile, you would need p to be
large - in the hundreds or thousands - which would leave you trying to
estimate tens of thousands of covariance parameters.
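For concreteness, the count above can be checked with a couple of lines
of R (the function name n_theta is just illustrative, not anything from
lme4):

```r
# Distinct parameters (variances plus covariances) in a p x p
# random-effects covariance matrix: p*(p + 1)/2.
n_theta <- function(p) p * (p + 1) / 2

n_theta(2)    # 3     (intercept plus one covariate)
n_theta(100)  # 5050
n_theta(200)  # 20100 (p in the hundreds: tens of thousands of parameters)
```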

It is just not on.

If you feel you must fit this model because of the "keep it maximal"
advice of Barr et al. (2013), remember that they reached that conclusion
on the basis of a simulation of a model with one covariate.  That is,
they were comparing fitting a 1 by 1 covariance matrix with fitting a
2 by 2 covariance matrix.  To conclude on the basis of such a small
simulation that everyone must always use the maximal model, even when it
would involve tens or hundreds of covariance parameters, is quite a
leap.

On Mon, Aug 8, 2016 at 6:08 PM, Douglas Bates <bates at stat.wisc.edu> wrote:

> If X == Z, don't you have problems with estimability?  It seems that
> the mle would always correspond to all random effects being zero.
>
> Perhaps I misunderstand the situation.  Could you provide a bit more
> detail on how it comes about that X == Z?
>
> On Mon, Aug 8, 2016 at 5:01 PM Patrick Miller <pmille13 at nd.edu> wrote:
>
>> Hello,
>>
>> For my dissertation, I'm working on extending boosted decision trees
>> to clustered data.
>>
>> In one of the approaches I'm considering, I use *lmer* to estimate
>> random effects within each gradient descent iteration in boosting. As
>> you might expect, this is computationally intensive. However, my
>> intuition is that this step could be made faster because my use case
>> is very specific. Namely, in each iteration, *X = Z*, and *X* is a
>> sparse matrix of 0s and 1s (with an intercept).
>>
>> I was wondering if anyone had suggestions or (theoretical) guidance on
>> this problem. For instance, is it possible that this special case
>> permits faster optimization via specific derivatives? I'm not
>> expecting this to be implemented in lmer or anything, and I'm happy to
>> work out a basic implementation myself for this case.
>>
>> I've read the vignette on speeding up the performance of lmer, and
>> setting calc.derivs = FALSE resulted in about a 15% performance
>> improvement for free, which was great. I was just wondering if it was
>> possible to go further.
>>
>> Thanks in advance,
>>
>> - Patrick
>>
>> --
>> Patrick Miller
>> Ph.D. Candidate, Quantitative Psychology
>> University of Notre Dame

