[R-sig-ME] Making lme4 faster for specific case of sparse x

Wed Aug 10 01:28:16 CEST 2016

 A few more thoughts:

  I'm not quite sure why a sparse representation of X is only worthwhile
when p is very large.  I haven't done the arithmetic on the storage
required in the sparse representation (column pointers, locations,
values) ... so I decided to do an experiment:

> z <- data.frame(f=factor(sample(8,size=10000,replace=TRUE)))
> m <- model.matrix(~f,data=z)
> m2 <- Matrix::sparse.model.matrix(~f,data=z)
> print(object.size(m),units="Mb")
1.1 Mb
> print(object.size(m2),units="Mb")
0.8 Mb

  This is only about a 35% reduction in size; not an order of magnitude,
but possibly not trivial either.
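
  A rough sketch of the storage arithmetic for the experiment above,
assuming sparse.model.matrix() returns a compressed-column dgCMatrix
(8-byte values in @x, 4-byte row indices in @i, 4-byte column pointers in
@p). The seed and the names dense_bytes / sparse_bytes are made up for
illustration, and the counts ignore object headers and dimnames, so treat
them as approximate:

```r
library(Matrix)

set.seed(101)  # arbitrary seed, only so the sketch is reproducible
n  <- 10000
z  <- data.frame(f = factor(sample(8, size = n, replace = TRUE)))
m  <- model.matrix(~ f, data = z)
m2 <- sparse.model.matrix(~ f, data = z)

# Dense storage: one 8-byte double per cell
dense_bytes <- 8 * nrow(m) * ncol(m)

# Sparse (dgCMatrix) storage: an 8-byte value plus a 4-byte row index per
# stored entry, plus (ncol + 1) 4-byte column pointers
nnz          <- length(m2@x)
sparse_bytes <- 12 * nnz + 4 * (ncol(m2) + 1)

c(dense = dense_bytes, sparse = sparse_bytes)

# Both objects also carry 10000 row names; dropping them shows how much of
# the reported object.size() is dimnames rather than matrix entries
m_bare  <- unname(m)
m2_bare <- m2
dimnames(m2_bare) <- list(NULL, NULL)
print(object.size(m_bare),  units = "Kb")
print(object.size(m2_bare), units = "Kb")
```

  Each stored entry costs 12 bytes in the sparse form versus 8 bytes per
cell in the dense form, so the saving depends on the fraction of nonzero
cells; with only 8 factor levels plus an intercept column that fraction is
not especially small.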

  Depending on what stays fixed between gradient descent steps, you
might be able to save time by updating individual components of the
stuff returned by lFormula() (see ?modular), and especially mkReTrms().
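
  A minimal sketch of the modular workflow, following the pattern
documented in ?modular and using the sleepstudy data that ships with lme4
as a stand-in (the model and the simulated response new_y are only
illustrative, not the boosting setup itself). The last step uses refit(),
which is one possible shortcut when only the response changes between
iterations:

```r
library(lme4)

# One-time setup: parse the formula and build X, the model frame, and the
# random-effects terms (the reTrms component is what mkReTrms() produces)
lmod <- lFormula(Reaction ~ Days + (Days | Subject), data = sleepstudy)
names(lmod$reTrms)

# Build the deviance function, optimize it, and package the result
devfun  <- do.call(mkLmerDevfun, lmod)
opt     <- optimizeLmer(devfun)
fit_mod <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

# Reference fit through the usual front end
fit_ref <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# If only the response changes, refit() reuses the existing model
# structure; new_y is a made-up stand-in for an updated working response
set.seed(1)
new_y   <- sleepstudy$Reaction + rnorm(nrow(sleepstudy), sd = 10)
fit_new <- refit(fit_ref, newresp = new_y)
```

  Whether rebuilding individual pieces by hand beats refit() in a given
loop is something to time rather than assume.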

  You can probably save time by switching to the BOBYQA implementation
in nloptr.
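
  A sketch of how that switch might look, again on sleepstudy as a
stand-in. It assumes lme4's "nloptwrap" optimizer, which wraps nloptr and
uses its BOBYQA algorithm by default; in more recent lme4 versions this
may already be the default for lmer(), in which case nothing is gained:

```r
library(lme4)

# BOBYQA via nloptr, skipping the post-fit derivative checks
ctrl_nlopt <- lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE)
fit_nlopt  <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
                   control = ctrl_nlopt)

# BOBYQA from the minqa package, for comparison
ctrl_minqa <- lmerControl(optimizer = "bobyqa", calc.derivs = FALSE)
fit_minqa  <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
                   control = ctrl_minqa)

# The two fits should agree closely; compare timings on your own problem
all.equal(fixef(fit_nlopt), fixef(fit_minqa), tolerance = 1e-4)
system.time(update(fit_nlopt))
system.time(update(fit_minqa))
```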

  If you *do* have a large variance-covariance matrix, you might be able
to specialize to a diagonal, compound-symmetry, or factor-analytic
variance-covariance matrix (see Steve Walker's lme4ord package on GitHub).
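
  To make the parameter counts concrete, here is a small sketch comparing
an unstructured covariance matrix, with the p*(p + 1)/2 parameters Doug
mentions below, against a diagonal one. It uses lme4's double-bar syntax
on sleepstudy rather than lme4ord, and the helper names n_cov_full and
n_cov_diag are made up for illustration. Note that the double-bar syntax
is only expected to behave this way for numeric covariates:

```r
library(lme4)

# Covariance parameters for p random-effect terms per group
n_cov_full <- function(p) p * (p + 1) / 2   # unstructured
n_cov_diag <- function(p) p                 # diagonal: variances only

n_cov_full(c(2, 10, 100))   # 3, 55, 5050
n_cov_diag(c(2, 10, 100))   # 2, 10, 100

# Unstructured 2 x 2 covariance matrix for (intercept, Days)
fit_full <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Diagonal covariance matrix: no intercept-slope correlation
fit_diag <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

# Number of covariance parameters the optimizer actually has to search
length(getME(fit_full, "theta"))
length(getME(fit_diag, "theta"))
```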

On 16-08-09 12:33 PM, Douglas Bates wrote:
> On Tue, Aug 9, 2016 at 8:36 AM Patrick Miller <pmille13 at nd.edu> wrote:
> 
>> Thanks for that clarification.  In my situation, the effect of each
>> predictor in X was allowed to vary by a single grouping variable. The lmer
>> formula is something like the following:
>>
>> y ~ 1 + X1 + X2 + X3 + ... + ( 1 + X1 + X2 + X3 + ... | id)
>>
> 
> Okay - that's not the same as X == Z but we'll let that slide.
> 
> It is extremely unlikely that you will be able to fit such a model and get
> a meaningful result.  Suppose that you have p columns in the fixed-effects
> model matrix, X, and k levels of the id factor.  The covariance matrix of
> the random effects will be p by p with p*(p + 1) / 2 distinct elements to
> estimate.  It is difficult to estimate large covariance matrices with any
> accuracy.  You would need k to be very, very large to have any hope of
> doing so.
> 
> To make it worthwhile using a sparse representation of X you would need p
> to be large - in the hundreds or thousands - which would leave you trying
> to estimate tens of thousands of covariance parameters.
> 
> It is just not on.
> 
> If you feel you must fit this model because of the "keep it maximal" advice
> of Barr et al. (2013), remember that they reached that conclusion on the
> basis of a simulation of a model with one covariate.  That is, they were
> comparing fitting a 1 by 1 covariance matrix with fitting a 2 by 2
> covariance matrix.  To conclude on the basis of such a small simulation that everyone
> must always use the maximal model, even when it would involve tens or
> hundreds of covariance parameters, is quite a leap.
> 
> On Mon, Aug 8, 2016 at 6:08 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> 
>>> If X == Z don't you have problems with estimability?  It seems that the
>>> MLE would always correspond to all random effects being zero.
>>>
>>> Perhaps I misunderstand the situation.  Could you provide a bit more
>>> detail on how it comes about that X == Z?
>>>
>>> On Mon, Aug 8, 2016 at 5:01 PM Patrick Miller <pmille13 at nd.edu> wrote:
>>>
>>>> Hello,
>>>>
>>>> For my dissertation, I'm working on extending boosted decision trees to
>>>> clustered data.
>>>>
>>>> In one of the approaches I'm considering, I use *lmer* to estimate random
>>>> effects within each gradient descent iteration in boosting. As you might
>>>> expect, this is computationally intensive. However, my intuition is that
>>>> this step could be made faster because my use case is very specific.
>>>> Namely, in each iteration, *X = Z*, and *X* is a sparse matrix of 0s and
>>>> 1s (with an intercept).
>>>>
>>>> I was wondering if anyone had suggestions or (theoretical) guidance on
>>>> this problem. For instance, is it possible that this special case permits
>>>> faster optimization via specific derivatives? I'm not expecting this to be
>>>> implemented in lmer or anything, and I'm happy to work out a basic
>>>> implementation myself for this case.
>>>>
>>>> I've read the vignette on speeding up the performance of lmer, and
>>>> setting calc.derivs = FALSE resulted in about a 15% performance
>>>> improvement for free, which was great. I was just wondering if it was
>>>> possible to go further.
>>>>
>>>> Thanks in advance,
>>>>
>>>> - Patrick
>>>>
>>>> --
>>>> Patrick Miller
>>>> Ph.D. Candidate, Quantitative Psychology
>>>> University of Notre Dame
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>
>>
>> --
>> Patrick Miller
>> Ph.D. Candidate, Quantitative Psychology
>> University of Notre Dame
>>