[R-sig-ME] cross validation of glmmLasso

Mollie Brooks <mollieebrooks using gmail.com>
Tue Dec 13 14:33:10 CET 2022


To follow up with what I found (and then I promise to stop spamming you all)…

There are now versions of both lmmen::cv.glmmLasso and cv.glmmLasso::cv.glmmLasso that work with multiple random effects, but they aren’t merged into the original GitHub repositories yet. 

It seems that lmmen::cv.glmmLasso finds the best penalty parameter lambda using BIC, whereas cv.glmmLasso::cv.glmmLasso uses K-fold cross validation to find lambda. Both approaches are described in Appendix A of Groll and Tutz (2014).
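
A minimal sketch of the two selection strategies, written directly against glmmLasso rather than taken from either package. The lambda grid, the 5-fold split, the squared-error loss, and the helper name fit_one are assumptions made for illustration. It uses a single random effect and the default Gaussian family so that it stays close to lm1 in the example quoted below.

```r
library(glmmLasso)

data("soccer")
# Same preprocessing as in the reproducible example quoted below
soccer[, c(4, 5, 9:16)] <- scale(soccer[, c(4, 5, 9:16)], center = TRUE, scale = TRUE)
soccer <- data.frame(soccer)

fixed <- points ~ transfer.spendings + ave.unfair.score + ball.possession +
  tackles + ave.attend + sold.out
rnd <- list(team = ~1)
lambdas <- c(200, 100, 50, 25, 10)  # coarse, illustrative grid

# Hypothetical helper: fit one model, return NULL if the fit fails
fit_one <- function(lambda, data) {
  tryCatch(glmmLasso(fixed, rnd = rnd, data = data, lambda = lambda),
           error = function(e) NULL)
}

# 1) BIC: one fit per lambda on the full data, keep the smallest BIC
bic <- vapply(lambdas, function(l) {
  fit <- fit_one(l, soccer)
  if (is.null(fit)) Inf else as.numeric(fit$bic)[1]
}, numeric(1))
lambda_bic <- lambdas[which.min(bic)]

# 2) K-fold CV: sum the out-of-fold squared error for each lambda
set.seed(1)
K <- 5
fold <- sample(rep(seq_len(K), length.out = nrow(soccer)))
cv_err <- vapply(lambdas, function(l) {
  sum(vapply(seq_len(K), function(k) {
    fit <- fit_one(l, soccer[fold != k, ])
    if (is.null(fit)) return(Inf)
    test <- soccer[fold == k, ]
    pred <- tryCatch(as.numeric(predict(fit, test)), error = function(e) NULL)
    if (is.null(pred)) Inf else sum((test$points - pred)^2)
  }, numeric(1)))
}, numeric(1))
lambda_cv <- lambdas[which.min(cv_err)]
```

The two criteria need not pick the same lambda: BIC needs only one fit per grid value, while K-fold CV needs K fits per grid value and depends on the random fold assignment.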

cheers,
Mollie
 

> On 12 Dec 2022, at 13.31, Mollie Brooks <mollieebrooks using gmail.com> wrote:
> 
> I found that Ben Bolker is already working on this problem: https://github.com/yonicd/lmmen/compare/master...bbolker:lmmen:master
> 
> I’ll follow up there. 
> 
> The lesson I’m learning with glmmLasso is to always check forks on GitHub to see if someone already started working on the problem. Thanks Ben!
> 
> Cheers,
> Mollie
> 
> 
>> On 12 Dec 2022, at 13.09, Mollie Brooks <mollieebrooks using gmail.com> wrote:
>> 
>> Following up with a reproducible example…
>> Both of the models below can be fit in glmmLasso, but only the one with a single RE (lm1) can be cross-validated with lmmen.
>> 
>> library(glmmLasso)
>> data("soccer")
>> library(lmmen)
>> 
>> soccer[,c(4,5,9:16)]<-scale(soccer[,c(4,5,9:16)],center=TRUE,scale=TRUE)
>> soccer<-data.frame(soccer)
>> 
>> lm1 <- glmmLasso(points ~ transfer.spendings + ave.unfair.score 
>>                  + ball.possession + tackles 
>>                  + ave.attend + sold.out, rnd = list(team=~1), 
>>                  lambda=50, data = soccer)
>> 
>> summary(lm1)
>> 
>> lm2 <- glmmLasso(points ~ transfer.spendings + ave.unfair.score 
>>                  + ball.possession + tackles 
>>                  + ave.attend + sold.out, rnd = list(team=~1, pos=~1), 
>>                  lambda=50, data = soccer)
>> 
>> summary(lm2)
>> 
>> 
>> cv.lm1 <- cv.glmmLasso(form.fixed= points ~ transfer.spendings + ave.unfair.score 
>>                     + ball.possession + tackles 
>>                     + ave.attend + sold.out, 
>>                     form.rnd = list(team=~1), 
>>                     lambda=seq(5,250, by=5), dat = soccer)
>> 
>> cv.lm2 <- cv.glmmLasso(form.fixed= points ~ transfer.spendings + ave.unfair.score 
>>                     + ball.possession + tackles 
>>                     + ave.attend + sold.out, 
>>                     form.rnd = list(team=~1, pos=~1), 
>>                     lambda=seq(5,250, by=5), dat = soccer)
>> 
>> This is the error (below). I was getting a similar error with cv.glmmLasso::cv.glmmLasso until I made the change described in the pull request from my earlier email. For that package, it was due to assuming that q_start was a scalar (as in the case of a single RE).
>> 
>> > cv.lm2=cv.glmmLasso(form.fixed= points ~ transfer.spendings + ave.unfair.score 
>> +                     + ball.possession + tackles 
>> +                     + ave.attend + sold.out, 
>> +                     form.rnd = list(team=~1, pos=~1), 
>> +                     lambda=seq(5,250, by=5), dat = soccer)
>> Error in diag(diag(q_start), sum(s)) : 
>>   'nrow' or 'ncol' cannot be specified when 'x' is a matrix
>> In addition: There were 50 or more warnings (use warnings() to see the first 50)
>> > traceback()
>> 8: stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
>> 7: diag(diag(q_start), sum(s))
>> 6: est.glmmLasso.RE(fix = fix, rnd = rnd, data = data, lambda = lambda, 
>>        family = family, final.re = final.re, switch.NR = switch.NR, 
>>        control = control)
>> 5: est.glmmLasso(fix, rnd, data = data, lambda = lambda, family = family, 
>>        switch.NR = switch.NR, final.re = final.re, control = control)
>> 4: glmmLasso::glmmLasso(fix = as.formula(form.fixed), rnd = form.rnd, 
>>        data = dat, lambda = lambda[opt], switch.NR = FALSE, final.re = TRUE, 
>>        control = list(start = Delta.start[opt, ], q_start = Q.start.base))
>> 3: withCallingHandlers(expr, warning = function(w) if (inherits(w, 
>>        classes)) tryInvokeRestart("muffleWarning"))
>> 2: suppressWarnings({
>>        final <- glmmLasso::glmmLasso(fix = as.formula(form.fixed), 
>>            rnd = form.rnd, data = dat, lambda = lambda[opt], switch.NR = FALSE, 
>>            final.re = TRUE, control = list(start = Delta.start[opt, 
>>                ], q_start = Q.start.base))
>>        final
>>    })
>> 1: cv.glmmLasso(form.fixed = points ~ transfer.spendings + ave.unfair.score + 
>>        ball.possession + tackles + ave.attend + sold.out, form.rnd = list(team = ~1, 
>>        pos = ~1), lambda = seq(5, 250, by = 5), dat = soccer)
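
The error message itself can be reproduced in base R without either package. This is a minimal sketch, assuming q_start reaches glmmLasso as a scalar (as a single-RE workflow would supply it) while the multiple-RE code path expects a matrix. The variable names and the value 0.1 are made up for illustration, and n_re stands in for sum(s) from the traceback.

```r
n_re <- 2                          # stands in for sum(s): two random intercepts

# Scalar start value (assumed single-RE style input)
q_start_scalar <- 0.1
inner <- diag(q_start_scalar)      # diag() of a length-one numeric returns a matrix
res_scalar <- tryCatch(
  diag(inner, n_re),               # same call shape as diag(diag(q_start), sum(s))
  error = function(e) conditionMessage(e)
)

# Matrix start value: one starting variance per random effect
q_start_matrix <- diag(0.1, n_re)
res_matrix <- diag(diag(q_start_matrix), n_re)  # inner diag() is now a vector, so this works
```

Here res_scalar holds the error message and res_matrix is a 2 x 2 diagonal matrix, which suggests that supplying q_start as a matrix is what the multiple-RE case needs.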
>> 
>> 
>> Cheers,
>> Mollie
>> 
>>> On 12 Dec 2022, at 11.50, Mollie Brooks <mollieebrooks using gmail.com> wrote:
>>> 
>>> I’m interested in doing cross validation on GLMMs fit with LASSO. I found two functions for doing this: lmmen::cv.glmmLasso and cv.glmmLasso::cv.glmmLasso. With a small amount of digging, it looks like lmmen has been used more, since it was on CRAN in the past and it shows up in a thesis and a working paper on Google Scholar.
>>> 
>>> Both packages only work with a single random effect, and I need two REs for my data set. I managed to fix that problem in the cv.glmmLasso package (https://github.com/thepira/cv.glmmLasso/pull/18). Making lmmen work with multiple random effects is a little harder.
>>> 
>>> Another problem with lmmen is that the example from the help file isn’t working for me, so I’m not sure I should put time into making it work with multiple random effects.
>>> > tmp=cv.glmmLasso(initialize_example(seed=1))
>>> Error in rep(0, d.size) : invalid 'times' argument
>>> In addition: Warning message:
>>> In cv.glmmLasso(initialize_example(seed = 1)) : NAs introduced by coercion 
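
One way this pair of messages can arise in base R, shown only as a hedged illustration: a value that fails numeric coercion becomes NA (with a warning), and rep() then rejects NA as its times argument. Whether d.size inside lmmen is produced this way is an assumption, not something established here, and the names raw_size and d_size are hypothetical.

```r
raw_size <- "not a number"         # hypothetical input that cannot be coerced

warn_msg <- NULL
d_size <- withCallingHandlers(
  as.numeric(raw_size),            # yields NA and signals a coercion warning
  warning = function(w) {
    warn_msg <<- conditionMessage(w)   # keep the warning text for inspection
    invokeRestart("muffleWarning")
  }
)

# rep() refuses an NA 'times' argument, so this returns the error message
err_msg <- tryCatch(rep(0, d_size), error = function(e) conditionMessage(e))
```

If this is what happens, the place to look would be wherever d.size is computed from the input object.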
>>> 
>>> I’m wondering if anyone else has already been down this rabbit hole and can offer advice. Is there another package (or random code lying around) for doing cross validation on GLMMs with LASSO that is more thoroughly tested and currently in use?
>>> 
>>> Cheers,
>>> Mollie
>>> 
>>> 
>> 
> 




