[R-sig-ME] Prediction/classification & variable selection

Voeten, C.C. c.c.voeten using hum.leidenuniv.nl
Fri May 15 14:53:02 CEST 2020


Dear Daniel,

Please keep the list in cc.

I know exactly what you mean; writing a loop and manually extracting AIC/BIC is exactly how I ended up writing the package buildmer, which automates precisely that. Tim already suggested MuMIn to you as well; you would need to see which of the many packages (e.g. I can also think of lmerTest::step) serves your needs best. Buildmer's advantage is that it first tries to build up the maximal model from zero before doing backward elimination, so if your maximal model isn't capable of converging, buildmer will automatically give you the largest subset that does converge, and it then removes terms that are not significant in the backward-elimination step based on LRT/AIC/BIC/etc. I can't comment on the other packages; you would really need to experiment to see which of them works best for your own purposes.
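For concreteness, a minimal sketch of what that could look like (the data set and variable names below are placeholders for your own, and the response is assumed to be binary):

library(buildmer)
# By default, buildmer first works out the largest model that still converges
# (adding terms in order of their contribution) and then backward-eliminates
# terms using likelihood-ratio tests; the elimination criterion can be changed
# (AIC, BIC, ...), see ?buildmer.
m <- buildmer(resp ~ predictor1 + predictor2 + predictor3 + (1 | subject),
              data = dat, family = binomial)
summary(m)  # summary of the final, pruned model

Alternatively, MuMIn::dredge() on a global lme4::glmer() fit (fitted with na.action = na.fail) will compare all subsets of the fixed effects, ranked by AICc by default.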

Re multiple testing: in my view, for hypothesis testing based on p-values, you are only using one model, which you just happen to have done some pruning on first. In that sense, you wouldn't need to apply any corrections. However, it is also well known that model selection will amplify spurious effects, and I do see where authors like Hastie & Tibshirani are coming from when they say in their book that the standard errors of a pruned model are invalid because they don't take the selection procedure into account. Ultimately, this is a highly contested issue, and you'd best either follow whatever is customary in your field or use some kind of simulation-based approach to obtain p-values that do take the selection procedure into account. (I wouldn't really know how, and I am not aware of any literature giving a clear recipe for that, but maybe others have ideas here.)

For the lasso/ridge, you would definitely not need to perform any manual corrections: in those cases, selection takes place as part of the fitting process itself, so the p-values will be correct in any case (well, barring of course the general issue of p-values in mixed models...).
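If you do want to try the lasso in a mixed-model setting, one package I can point you to (not otherwise discussed in this thread) is glmmLasso. A rough sketch, again with placeholder names and an arbitrary penalty strength that you would still have to tune yourself (e.g. by comparing BIC over a grid of lambda values):

library(glmmLasso)
# The grouping factor must be a factor; lambda sets the strength of the L1
# penalty that shrinks (and can zero out) the fixed-effect coefficients.
dat$subject <- as.factor(dat$subject)
m.lasso <- glmmLasso(fix = resp ~ predictor1 + predictor2 + predictor3,
                     rnd = list(subject = ~1),
                     family = binomial(link = "logit"),
                     data = dat, lambda = 10)
summary(m.lasso)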

Best,
Cesko

> -----Original Message-----
> From: daniel.schlunegger using psy.unibe.ch <daniel.schlunegger using psy.unibe.ch>
> Sent: Friday, May 15, 2020 1:35 PM
> To: Voeten, C.C. <c.c.voeten using hum.leidenuniv.nl>
> Subject: AW: Prediction/classification & variable selection
> 
> Dear Cesko
> 
> Thank you for your reply. Based on your comment I feel confident that I'm
> not completely off track, because this is what I did before: looking at
> information criteria. Thank you also for the hint that there are packages to
> do that; before, I did this with a loop and extracted AIC, BIC, etc. I
> know, horrible.
> 
> I also wondered whether I have to correct for multiple testing in one way or
> another, since I'm testing all possible combinations. Thank you also for the
> tip about lasso and ridge regression; that has also crossed my mind lately. I'm
> not too familiar with these two yet, though.
> 
> Thank you for taking the time.
> 
> Best wishes,
> Daniel
> 
> ________________________________________
> From: Voeten, C.C. [c.c.voeten using hum.leidenuniv.nl]
> Sent: Thursday, May 14, 2020 9:41 PM
> To: Schlunegger, Daniel (PSY); r-sig-mixed-models using r-project.org
> Subject: RE: Prediction/classification & variable selection
> 
> Dear Daniel,
> 
> Maybe my understanding of your situation is a bit too simplistic, but it sounds
> like you have a classic case of model selection / feature selection? There are
> many approaches for that. The easiest would be likelihood-ratio tests (or AIC,
> or BIC, or some other criterion). Start with a full model (or as full as you can
> get while still achieving convergence) containing all combinations of
> predictors, remove one term, see if the model improves according to your
> criterion... repeat until no terms are left to be eliminated. There are many
> packages that can automate this procedure for you. Another option could be
> lasso or ridge regression, which are commonly used for feature selection in
> the classification literature. I don't know if the lasso has been implemented
> for mixed models, but I know that package mgcv allows you to specify ridge
> penalties via its paraPen argument (see the documentation for that argument).
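> For illustration, a rough sketch of such a ridge penalty (variable and data
> names here are made up; the random intercept goes in as a "re" smooth, and
> "subject" must be a factor):
> 
> library(mgcv)
> # Collect the parametric predictors into a matrix term and attach an identity
> # penalty matrix to it: this is a ridge penalty whose strength is estimated
> # alongside the rest of the model by REML.
> X <- as.matrix(dat[, c("predictor1", "predictor2", "predictor3")])
> m.ridge <- gam(resp ~ X + s(subject, bs = "re"),
>                paraPen = list(X = list(diag(ncol(X)))),
>                family = binomial, data = dat, method = "REML")
> summary(m.ridge)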
> 
> HTH,
> Cesko
> 
> > -----Original Message-----
> > From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org>
> > On Behalf Of daniel.schlunegger using psy.unibe.ch
> > Sent: Thursday, May 14, 2020 6:42 PM
> > To: r-sig-mixed-models using r-project.org
> > Subject: [R-sig-ME] Prediction/classification & variable selection
> >
> > Dear people of r-sig-mixed-models using r-project.org
> >
> > My name is Daniel Schlunegger, PhD student in Psychology at the University
> > of Bern, Switzerland.
> >
> > I’m new here and I wondered if somebody can help me.
> >
> > My goal is to predict subjects’ responses based on their previous responses
> > in a one-interval two-alternative forced choice auditory discrimination task
> > (a "was it tone A or tone B?" sort of task). I’ve run an experiment with 24
> > subjects, each performing 1200 trials (24 x 1200 = 28800 trials). There are no
> > missing values; all data are „clean“.
> >
> > The main idea of my work is:
> > 1) Take subjects’ responses
> > 2) Compute some statistics with those responses
> > 3) Use these statistics to predict the next response (in a trial-by-trial
> > fashion)
> >
> > Goal: Prediction / Classification (binary outcome)
> >
> > From three different learning models I derived three predictors; more
> > precisely, three different sets of predictors. Within each set, there are n
> > predictors (normally distributed), and the predictors within each set are of a
> > very similar nature. I need a model with three predictors, one from each set,
> > so that from each set of predictors there is one predictor in the model:
> >
> > y ~ predictor1_n + predictor2_n + predictor3_n
> >
> >
> > Problem: Theoretically it is possible (or rather probable) that for each
> > subject a different combination of predictors (e.g. predictor1_2 + predictor2_1 +
> > predictor3_3 vs. predictor1_1 + predictor2_2 + predictor3_3) results in
> > better classification accuracy. On the other hand, I would like to keep the
> > model as simple as possible. Let’s say, having the same three predictors for
> > all subjects, while accounting for differences with a random intercept (1 |
> > subject) or random intercept and random slope.
> >
> > I’ve seen a lot of work where they perform subject-level and group-level
> > analyses, but I think that’s actually not correct, right?
> >
> > Do you have any suggestions on how to do this the proper way? I assume that
> > just running n * n * n different GLMMs (lme4::glmer()) is not the proper way
> > to do it; that is what I have done so far, and I then checked which combination
> > gives me the best prediction.
> >
> > (I have another dataset from a slightly different version of the experiment.
> > This dataset contains 91200 trials from 76 subjects, in case the number of
> > observations is an issue here.)
> >
> > Thanks for considering my request.
> >
> > Kind regards,
> > Daniel Schlunegger
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

