[R-sig-ME] Prediction/classification & variable selection

Tim Richter-Heitmann | Universitaet Bremen | trichter at uni-bremen.de
Thu May 14 22:02:18 CEST 2020


Dear Daniel,
To build upon Cesko's comment: if your problem is indeed a problem of
"classic" model selection, the package MuMIn does this by
testing every combination of predictor variables:

https://cran.r-project.org/web/packages/MuMIn/MuMIn.pdf

It also takes mixed models.
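
A minimal sketch of that workflow, assuming a data frame dat with a
binary response y, candidate predictors p1-p3, and a subject column
(all names are placeholders):

library(lme4)
library(MuMIn)

# dredge() requires na.action = na.fail on the global model, so that
# every submodel is fitted to exactly the same data:
full <- glmer(y ~ p1 + p2 + p3 + (1 | subject),
              data = dat, family = binomial, na.action = na.fail)

# Fit every subset of the fixed effects and rank the fits by AICc:
ms <- dredge(full)
head(ms)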

Cheers, Tim


Quoting "Voeten, C.C." <c.c.voeten using hum.leidenuniv.nl>:

> Dear Daniel,
>
> Maybe my understanding of your situation is a bit too simplistic,
> but it sounds like you have a classic case of model selection /
> feature selection? There are many approaches to that. The easiest
> would be likelihood-ratio tests (or AIC, or BIC, or some other
> criterion). Start with a full model (or as full as you can get while
> still achieving convergence) containing all candidate predictors,
> remove one term, check whether the model improves according to your
> criterion, and repeat until no terms are left to eliminate. There
> are many packages that can automate this procedure for you. Another
> option would be lasso or ridge regression, which are commonly used
> for feature selection in the classification literature. I don't
> know whether the lasso has been implemented for mixed models, but I
> know that package mgcv allows you to specify ridge penalties via its
> paraPen argument (see the documentation for that argument).
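>
> A minimal sketch of one backward-elimination step in that spirit,
> using lme4 (model and variable names are placeholders):
>
> library(lme4)
>
> full <- glmer(y ~ p1 + p2 + p3 + (1 | subject),
>               data = dat, family = binomial)
> # drop1() refits the model with each fixed-effect term removed in
> # turn and reports a likelihood-ratio test per term:
> drop1(full, test = "Chisq")
> # Drop the weakest term, refit, and repeat until nothing more can
> # be removed.
>
> For the ridge idea, the call could look like
> gam(y ~ X, paraPen = list(X = list(diag(ncol(X)))), family = binomial),
> where X is a (hypothetical) matrix of predictors.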
>
> HTH,
> Cesko
>
>> -----Original Message-----
>> From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> On
>> Behalf Of daniel.schlunegger using psy.unibe.ch
>> Sent: Thursday, May 14, 2020 6:42 PM
>> To: r-sig-mixed-models using r-project.org
>> Subject: [R-sig-ME] Prediction/classification & variable selection
>>
>> Dear people of r-sig-mixed-models using r-project.org
>>
>> My name is Daniel Schlunegger, PhD student in Psychology at the
>> University of Bern, Switzerland.
>>
>> I’m new here, and I wondered if somebody could help me.
>>
>> My goal is to predict subjects' responses based on their previous
>> responses in a one-interval two-alternative forced choice auditory
>> discrimination task ("Was it tone A or tone B?" sort of task). I ran
>> an experiment with 24 subjects, each performing 1200 trials
>> (= 28,800 trials in total). There are no missing values; all data
>> are "clean".
>>
>> The main idea of my work is:
>> 1) Take subjects' responses
>> 2) Compute some statistics from those responses
>> 3) Use these statistics to predict the next response (in a
>> trial-by-trial fashion)
>>
>> Goal: prediction / classification (binary outcome)
>>
>> From three different learning models I derived three predictors, or
>> more precisely, three different sets of predictors. Within each set
>> there are n predictors (normally distributed), and the predictors
>> within a set are of very similar nature. I need a model with three
>> predictors, one drawn from each set:
>>
>> y ~ predictor1_n + predictor2_n + predictor3_n
>>
>>
>> Problem: Theoretically it is possible (or rather probable) that for
>> each subject a different combination of predictors (e.g.
>> predictor1_2 + predictor2_1 + predictor3_3 vs. predictor1_1 +
>> predictor2_2 + predictor3_3) yields better classification accuracy.
>> On the other hand, I would like to keep the model as simple as
>> possible: say, the same three predictors for all subjects, while
>> accounting for individual differences with a random intercept
>> (1 | subject), or with a random intercept and random slope.
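>>
>> In lme4 syntax, the random-intercept variant I have in mind would
>> look something like this (predictor names are placeholders):
>>
>> library(lme4)
>> m <- glmer(response ~ predictor1_1 + predictor2_1 + predictor3_1 +
>>              (1 | subject),
>>            data = dat, family = binomial)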
>>
>> I’ve seen a lot of work in which subject-level and group-level
>> analyses are performed, but I think that’s actually not correct,
>> right?
>>
>> Do you have any suggestions on how to do this properly? I assume
>> that just running n * n * n different GLMMs (lme4::glmer()) and then
>> checking which combination gives the best prediction is not the
>> proper way to do it, but that is what I have done so far.
>>
>> (I have another dataset from a slightly different version of the
>> experiment, containing 91,200 trials from 76 subjects, in case the
>> number of observations is an issue here.)
>>
>> Thanks for considering my request.
>>
>> Kind regards,
>> Daniel Schlunegger



-- 
Tim Richter-Heitmann
Universität Bremen


