[R-sig-ME] Modelling football matches

Sat Dec 17 01:26:13 CET 2022

On 2022-12-15 3:02 p.m., Jorge Teixeira wrote:
> Thank you, Ben.
> 
> Yes, there are indeed many more things that could be added; I was
> trying to discuss a more fundamental structure.
> 
> game_part refers to the fact that each game has two parts (part 1 and part 2).
> 
> 1) I agree it makes sense to include game_part as a fixed effect too,
> which gives this model:
> 
> lmer(distance ~ stage + game_part + (1|player) + (1|game/game_part),
> data=my_data)
> 
> 2) As for random slopes: I suspect that the variation among games and
> game parts might differ across players. Can random slopes account for
> that?

   That's a little challenging with 'typical' mixed-model machinery.
Models in which both the mean (location) and the variance (scale) vary
according to covariates or groups are called 'location-scale' models.
There is a category in the CRAN Task View on mixed models
<https://cran.r-project.org/web/views/MixedModels.html> that covers
this, but I'm not sure whether the scale is allowed to vary as a *random
effect*; it certainly isn't in glmmTMB.
> 
> 3) For outcomes bounded by 100%, such as relative average heart rate,
> do you recommend a specific family of models?

   Provided it never reaches exactly 0 or 100%, beta (after rescaling
the outcome to a proportion between 0 and 1) is the natural choice.
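
A minimal beta GLMM sketch, assuming glmmTMB is installed. The outcome rel_hr and all simulated values are hypothetical (relative heart rate rescaled from percent to a proportion), and the random-effects structure is simplified to keep the example small.

```r
library(glmmTMB)
set.seed(102)
## Hypothetical simulated data: rel_hr is relative average heart rate
## rescaled from percent to a proportion (percent / 100), strictly inside (0, 1)
n_player <- 30; n_game <- 20
hr_data <- expand.grid(player = factor(1:n_player), game = factor(1:n_game),
                       game_part = factor(1:2))
hr_data$stage <- factor(ifelse(as.integer(hr_data$game) <= 12, "group", "knockout"))
b_player <- rnorm(n_player, sd = 0.2)  # player effects (logit scale)
b_game   <- rnorm(n_game, sd = 0.1)    # game effects (logit scale)
mu <- with(hr_data, plogis(qlogis(0.85) + 0.1 * (stage == "knockout") +
                             b_player[player] + b_game[game]))
phi <- 50  # beta precision parameter: larger values mean less spread around mu
hr_data$rel_hr <- rbeta(nrow(hr_data), shape1 = mu * phi, shape2 = (1 - mu) * phi)

## beta GLMM with logit link; the beta family requires 0 < rel_hr < 1
fit_beta <- glmmTMB(rel_hr ~ stage + game_part + (1 | player) + (1 | game),
                    family = beta_family(link = "logit"), data = hr_data)
summary(fit_beta)
```

Fixed effects are on the logit scale, so exp(coefficient) is an odds ratio for the mean proportion.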

> 
> Thanks once again.
> 
> Date: Thu, 15 Dec 2022 12:12:16 -0500
> From: Ben Bolker <bbolker at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] Modelling football matches
> Message-ID: <ca745d94-ac01-3422-cd66-0d85058d8936 at gmail.com>
> 
> 
>      For a positive-valued variable like distance you might want to
> consider a log-linear model, lmer(log(distance) ~ ...), or a Gamma GLMM,
> glmer(distance ~ ..., family = Gamma(link = "log")).
> 
>     I believe the full model here would include random slopes of stage
> by player, (stage|player) ('slopes' in the broad sense, since stage is a
> categorical variable); (stage|game) won't work because each game belongs
> to only one stage.
> 
>     I'm not sure about the definition of 'game_part', but you might want
> to add a *fixed* effect of game_part as well as the 'game_part within
> game' nested random effect.
> 
>     There's probably a huge amount of covariate information you could add
> (e.g. player's position, player's age), and probably other terms too (a
> random effect of team?).
> 
> On Thursday, 15/12/2022 at 15:17, Jorge Teixeira
> <jorgemmtteixeira at gmail.com> wrote:
> 
>     Hi.
> 
>     1) Assuming that most of you are somewhat familiar with football,
>     and since it is World Cup time, what do you think of this model for
>     comparing distance covered between stages (group stage vs. final
>     stage)?
> 
>     lmer(distance ~ stage + (1|player) + (1|game/game_part), data=my_data)
> 
>     2) In theory, which random slopes do you think should be added, if any?
> 
>     Thank you.
> 
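
A minimal sketch of the models discussed in this thread, assuming lme4 is installed. The data are hypothetical and simulated (player and game counts, effect sizes, and the "group"/"knockout" levels are invented); only the model formulas come from the messages above.

```r
library(lme4)
set.seed(103)
## Hypothetical simulated data standing in for my_data (distance in km per part):
## 30 players, 20 games (games 1-12 group stage, 13-20 knockout), 2 parts per game
n_player <- 30; n_game <- 20
my_data <- expand.grid(player = factor(1:n_player), game = factor(1:n_game),
                       game_part = factor(1:2))
my_data$stage <- factor(ifelse(as.integer(my_data$game) <= 12, "group", "knockout"))
b_player <- rnorm(n_player, sd = 0.08)   # player intercepts (log scale)
b_slope  <- rnorm(n_player, sd = 0.03)   # player-specific knockout effects
b_game   <- rnorm(n_game, sd = 0.05)     # game effects
b_part   <- rnorm(2 * n_game, sd = 0.03) # game-part-within-game effects
my_data$distance <- with(my_data, exp(log(5) + 0.05 * (stage == "knockout") -
  0.03 * (game_part == "2") + b_player[player] +
  b_slope[player] * (stage == "knockout") + b_game[game] +
  b_part[interaction(game, game_part)] + rnorm(nrow(my_data), sd = 0.1)))

## original model (Gaussian, identity scale)
fit_orig <- lmer(distance ~ stage + (1 | player) + (1 | game/game_part),
                 data = my_data)
## log-linear model, with game_part added as a fixed effect
fit_log <- lmer(log(distance) ~ stage + game_part + (1 | player) +
                  (1 | game/game_part), data = my_data)
## Gamma GLMM with log link: same predictors, multiplicative effects
fit_gamma <- glmer(distance ~ stage + game_part + (1 | player) +
                     (1 | game/game_part),
                   family = Gamma(link = "log"), data = my_data)
## random slopes of stage by player; (stage | game) is not estimable
## because each game belongs to a single stage
fit_slopes <- lmer(log(distance) ~ stage + game_part + (stage | player) +
                     (1 | game/game_part), data = my_data)
cbind(log_linear = fixef(fit_log), gamma = fixef(fit_gamma))
```

Both the log-linear and the Gamma model put effects on the log scale, so their stage coefficients should be broadly comparable; they differ in whether the effect refers to the geometric or the arithmetic mean distance.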

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
E-mail is sent at my convenience; I don't expect replies outside of
working hours.