[R-sig-ME] Time series with multinomial outcomes

Mon Oct 17 16:10:33 CEST 2022

I’m working on a times series problem with multinominal outcomes. The specific problem is time series of pollen counts in paleoecological data where the response variable is the number of pollen grains in a core sample at a given depth identified to N species, where N > 2. I can code this as a state-space model for an N-dimensional vector autoregressive process (VAR(1)) with a multinomial measurement (updating) equation, and then fit it using penalized quasi-likelihood. This works surprisingly well, but it is a pretty old-school approach.

Does anybody know if this model can be fit with existing R packages?

Many thanks!

__________________________
Anthony R. Ives (he/him/his)
UW-Madison
459 Birge Hall
608-262-1519

From: R-sig-mixed-models <r-sig-mixed-models-bounces using r-project.org> on behalf of Ben Bolker <bbolker using gmail.com>
Date: Sunday, October 16, 2022 at 5:17 PM
To: r-sig-mixed-models using r-project.org <r-sig-mixed-models using r-project.org>
Subject: Re: [R-sig-ME] Bradley Terry GLMM in R ?
   PS  my advice about hacking the Z matrix works best/most easily if
(1) the *dimensions* of the Z matrix (and length of theta, the
random-effects parameter vector) are the same in the model constructed
by `lFormula` and the modified model. (It's possible even if not, but
you would have to hack some other components of the output of `lFormula`
...)

On 2022-10-16 5:49 p.m., Shira Mitchell wrote:
> Update: Dr Heather Turner <https://www.heatherturner.net/> (author of
> BradleyTerry2) suggested the hglm package, which unlike lme4 allows you to
> specify generic design matrices (no longer constrained to lme4 formulas !)
> Results look really similar to INLA so far. Yay !
>
> On Sun, Oct 16, 2022 at 2:19 PM Shira Mitchell <shiraqotj using gmail.com> wrote:
>
>> Super helpful !  Thank you so much !
>>
>> Out of curiosity, is there a way to fit this type of Bradley-Terry model
>> in lme4 ? lme4 formulas include random effect via syntax:
>> https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
>> "(expr | factor). The expression expr is evaluated as a linear model
>> formula, producing a model matrix following the same rules used in standard
>> R modeling functions (e.g., `lm` or `glm`). The expression factor is
>> evaluated as an `R` factor. One way to think about the vertical bar
>> operator is as a special kind of interaction between the model matrix and
>> the grouping factor. This interaction ensures that the columns of the model
>> matrix have different effects for each level of the grouping factor."
>>
>> So (expr | factor) is X_expr * alpha_factor.
>>
>> So naively writing ~ (1 | m_1) + (1 | m_2) is alpha_{m_1}^{(1)} +
>> alpha_{m_2}^{(2)}, twice as many parameters as what we want which is
>> alpha_{m_1} - alpha_{m_2}.
>>
>> But then see this stackexchange:
>>
>> https://stats.stackexchange.com/questions/483833/opposing-effects-in-lme4-formulae-bradley-terry-model
>> "I could just make a design matrix, where player 1 gets the value 1, and
>> player 2 gets the value −1. However, unless I'm missing a trick, this would
>> require having a separate column for each player, and plugging each player
>> column's name into the formula"
>>
>> But suppose we create columns for all m = 1,...,M messages:
>>
>> A_m = 1 if m = m_1
>>             -1 if m = m_2
>>              0 otherwise
>>
>> I think then ~ (A_1 + ... + A_M  | m_1) is alpha_{m_1}^{(m_1)} -
>> alpha_{m_1}^{(m_2)}, also not what we would want.
>>
>> Back to INLA. Suppose we now want to add random message-specific slopes
>> for variable X_i in addition to random message-specific intercepts:
>>
>> P[i chooses m_1] = logit^-1 (beta_0 + (alpha_{m_1} - alpha_{m_2}) +
>> (beta_{m_1} - beta_{m_2})X_i)
>>
>> alpha_1,...,alpha_M ~ N(0,sigma_intercept)
>> beta_1,...,beta_M ~ N(0,sigma_slope)
>>
>> I see some resources about this, but nothing super comprehensive. Any
>> advice where to look for complete documentation ?
>>
>> https://groups.google.com/g/r-inla-discussion-group/c/iQELaQF8M9Q/m/q7f4-W8YQksJ
>> (
>> https://becarioprecario.bitbucket.io/inla-gitbook/ch-multilevel.html#multilevel-models-for-longitudinal-data
>> https://rpubs.com/corey_sparks/431920
>> https://avianecologist.com/2016/10/05/multilevel-models/
>>
>> Here is what we did:
>>
>> data$w_X = -data$X
>> data$m_1_beta = data$m_1
>> data$m_2_beta = data$m_2
>>
>> inla(depvar ~  f(m_1, model="iid", values = issues) +
>>                             f(m_2, w, copy = "m_1") +
>>                             f(m_1_beta, X, model="iid", values = issues) +
>>                             f(m_2_beta, w_X, copy = "m_1_beta"),
>>                           family="binomial",
>>                           data=data)
>>
>>
>>
>>
>> On Mon, Oct 10, 2022 at 4:24 AM Thierry Onkelinx <thierry.onkelinx using inbo.be>
>> wrote:
>>
>>> Dear Shira,
>>>
>>> - in a formula object means remove that object from the formula. Use a
>>> weight of -1 instead.
>>>
>>> f(home, model = "iid")) + f(away, w = -1, copy = "home")
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>> Statisticus / Statistician
>>>
>>> Vlaamse Overheid / Government of Flanders
>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>> AND FOREST
>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>> thierry.onkelinx using inbo.be
>>> Havenlaan 88 bus 73, 1000 Brussel
>>> www.inbo.be<http://www.inbo.be>
>>>
>>>
>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>> To call in the statistician after the experiment is done may be no more
>>> than asking him to perform a post-mortem examination: he may be able to say
>>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of data.
>>> ~ John Tukey
>>>
>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>
>>> <https://www.inbo.be>
>>>
>>>
>>> Op vr 7 okt. 2022 om 23:36 schreef Shira Mitchell <shiraqotj using gmail.com>:
>>>
>>>> Thanks so much, Thierry ! This is great.
>>>>
>>>> This works except that I cannot subtract because:
>>>> f(home, model = "iid")) - f(away, copy = "home")
>>>>
>>>> just drops the second term. Apologies that I'm not super familiar with
>>>> INLA syntax yet.
>>>>
>>>>
>>>>
>>>> On Fri, Oct 7, 2022 at 10:19 AM Thierry Onkelinx <
>>>> thierry.onkelinx using inbo.be> wrote:
>>>>
>>>>> Hi Shira,
>>>>>
>>>>> I fit such models with the INLA package (https://www.r-inla.org/). The
>>>>> trick is to define two random effects but force their parameter estimates
>>>>> to be identical.
>>>>>
>>>>> The code would contain something like f(home, model = "iid")) + f(away,
>>>>> copy = "home"). Meaning home ~ N(0, sigma_beta_i) and home[i] = away[i]
>>>>>
>>>>> Best regards,
>>>>>
>>>>> ir. Thierry Onkelinx
>>>>> Statisticus / Statistician
>>>>>
>>>>> Vlaamse Overheid / Government of Flanders
>>>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>>>> AND FOREST
>>>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>>>> thierry.onkelinx using inbo.be
>>>>> Havenlaan 88 bus 73, 1000 Brussel
>>>>> www.inbo.be<http://www.inbo.be>
>>>>>
>>>>>
>>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>> To call in the statistician after the experiment is done may be no more
>>>>> than asking him to perform a post-mortem examination: he may be able to say
>>>>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>> The combination of some data and an aching desire for an answer does
>>>>> not ensure that a reasonable answer can be extracted from a given body of
>>>>> data. ~ John Tukey
>>>>>
>>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>>
>>>>> <https://www.inbo.be>
>>>>>
>>>>>
>>>>> Op vr 7 okt. 2022 om 15:00 schreef Shira Mitchell <shiraqotj using gmail.com
>>>>>> :
>>>>>
>>>>>> We want to fit Bradley-Terry-style GLMM models in R. We looked into:
>>>>>>
>>>>>>
>>>>>> https://cran.r-project.org/web/packages/BradleyTerry2/vignettes/BradleyTerry.pdf
>>>>>> and
>>>>>> http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
>>>>>>
>>>>>> We have voter-specific variables x that influence which political
>>>>>> message
>>>>>> (i vs j) wins for them:
>>>>>>
>>>>>> logit[pr(i beats j | person with covariate x)] = lambda_i - lambda_j +
>>>>>> (beta_i - beta_j) x
>>>>>>
>>>>>> We then model parameters as random effects:
>>>>>> lambda_i ~ N(0, sigma_lambda)
>>>>>> beta_i ~ N(0, sigma_beta)
>>>>>>
>>>>>> Is there a way to do this in R ? We do this in TensorFlow in Python by
>>>>>> directly specifying design matrices with the 0,-1,1 or 0,-x,x entries.
>>>>>> However, I do not see how to do this in R using lme4, BradleyTerry2,
>>>>>> mgcv,
>>>>>> etc.
>>>>>>
>>>>>> Thanks so much,
>>>>>> Shira
>>>>>>
>>>>>>          [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
 > E-mail is sent at my convenience; I don't expect replies outside of
working hours.

_______________________________________________
R-sig-mixed-models using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

	[[alternative HTML version deleted]]