[R-sig-ME] Starting point for modeling a within-subject design

Thu Oct 17 21:06:44 CEST 2024

Dear Ben,

Such an amazing answer! Please see my follow-ups, marked with >>.

(1) is the mediator considered to be measured/observed with or without
error?

>> Likely measured with error, due both to an imprecise tool and to how it
was collected! But neither of these sources of error is represented by a
variable in the model.

(2) are you interested in making inferences about the true value of the
mediator?

>> True values of the mediator are not of interest. Only its estimated
effect on outcome is of interest.

[Regarding the correlation between the condition's binary levels], they're
not random variables ... Or do you mean the **effects** of the level of
each condition?

>> No, they're not random variables! We have a finite set of levels for
condition (complex/simple) ... Yes, I mean that the effects of the two
levels are correlated with each other, because the same participants are in
both the simple and complex conditions.

identifiability is in principle less of an issue with Bayesian
methods (since _in theory_ the sampler should be able to integrate over
all possible combinations of the confounded/unidentifiable variables),
but in practice this will also cause problems unless your priors are
relatively informative.
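
A minimal base-R sketch of why (condition | subject) is hard to identify
with two observations per subject. Each subject's pair of observations has a
2 x 2 marginal covariance matrix (3 free numbers), whereas the random
intercept, random slope, their covariance, and the residual variance make 4
parameters. The numeric values below are hypothetical and chosen only for
illustration; treatment coding with "complex" as the reference level is
also an assumption.

```r
# Random-effects design for one subject under (condition | subject),
# assuming treatment coding with "complex" as the reference level.
# Rows: complex, simple. Columns: random intercept, random "simple" effect.
Z <- matrix(c(1, 1,
              0, 1), nrow = 2,
            dimnames = list(c("complex", "simple"), c("intercept", "simple")))

# Marginal covariance of one subject's two observations:
# V = Z G Z' + sigma2 * I
marginal_cov <- function(G, sigma2) Z %*% G %*% t(Z) + sigma2 * diag(2)

# First (hypothetical) parameter set
G1 <- matrix(c(2, 0.5,
               0.5, 1), nrow = 2)
s1 <- 1

# Second parameter set: shrink the residual variance and move the
# difference into the random-effects covariance matrix.
s2 <- 0.5
Zinv <- solve(Z)
G2 <- G1 + (s1 - s2) * Zinv %*% t(Zinv)

V1 <- marginal_cov(G1, s1)
V2 <- marginal_cov(G2, s2)

# Different parameters, same implied covariance of the data
all.equal(V1, V2)
```

Because both parameter sets imply the same covariance for the observed
pairs, the likelihood alone cannot tell them apart, which is where an
informative prior would have to do the work in a Bayesian fit.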

>> I wonder what would be a frequentist alternative to the Bayesian model I
sketched? Relatedly, are any random effects for 'subject' even warranted,
given only two observations per subject? If not, it seems nlme::gls() with
"correlation = corSymm(form = ~ 1 | subject)" and
"weights = varIdent(form = ~ 1 | condition)" may account for the correlation
between the effects of condition better than "(condition | subject)" does.

But I'm not sure how gls() would then estimate the fixed effect of the
mediator on the outcome.
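
A minimal sketch of the gls() specification described above, fitted to
simulated data. The data frame `dat`, the number of subjects, and all
generating values are hypothetical. The sketch assumes the mediator is
entered as an ordinary covariate in the outcome model and treated as
measured without error, so its coefficient is only the conditional effect of
the mediator given condition, not a full mediation analysis. With two
observations per subject, corSymm() plus varIdent() amounts to an
unstructured 2 x 2 residual covariance matrix per subject.

```r
library(nlme)

set.seed(1)
n <- 40  # hypothetical number of subjects

# Two rows per subject, one per condition (simulated, illustrative only)
dat <- data.frame(
  subject   = factor(rep(seq_len(n), each = 2)),
  condition = factor(rep(c("complex", "simple"), times = n))
)
is_complex <- dat$condition == "complex"
subj_shift <- rnorm(n)[as.integer(dat$subject)]  # shared within subject

dat$mediator <- 4 + 2 * is_complex + rnorm(2 * n)
dat$outcome  <- 10 + 8 * is_complex + 1.5 * dat$mediator + subj_shift +
  rnorm(2 * n, sd = ifelse(is_complex, 2, 1))

# No random effects: within-subject correlation and condition-specific
# residual variances are modeled directly in the residual covariance.
fit <- gls(outcome ~ condition + mediator,
           data        = dat,
           correlation = corSymm(form = ~ 1 | subject),
           weights     = varIdent(form = ~ 1 | condition))

# Fixed-effect table; the "mediator" row is the coefficient in question
summary(fit)$tTable
```

The mediator coefficient is estimable here because the mediator varies both
within and between subjects; this covers only the outcome equation, and a
separate model for mediator ~ condition would still be needed for the other
path.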

On Wed, Oct 16, 2024 at 1:56 PM Ben Bolker <bbolker using gmail.com> wrote:

>
>
> On 10/7/24 22:19, Simon Harmel wrote:
> > Dear Mixed-Effects Experts,
> >
> > Suppose a causal mechanism where a normally distributed outcome is
> > impacted by condition (a binary factor variable), and a mediator (a
> > continuous variable) that sits between the condition and outcome.
> >
> > Here, all subjects get to taste both conditions by counterbalancing. That
> > is, based on chance, some will first get one condition at one point, and
> > later, they will get another condition.
> >
> > In a sense, this is a within-subject design with a data structure like:
> >
> > subject  condition  outcome  mediator
> >   1      complex    25.1     6
> >   1      simple     11.0     4
> >   2      complex    24.3     7
> >   2      simple     12.2     3
> >
> > QUESTIONS:
> > 1) Can we think of this model as a multivariate model where the outcome
> > and mediator are indeed 2 correlated DVs that are impacted by condition?
>
>    I don't know.  Some questions to consider: (1) is the mediator
> considered to be measured/observed with or without error? (2) are you
> interested in making inferences about the true value of the mediator? My
> guess would be that at least in the linear/Gaussian case, the estimates
> of the direct and combined effects wouldn't be biased by treating the
> observed value of the mediator as the true value (although I guess the
> uncertainty would be underestimated).
>
> >
> > 2) Given that the same participants get both levels of condition, should
> > levels of condition in each subject be correlated at the latent level as
> > in (condition | subject) and/or possibly at the residual level as in
> > nlme::corClasses?
> >
>
>     The levels of condition are categorical (complex/simple), right? And
> they're not random variables ... Or do you mean the **effects** of the
> level of each condition?
>
>    You could in principle fit (condition|subject), but you'll have an
> identifiability problem if you only have two observations per subject
> [as in the 'starling' example here:
> https://bbolker.github.io/morelia_2018/notes/mixedlab.html]
>
>    If variability varied by condition, you could estimate that ...
>
>
> > 3) Is there any frequentist software to analyze such data, and if not,
> > does the following Bayesian model sound like a good "starting point"?
> >
> > library(brms)
> > mediator_formula <- bf(mediator ~ condition + (condition | subject))
> >
> > outcome_formula <- bf(outcome ~ condition + mediator +
> >                       (condition | subject))
> >
> > model <- brm(mediator_formula + outcome_formula,
> >                 data = DATA,
> >                 seed = 123,
> >                 chains = 4,
> >                 iter = 10000)
> >
> > Thank you all for your expertise,
> > Simon
> >
>
>    identifiability is in principle less of an issue with Bayesian
> methods (since _in theory_ the sampler should be able to integrate over
> all possible combinations of the confounded/unidentifiable variables),
> but in practice this will also cause problems unless your priors are
> relatively informative.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
