[R-meta] Coding Longitudinal Studies

Sat Oct 9 01:18:11 CEST 2021

Dear Danielle,

To be clear, your current model has the following random-effect
specification (notice I have numbered each term):

random = list(
    1-  ~ 1        | StudyNUM,
    2-  ~ Time_NUM | interaction(StudyNUM, Cohort),
    3-  ~ 1        | interaction(StudyNUM, Cohort, ExTreat),
    4-  ~ 1        | interaction(StudyNUM, Cohort, ExTreat, Time_NUM))

Now let's get to your questions.

random = list(~ 1|StudyNUM), # variation in effects between studies,
each study has their own intercept?

Yes, each study can have its own average (study-level average effect),
and thus we expect some variation among them at this level.

~ Time_NUM | interaction(StudyNUM, Cohort), # variation in effects due
to cohorts (i.e study design: cross-over or independent cohort) within
each study (within_study heterogeneity)?

I'm not following you here. My understanding was that "cohort" refers
to independent samples of participants studied independently in each
study. But it seems that by cohort you mean two entirely different
study designs employed by each study. Could you please clarify?

That aside, here true effect sizes at different time points are
correlated with each other for each cohort within a study. You can
further specify a structure beyond the default one (struct = "CS") for
this correlation (perhaps anything from "HAR" to "AR" to "UN" or even
"HCS" depending on the fit).

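To make the difference between these structures more concrete, here is
a minimal sketch in base R of the variance-covariance matrix that
"CS", "HCS", and "HAR" imply for the random effects at three time
points. The helper names (vc_cs, vc_hcs, vc_har) and the values of
tau2 and rho are made up for illustration; they are not estimates from
your model. "UN" would instead estimate every variance and every
correlation freely.

```r
# Hypothetical values, chosen only for illustration
tau2 <- c(0.20, 0.10, 0.05)  # one variance per time point (used by HCS/HAR)
rho  <- 0.6                  # a single correlation parameter

# "CS": one variance and one correlation shared by all pairs of time points
vc_cs <- function(tau2, rho, k) {
  R <- matrix(rho, k, k)
  diag(R) <- 1
  tau2 * R
}

# "HCS": like CS, but each time point has its own variance
vc_hcs <- function(tau2, rho) {
  k <- length(tau2)
  R <- matrix(rho, k, k)
  diag(R) <- 1
  outer(sqrt(tau2), sqrt(tau2)) * R
}

# "HAR": own variance per time point, correlation decays as rho^lag
vc_har <- function(tau2, rho) {
  k <- length(tau2)
  lag <- abs(outer(seq_len(k), seq_len(k), "-"))
  outer(sqrt(tau2), sqrt(tau2)) * rho^lag
}

G_cs  <- vc_cs(tau2[1], rho, k = 3)
G_hcs <- vc_hcs(tau2, rho)
G_har <- vc_har(tau2, rho)

round(cov2cor(G_har), 2)  # correlation 0.6 at lag 1, 0.36 at lag 2
```

Which of these is appropriate is an empirical question, so the usual
route is to fit the candidate structures and compare their fit.
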
~ 1 || interaction(StudyNUM, Cohort, ExTreat)), # variation in effect
is due to treatments within cohorts of individual studies? What is the
"||" indicate? Does this mean that they are not the same, i.e
different treatments?

First, `||` is currently reserved for when you set `struct = "GEN"`
(and perhaps a few other undocumented structures). It is meant to act
like struct = "DIAG": for a categorical variable that appears before
`||`, it removes the correlation between the levels of that variable;
for a continuous variable that appears before `||`, it removes the
correlation between slopes and intercepts.

Second, you have not specified any such variable before `||`, so it is
not relevant to your model and is completely ignored.

This entire term assumes that true effect sizes aggregated at the
"ExTreat" level within each cohort can vary around their respective
cohort-level aggregates. Thus, this captures variation within a given
cohort in your data.

~ 1 | interaction(StudyNUM, Cohort, ExTreat, Time_NUM)), #variation in
effect due to timing  of their measurements within different
treatments within cohorts within studies (repeated  measures)?
 struct = c("UN","UN") )

First, struct = c("UN","UN") is not relevant to this random term,
because there are no variables before two instances of `|` and thus
correlations among the levels of two "nothing" variables are
completely ignored.

This entire term assumes that individual true effect sizes (that is,
the true counterparts of the observed effect sizes in each row) can
vary around their specific ExTreat-level aggregate within each
ExTreat. Thus, this captures variation within a given ExTreat in your
data.

Now, why should there be variation in a given ExTreat in your data?
Because, there might be studies that have subjected their participants
in each ExTreat to multiple measurements perhaps over time (repeated
measurements), or on different outcomes (math, reading etc.).

2) when coding the dataframe should I give each individual row (effect
size) a ID? As the timepoint ID don't represent the same thing (e.g.,
time == 1 in study 1 isn't the same thing as time == 1 in study 2). Or
as I have nested the timepoints within the study this should be ok?

Adding an ID is generally a helpful practice (e.g., for removing
outliers). But for your data, the row ID amounts to:

 rowID  =  interaction(StudyNUM, Cohort, ExTreat, Time_NUM)
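
One way to check that equivalence is sketched below in base R. The
column names follow your model, but the toy values are made up, and
`cell` and `dat` are just illustrative names.

```r
# Toy data: column names follow the thread, the values are made up
dat <- data.frame(
  StudyNUM = c(1, 1, 1, 1, 2, 2),
  Cohort   = c(1, 1, 1, 1, 1, 1),
  ExTreat  = c(1, 1, 2, 2, 1, 1),
  Time_NUM = c(1, 2, 1, 2, 1, 2)
)

# Cell defined by the four grouping variables; drop = TRUE discards
# combinations that never occur in the data
cell <- with(dat, interaction(StudyNUM, Cohort, ExTreat, Time_NUM,
                              drop = TRUE))

# An explicit row ID, one per effect size
dat$rowID <- seq_len(nrow(dat))

# If every cell holds exactly one row, the two identifiers coincide
nlevels(cell) == nrow(dat)  # TRUE for this toy data
```

If the comparison returned FALSE, some cell would contain more than one
effect size (e.g., several outcomes at the same time point), and a
separate row ID would then define a level of its own.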

In the case of time, it is fine if time == 1 in study 1 isn't the same
thing as time == 1 in study 2. However, it DOES matter whether the
"amount of time" that has passed up to, say, time 1 in study 1 vs. that
in study 2 is the same or not.

IFF you add time as a categorical variable (otherwise you run into
multi-collinearity), then you can add a control variable to account
for that in your data. If you can't find that info. in the majority of
studies, then, at least control for the total length of each study (in
weeks, months etc.).
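
As a rough sketch of that idea in base R: the `weeks` column below is a
hypothetical control variable holding the time elapsed at each
measurement, and all values are made up. The code only builds the
fixed-effects design matrix and checks that it has full column rank; a
formula of this kind is what one might then pass to rma.mv() through
its mods argument.

```r
# Toy data: three time points per study, measured at different intervals
dat <- data.frame(
  StudyNUM = c(1, 1, 1, 2, 2, 2),
  Time_NUM = c(1, 2, 3, 1, 2, 3),
  weeks    = c(2, 4, 8, 6, 12, 24)  # hypothetical elapsed time in weeks
)

# Treat the time point as categorical
dat$Time_F <- factor(dat$Time_NUM)

# Design matrix: dummy-coded time point plus elapsed time as a control
X <- model.matrix(~ Time_F + weeks, data = dat)
colnames(X)

# Full column rank means the time dummies and the control are not
# perfectly collinear in this toy data
qr(X)$rank == ncol(X)
```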

3) If I get 0 for the variance components (I.e sigma^2.3) would this
indicate that I do not need to include these in the model as it is not
explaining any variability?

Your last random-effect term has turned out to be an overfit, as it has
returned 0 variability within a given ExTreat in your data (either you
don't have many studies whose ExTreat has repetition in it OR, if you
do, there is not much variation in them to demand an additional
level). As such, you can remove ~1 | interaction(StudyNUM, Cohort,
ExTreat, Time_NUM) from your random-effect specification.
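
To see how much repetition there actually is, you could count the rows
in each ExTreat cluster. Below is a minimal base R sketch with made-up
toy values; `cluster` and `rows_per_cluster` are illustrative names.

```r
# Toy data: column names follow the thread, the values are made up
dat <- data.frame(
  StudyNUM = c(1, 1, 1, 2, 2, 3),
  Cohort   = c(1, 1, 1, 1, 1, 1),
  ExTreat  = c(1, 1, 2, 1, 1, 1),
  Time_NUM = c(1, 2, 1, 1, 2, 1)
)

# One cluster per study-cohort-treatment combination that occurs
cluster <- with(dat, interaction(StudyNUM, Cohort, ExTreat, drop = TRUE))

# Number of effect sizes (rows) in each cluster
rows_per_cluster <- table(cluster)

# Distribution of cluster sizes
table(rows_per_cluster)

# Clusters that contain repeated measurements
sum(rows_per_cluster > 1)  # 2 of the 4 clusters in this toy data
```

If only a handful of clusters have more than one row, there is little
information from which to estimate a variance at that level.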

Kind regards,
Reza

On Thu, Oct 7, 2021 at 12:45 AM Danielle Hiam
<danielle.hiam using deakin.edu.au> wrote:
>
> Thank you Reza, this has been incredibly insightful and helpful.
>
> However if I may clarify a couple of points, regarding the random intercept coding. My questions are denoted by # in each line
> random = list(~ 1|StudyNUM), # variation in effects between studies, each study has their own intercept?
> ~ Time_NUM | interaction(StudyNUM, Cohort), # variation in effects due to cohorts (i.e study design: cross-over or independent cohort) within each study (within_study heterogeneity)?
> ~ 1 || interaction(StudyNUM, Cohort, ExTreat)), # variation in effect is due to treatments within cohorts of individual studies? What is the "||" indicate? Does this mean that they are not the same, i.e different treatments?
> ~ 1 | interaction(StudyNUM, Cohort, ExTreat, Time_NUM)), #variation in effect due to timing  of their measurements within different treatments within cohorts within studies (repeated  measures)?
>  struct = c("UN","UN") )
>
> 2) when coding the dataframe should I give each individual row (effect size) a ID? As the timepoint ID don't represent the same thing (e.g., time == 1 in study 1 isn't the same thing as time == 1 in study 2). Or as I have nested the timepoints within the study this should be ok?
>
> 3) If I get 0 for the variance components (I.e sigma^2.3) would this indicate that I do not need to include these in the model as it is not explaining any variability? Example below.
> Variance Components:
>              estim    sqrt  nlvls  fixed  factor
> sigma^2.1  12.4092  3.5227     11     no  StudyNUM
> sigma^2.2   4.0556  2.0138     15     no  interaction(StudyNUM, Cohort, ExTreat)
> sigma^2.3   0.0000  0.0000     33     no  interaction(StudyNUM, Cohort, ExTreat, Time_NUM)
>
> outer factor: interaction(StudyNUM, Cohort) (nlvls = 13)
> inner factor: Time_NUM                      (nlvls = 3)
>
>             estim    sqrt  k.lvl  fixed  level
> tau^2.1    0.2131  0.4616     15     no      1
> tau^2.2    0.0333  0.1824     11     no      2
> tau^2.3    0.0005  0.0232      7     no      3
>
> Any guidance will be appreciated.
>
> -----Original Message-----
> From: Reza Norouzian <rnorouzian using gmail.com>
> Sent: Sunday, 12 September 2021 2:41 PM
> To: Luke Martinez <martinezlukerm using gmail.com>
> Cc: Danielle Hiam <danielle.hiam using deakin.edu.au>; r-sig-meta-analysis using r-project.org
> Subject: Re: [R-meta] Coding Longitudinal Studies
>
> Yes, if the codes don't represent the same thing (e.g., treat == 1 in study 1 isn't the same thing as treat == 1 in study 2), then you can keep treat, cohort etc. in the random-part if needed, but not in the fixed part of the model.
>
> Also, if, say the codes for treat across the studies represent the same thing, then, one may get more precise (depending on how correlated the levels are and how varied they are across the studies) average effects for treatment levels, if they add correlated random effects (and Some_V_matrix) for the treatment levels:
>
> rma.mv(yi ~ treat , V = Some_V_matrix, random =  ~ treat | study, ...) with a struct = "UN", "HCS", or "CS".
>
> As I said, there are many possibilities to think about in the absence of data and research questions.
>
> BTW, where I gave a few examples of simpler models, the middle model seems to be a copy of the first one. Here is what I intended to include there:
>
> rma.mv(yi, V = Some_V_matrix, random =  list(~ time | study, ~1| interaction(study, cohort, treat, time)), struct = "HAR")
>
> Best,
> Reza
>
> On Sat, Sep 11, 2021 at 11:00 PM Luke Martinez <martinezlukerm using gmail.com> wrote:
> >
> > Dear Reza,
> >
> > That is assuming that cohorts and treatments mean the same thing
> > across the studies? What if that is not the case?
> >
> > Thanks,
> > Luke
> >
> > On Sat, Sep 11, 2021 at 5:51 PM Reza Norouzian <rnorouzian using gmail.com> wrote:
> > >
> > > Dear Danielle,
> > >
> > > The issues you have inquired about have come up multiple times on
> > > the mailing list archived at:
> > > https://stat.ethz.ch/pipermail/r-sig-meta-analysis/. For example, I
> > > found this:
> > > > https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2021-August/003028.html
> > > > to be a very helpful resource.
> > >
> > > > So, I will address your questions more generally (marked by >>>>).
> > >
> > > >>>> 1- I was also wondering how I code that in some studies they have independent cohorts performing different exercise treatment vs some studies the same cohort performs different exercise treatments. Would you have a second random effect nesting the groups within each study?
> > >
> > > As a starting point, you would want to create columns that give each
> > > > cohort (cohort) and treatment (treat) in each study a distinguishable
> > > id. Then in each study, you can compute an effect size (yi)
> > > comparing a treatment versus a control for each cohort, at each time
> > > point (time).
> > >
> > > For two random single-post-test studies, one with one cohort, the
> > > other with two cohorts, your dataset might look like:
> > >
> > >    study cohort treat time           comparison         yi
> > > 1      1      1     1    0 treatment vs control  0.7394220
> > > 2      1      1     1    1 treatment vs control  0.2249452
> > > 3      1      1     2    0 treatment vs control  0.6425390
> > > 4      1      1     2    1 treatment vs control  1.2338803
> > > 5      2      1     1    0 treatment vs control -1.1074885
> > > 6      2      1     1    1 treatment vs control  0.6196865
> > > 7      2      1     2    0 treatment vs control  0.3012036
> > > 8      2      1     2    1 treatment vs control  0.1582372
> > > 9      2      2     1    0 treatment vs control  1.1909753
> > > 10     2      2     1    1 treatment vs control -0.5343208
> > > 11     2      2     2    0 treatment vs control  0.1612554
> > > 12     2      2     2    1 treatment vs control  0.9449014
> > >
> > > That's how you code such studies.
> > >
> > > *IF* you theoretically end up having a "huge" dataset such that
> > > there will be many studies with multiple cohorts, each with multiple
> > > treatments, and multiple time points, then there is a potential that
> > > the variation in effects between studies is due to the variation
> > > among cohorts within studies, and a further potential that the
> > > variation in effects among cohorts within studies is due to the
> > > variation among treatments within cohorts within studies, and yet
> > > another potential that the variation in effects among treatments
> > > within cohorts within studies is due to the variation in the timing
> > > of their measurements within treatments within cohorts within
> > > studies (or equivalently, their unique underlying differences given
> > > all combinations of study, cohort, treatment, and time) that each is
> > > defined by, *THEN*, all such sources of variation may be modeled as random effects.
> > >
> > > A nearly utopian model for that given the limits of rma.mv() might be:
> > >
> > > > rma.mv(yi ~ cohort*treat*time, V = Some_V_matrix,
> > > >        random = list(~ time | study,
> > > >                      ~ time | interaction(study, cohort),
> > > >                      ~ 1 | interaction(study, cohort, treat),
> > > >                      ~ 1 | interaction(study, cohort, treat, time)),
> > > >        struct = c("UN","UN"))
> > >
> > > This model can "give" you the average true effects of
> > > cohort-group-time combinations. That is, it answers the question:
> > > how each type of treatment effect in each cohort changes over time
> > > across the studies?
> > >
> > > This model can also "allow" the studies with a more complete set of
> > > post-tests to "fill-in the gap" for studies with a
> > > smaller/incomplete set of post-tests thereby improving the estimates
> > > of average true effects of cohort-group-time combinations in general
> > > (and their respective estimates of heterogeneity thereof).
> > >
> > > In the real world data, you may not have so many sources of
> > > variation (or they may be negligible). Two general simplification
> > > strategies include (1) dropping the lower ends of the hierarchy
> > > and/or (2) modifying the structure of the correlated random-effect (if any).
> > > These two strategies lead to the formation of *many* models.
> > >
> > > A few examples include:
> > >
> > > > rma.mv(yi ~ cohort*treat*time, V = Some_V_matrix,
> > > >        random = list(~ time | study,
> > > >                      ~ time | interaction(study, cohort),
> > > >                      ~ 1 | interaction(study, cohort, treat)),
> > > >        struct = c("UN","UN"))
> > > >
> > > > rma.mv(yi ~ cohort*treat*time, V = Some_V_matrix,
> > > >        random = list(~ time | study,
> > > >                      ~ time | interaction(study, cohort),
> > > >                      ~ 1 | interaction(study, cohort, treat)),
> > > >        struct = c("UN","UN"))
> > > >
> > > > rma.mv(yi ~ cohort*treat*time, V = Some_V_matrix,
> > > >        random = list(~ time | study,
> > > >                      ~ time | interaction(study, cohort),
> > > >                      ~ 1 | interaction(study, cohort, treat)),
> > > >        struct = c("HAR","HAR"))
> > > > .
> > > > .
> > > > .
> > > The goal is to understand the assumptions, fit all these models, and
> > > compare their fit to the data at hand to choose one (or more) among
> > > them.
> > >
> > > >>>> 2- Based on my reading I think I would code the random as Time|Study, struct = "AR".
> > >
> > > It depends on the data. Please see my previous answer.
> > >
> > > >>>> 3- This would allow observations from different studies to be independent (Study), but observations from within the same studies be dependent (Time). Is this correct?
> > >
> > > yes.
> > >
> > > >>>> 4- My last question is regarding the difference in coding the random effect as ~1|Time/Study and ~Time|Study?
> > >
> > > I think Wolfgang has discussed this elsewhere
> > > (https://www.metafor-project.org/doku.php/analyses:konstantopoulos2011).
> > > In short, ~ 1 | Study/Time is a reparametrization of ~Time|Study,
> > > struct = "CS".
> > >
> > > Best,
> > > Reza
> > >
> > >
> > > On Fri, Sep 10, 2021 at 4:27 PM Danielle Hiam
> > > <danielle.hiam using deakin.edu.au> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I am seeking some clarification on longitudinal studies and coding the random effect using rma.mv.
> > > >
> > > > For context the studies have repeated measures across time and some studies have multiple treatments (exercise in my case). Further, some of the studies have an independent cohort performing the different exercise treatments, others use the same cohort to perform different exercise treatments. I am using the fold change (FC) in expression from baseline for each timepoint and the SEM of the FC. I would like to look at the Fold Change in expression across all cohorts and timepoints and amount of heterogeneity amongst the studies. Then I will investigate with moderators in a meta-regression to investigate sources of this heterogeneity.
> > > >
> > > > I have a couple of basic questions regarding the coding
> > > >
> > > >   1.  Based on my reading I think I would code the random as Time|Study, struct = "AR". This would allow observations from different studies to be independent (Study), but observations from within the same studies be dependent (Time). Is this correct?
> > > >   2.  I was also wondering how I code that in some studies they have independent cohorts performing different exercise treatment vs some studies the same cohort performs different exercise treatments.  Would you have a second random effect nesting the groups within each study?
> > > >   3.  My last question is regarding the difference in coding the random effect as ~1|Time/Study and ~Time|Study?
> > > >
> > > > > Any help or guidance would be greatly appreciated.
> > > > >
> > > > > Kind regards,
> > > > Danielle
> > > >
> > > > Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.
> > > >
> > > > Deakin University does not warrant that this email and any attachments are error or virus free.
> > > >
> > > >         [[alternative HTML version deleted]]
> > > >
> > > > _______________________________________________
> > > > R-sig-meta-analysis mailing list
> > > > R-sig-meta-analysis using r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> > >
> > > _______________________________________________
> > > R-sig-meta-analysis mailing list
> > > R-sig-meta-analysis using r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis