[R-meta] Background on large meta analysis with RCT and single-arm studies

Viechtbauer, Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Tue Mar 8 22:17:49 CET 2022

Hi David,

Let's distinguish three types of designs using the notation of Campbell and Stanley (1963):

1) Posttest-Only Control Group Design

Trt   R  X  O
Ctrl  R     O

(R = randomization, X = treatment, O = observation)

For this design, we can compute the usual standardized mean difference of the form 

d = (m_post_trt - m_post_ctrl) / sd_post,

where sd_post is the pooled SD of the post-test scores. This is also known as Cohen's d, or as Hedges' g when the bias correction is applied, and corresponds to measure "SMD" in metafor.
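For readers who want to see the arithmetic, here is a minimal Python sketch of measure "SMD". It is not metafor code, and the function names and numbers are hypothetical. It assumes three things: the pooled post-test SD is the standardizer, Hedges' g uses the exact small-sample correction, and the variance follows the common large-sample formula 1/n_trt + 1/n_ctrl + g^2 / (2(n_trt + n_ctrl)). Check metafor's own defaults against its documentation.

```python
import math


def bias_correction(m):
    """Exact small-sample correction factor for m degrees of freedom.

    Computed on the log scale to avoid overflow; approximately 1 - 3/(4m - 1).
    """
    return math.exp(math.lgamma(m / 2) - 0.5 * math.log(m / 2) - math.lgamma((m - 1) / 2))


def smd(m_trt, sd_trt, n_trt, m_ctrl, sd_ctrl, n_ctrl):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance.

    The standardizer is the pooled post-test SD of the two groups.
    """
    df = n_trt + n_ctrl - 2
    sd_pooled = math.sqrt(((n_trt - 1) * sd_trt**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (m_trt - m_ctrl) / sd_pooled  # Cohen's d
    g = bias_correction(df) * d  # Hedges' g
    var_g = 1 / n_trt + 1 / n_ctrl + g**2 / (2 * (n_trt + n_ctrl))
    return g, var_g


# Hypothetical post-test summary statistics for one posttest-only RCT
g, v = smd(m_trt=24.0, sd_trt=5.0, n_trt=30, m_ctrl=21.0, sd_ctrl=5.0, n_ctrl=30)
print(f"g = {g:.3f}, var = {v:.4f}")
```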

2) Pretest-Posttest Control Group Design

Trt   O  R  X  O
Ctrl  O  R     O

For this design, we can compute the standardized mean change within each group and the difference thereof as our effect size measure, so:

d = (m_post_trt - m_pre_trt) / sd_pre_trt - (m_post_ctrl - m_pre_ctrl) / sd_pre_ctrl.

Importantly, within each group, we standardize based on either the pre- or the post-test SD, but NOT on the SD of the change scores. In metafor, this can be done by computing measure "SMCR" (the 'standardized mean change with raw score standardization') once for the treatment group and once for the control group, taking the difference of the two values, and summing their sampling variances (which is valid because the two groups are independent). This is explained in detail here:


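Here is a minimal Python sketch of the design-2 computation described above. It is not metafor's implementation, and it rests on two assumptions. First, the sampling variance of the raw-score standardized mean change is the large-sample formula 2(1 - r)/n + d^2/(2n), where r is the pre-post correlation. Second, the small-sample correction is the same one used for Hedges' g, with n - 1 degrees of freedom. The function names, the input values and the assumed r = 0.6 are hypothetical.

```python
import math


def bias_correction(m):
    """Exact small-sample correction factor for m degrees of freedom (about 1 - 3/(4m - 1))."""
    return math.exp(math.lgamma(m / 2) - 0.5 * math.log(m / 2) - math.lgamma((m - 1) / 2))


def smcr(m_pre, m_post, sd_pre, n, r):
    """Standardized mean change with raw-score (pre-test SD) standardization.

    Returns the bias-corrected value and its large-sample variance; the
    pre-post correlation r enters only the variance.
    """
    d = (m_post - m_pre) / sd_pre
    y = bias_correction(n - 1) * d
    v = 2 * (1 - r) / n + y**2 / (2 * n)
    return y, v


def design2_effect(trt, ctrl):
    """Design 2: SMCR(treatment) - SMCR(control).

    The variances are summed because the two groups are independent samples.
    """
    y_t, v_t = smcr(**trt)
    y_c, v_c = smcr(**ctrl)
    return y_t - y_c, v_t + v_c


# Hypothetical summary statistics; r = 0.6 is an assumed pre-post correlation.
trt = dict(m_pre=20.0, m_post=26.0, sd_pre=6.0, n=25, r=0.6)
ctrl = dict(m_pre=20.5, m_post=22.0, sd_pre=5.5, n=25, r=0.6)
d, v = design2_effect(trt, ctrl)
print(f"d = {d:.3f}, var = {v:.4f}")
```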
For randomized studies, the d-values obtained from designs 1 and 2 are directly comparable. Any pre-treatment differences must be, by definition, random, and hence could in principle even be ignored. So, we could also treat this as design 1, computing the standardized mean difference only using the post-test information. This might be an option when the pre-post correlation is not known, since this correlation is needed to compute the sampling variance of measure "SMCR".

It is NOT appropriate to use measure "SMCC" (i.e., the 'standardized mean change with change score standardization') within each group. The d-value computed for design 1 uses raw score standardization, so only "SMCR" yields a d-value for design 2 that is comparable to the one from design 1.
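The following Python sketch shows why the choice of standardizer matters. It uses hypothetical numbers and omits the bias correction. The mean change and the pre- and post-test SDs are held fixed while the pre-post correlation r varies. The SD of the change scores is sqrt(sd_pre^2 + sd_post^2 - 2 r sd_pre sd_post), so the change-score standardized value grows as r increases, while the raw-score value does not change. With equal pre and post SDs, the two coincide only at r = 0.5.

```python
import math


def change_sd(sd_pre, sd_post, r):
    """SD of the change scores implied by the pre/post SDs and their correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def smcr_raw(m_pre, m_post, sd_pre):
    """Raw-score standardization (no bias correction): divide by the pre-test SD."""
    return (m_post - m_pre) / sd_pre


def smcc_raw(m_pre, m_post, sd_pre, sd_post, r):
    """Change-score standardization (no bias correction): divide by the change-score SD."""
    return (m_post - m_pre) / change_sd(sd_pre, sd_post, r)


# Hypothetical study: mean change of 5 points, pre and post SDs both 10.
for r in (0.0, 0.5, 0.8, 0.9):
    raw = smcr_raw(50.0, 55.0, 10.0)
    chg = smcc_raw(50.0, 55.0, 10.0, 10.0, r)
    print(f"r = {r:.1f}: raw-score d = {raw:.3f}, change-score d = {chg:.3f}")
```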

3) One-Group Pretest-Posttest Design

Trt   O  X  O

So here we have observed a single group, once before and once after a treatment. Campbell and Stanley (1963) discuss in detail the various sources of invalidity that are not controlled in such a design and that could therefore lead one to draw incorrect conclusions about the 'effect' of treatment X. An obvious one is that we have no idea whether the change from pre- to post-treatment could also have happened in the absence of X (for other reasons, such as 'maturation').

Leaving this aside for now, for this design, we can compute

d = (m_post - m_pre) / sd_pre,

that is, measure "SMCR". We can think of the pre-treatment observation as the 'control' observation and the post-treatment observation as the 'treatment' observation. In that sense, this d-value is comparable to that from designs 1 and 2. Again, using raw score standardization is crucial.
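For a single-arm study, the same raw-score computation applies to the one group. The pre-post correlation is often not reported for such studies. One option, sketched here under assumed values, is to recompute the sampling variance over a range of plausible r values and check how much the results move. The point estimate does not depend on r; only its variance does. This sketch assumes the large-sample variance 2(1 - r)/n + d^2/(2n) and uses hypothetical numbers and function names. It is not metafor code.

```python
import math


def bias_correction(m):
    """Exact small-sample correction factor for m degrees of freedom (about 1 - 3/(4m - 1))."""
    return math.exp(math.lgamma(m / 2) - 0.5 * math.log(m / 2) - math.lgamma((m - 1) / 2))


def smcr(m_pre, m_post, sd_pre, n, r):
    """Raw-score standardized mean change for one group and its large-sample variance."""
    y = bias_correction(n - 1) * (m_post - m_pre) / sd_pre
    v = 2 * (1 - r) / n + y**2 / (2 * n)
    return y, v


# Hypothetical single-arm study: the pre-post correlation is not reported,
# so the variance is recomputed for several assumed values of r.
for r_assumed in (0.3, 0.5, 0.7, 0.9):
    y, v = smcr(m_pre=20.0, m_post=24.0, sd_pre=8.0, n=40, r=r_assumed)
    print(f"assumed r = {r_assumed:.1f}: d = {y:.3f}, var = {v:.4f}")
```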

As noted above, there are all kinds of issues with design 3 that make it a much weaker design than designs 1 and 2 (again, see Campbell and Stanley, 1963). To what extent these issues affect the d-values in any particular case is difficult to say. However, given enough d-values from design 3 and from the other designs, we can also approach this issue empirically. That is, we code design type as a moderator and then use a meta-regression analysis to examine to what extent there are systematic differences between the d-values obtained from the various designs.
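Here is a minimal Python sketch of the moderator idea. It assumes design type is the only moderator and that the between-study variance tau2 is known and common to all designs, rather than estimated. Under those assumptions, the weighted least-squares meta-regression is saturated. Each design's fitted value is then its inverse-variance weighted mean, and each coefficient is that mean's difference from the reference design. In practice one would fit a mixed-effects meta-regression that also estimates tau2 (in metafor, rma() with the mods argument). The function name, design labels and data below are hypothetical.

```python
import math
from collections import defaultdict


def design_contrasts(yi, vi, design, reference, tau2=0.0):
    """Contrast pooled d-values of each design type against a reference design.

    Weights are 1 / (vi + tau2) with tau2 treated as known. Returns the
    weighted mean per design and, for each non-reference design, the
    difference from the reference, its standard error, z, and a two-sided p.
    """
    sum_w = defaultdict(float)
    sum_wy = defaultdict(float)
    for y, v, d_type in zip(yi, vi, design):
        w = 1.0 / (v + tau2)
        sum_w[d_type] += w
        sum_wy[d_type] += w * y
    means = {d_type: sum_wy[d_type] / sum_w[d_type] for d_type in sum_w}
    ref_mean, ref_var = means[reference], 1.0 / sum_w[reference]
    contrasts = {}
    for d_type, mean in means.items():
        if d_type == reference:
            continue
        diff = mean - ref_mean
        se = math.sqrt(1.0 / sum_w[d_type] + ref_var)  # independent weighted means
        z = diff / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        contrasts[d_type] = (diff, se, z, p)
    return means, contrasts


# Hypothetical d-values and sampling variances from seven studies
yi = [0.42, 0.55, 0.38, 0.49, 0.81, 0.93, 0.74]
vi = [0.050, 0.040, 0.060, 0.045, 0.030, 0.035, 0.040]
design = ["posttest_only", "posttest_only", "pretest_posttest", "pretest_posttest",
          "one_group", "one_group", "one_group"]
means, contrasts = design_contrasts(yi, vi, design, reference="posttest_only", tau2=0.02)
for d_type, (diff, se, z, p) in contrasts.items():
    print(f"{d_type} vs posttest_only: diff = {diff:.3f}, SE = {se:.3f}, p = {p:.3f}")
```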

One has to be cautious when doing this exercise, since the results from such a moderator analysis are themselves 'observational'. There could be all kinds of other differences between studies using different designs, unrelated to the sources of invalidity discussed by Campbell and Stanley, that could lead to systematic differences in the d-values between design types. But at least it is a somewhat more principled approach to the question of whether, and to what extent, d-values from this design can be combined with those from the other designs.

I hope this addresses your question. I wrote this up in some detail because this is definitely a FAQ, and I hope to refer people to this post in the future whenever this question comes up again.


>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>Behalf Of David Pedrosa
>Sent: Tuesday, 08 March, 2022 19:20
>To: r-sig-meta-analysis using r-project.org
>Subject: [R-meta] Background on large meta analysis with RCT and single-arm studies
>Dear list,
>In our group, we have performed an extensive search on treatment options
>for Parkinson's disease and we have encountered a large number of
>different trials and study types. We have managed to get reasonable
>comparisons for all RCTs providing mean differences or before-after
>designs, and we have finally used the SMD as our metric. What is left is
>the relatively large number of pre-post studies with single-arm
>interventions and the non-randomised controlled trials. While the latter
>are comparatively easy to understand and to model, we are really not
>sure whether and how to include single-arm studies. We have tried to
>look through the usual book chapters and scientific papers, and we have
>also looked through the metafor documentation, but we were not very
>successful in understanding what the pitfalls would be and, especially,
>what an implementation could look like. If anyone could guide
>us a bit or provide some useful links, that would be helpful.
>Best wishes,
