[R-meta] Loss to followup over time and calculation of raw mean change (metafor::escalc)

Sun Mar 10 14:05:15 CET 2019

Hi Dale,

How should one compute the mean change if we start with n1 people at timepoint 1 and only have data from n2 people (with n2 < n1) at timepoint 2?

For example:

x1 <- rnorm(10, 5, 1)
x2 <- x1 + rnorm(10, 2, 1)
x2[9:10] <- NA
cbind(x1,x2)

The mean change of the 8 people for whom we have complete data is:

mean(x2-x1, na.rm=TRUE)

which is the same thing as

mean((x2-x1)[1:8])

and can also be thought of as the mean difference for the complete cases:

mean(x2[1:8]) - mean(x1[1:8])

The mean of the 8 people for whom we have data at timepoint 2 minus the mean of the 10 people for whom we have data at timepoint 1 would be:

mean(x2, na.rm=TRUE) - mean(x1)

which is the same thing as

mean(x2[1:8]) - mean(x1)

These two approaches to computing a mean change are not the same thing (they are if there is no loss to follow-up). One can debate which of these is 'correct' -- and maybe neither is if the missingness is not random, but let's leave that aside for now.
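To make the difference concrete, here is a small reproducible sketch of the example above. The seed and the object names mc_complete, mc_available, and gap are arbitrary choices made only for illustration. It also shows that the two versions differ by exactly mean(x1) - mean(x1[1:8]), that is, by how much the baseline mean of all 10 people differs from the baseline mean of the 8 completers.

```r
set.seed(1234)  # arbitrary seed, only for reproducibility
x1 <- rnorm(10, 5, 1)
x2 <- x1 + rnorm(10, 2, 1)
x2[9:10] <- NA

# approach 1: mean change based on the complete cases only
mc_complete <- mean(x2 - x1, na.rm=TRUE)

# approach 2: difference between the means of the available data
mc_available <- mean(x2, na.rm=TRUE) - mean(x1)

# the two differ by the gap between the baseline mean of everybody
# and the baseline mean of the completers
gap <- mean(x1) - mean(x1[1:8])
c(mc_complete = mc_complete, mc_available = mc_available, gap = gap)
all.equal(mc_complete - mc_available, gap)
```

So the two approaches agree only when the people lost to follow-up happen to have the same baseline mean as the completers.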

Essentially, escalc() assumes the first approach (for which the computation of the sampling variance is trivial). The values of m1i, m2i, sd1i, sd2i, and ri are then assumed to be based only on the ni individuals with complete data.
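As a sketch of what the first approach amounts to: going by the metafor documentation, for measure="MC" the estimate is yi = m1i - m2i with sampling variance vi = (sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i)/ni. The base R code below (simulated data, arbitrary seed, and the helper name cc is just for illustration) computes these by hand from the complete cases and checks that this is the same as working with the change scores directly. It does not call escalc() itself, so treat the correspondence as an assumption to verify against the package.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x1 <- rnorm(10, 5, 1)
x2 <- x1 + rnorm(10, 2, 1)
x2[9:10] <- NA

cc <- complete.cases(x1, x2)   # TRUE for the 8 people with both measurements
ni <- sum(cc)

# summary statistics based only on the complete cases
m1i <- mean(x2[cc]);  sd1i <- sd(x2[cc])   # follow-up
m2i <- mean(x1[cc]);  sd2i <- sd(x1[cc])   # baseline
ri  <- cor(x1[cc], x2[cc])

yi <- m1i - m2i
vi <- (sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i) / ni

# same results when working with the change scores directly
all.equal(yi, mean(x2[cc] - x1[cc]))
all.equal(vi, var(x2[cc] - x1[cc]) / ni)
```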

I cannot say for sure what the data represent that you posted, but I assume that the means (and SDs) are based on the available data (so, for example, in the first row, the mean of 37.69 is based on n1=102 and the mean of 34.53 is based on n2=100). So, in that case, what is being computed is mean(x2, na.rm=TRUE) - mean(x1). The correlation (between x1 and x2) is then presumably only based on the complete cases.

Computing the sampling variance of such a mean change cannot be done with the usual equations. We can decompose

mean(x2[1:8]) - mean(x1)

into

mean(x2[1:8]) - mean(x1[1:8])*8/10 - mean(x1[9:10])*2/10

In order to compute the sampling variance for this, you would need mean(x1[1:8]) and mean(x1[9:10]) (and not just mean(x1)) and sd(x1[1:8]) and sd(x1[9:10]) (and not just sd(x1)) -- so, the mean and SD of x1 for the complete and incomplete cases separately.
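Just to sketch what such a computation could look like if the separate summaries were available: the code below first checks the decomposition numerically and then computes a variance while treating the complete and incomplete cases as two independent groups of fixed size. This is only a back-of-the-envelope illustration under those assumptions (and it says nothing about non-random missingness); it is not something escalc() does, and the names cc, n_c, n_m, and w are made up for the example.

```r
set.seed(2019)  # arbitrary seed, only for reproducibility
x1 <- rnorm(10, 5, 1)
x2 <- x1 + rnorm(10, 2, 1)
x2[9:10] <- NA

cc  <- !is.na(x2)      # complete cases
n   <- length(x1)
n_c <- sum(cc)         # number of completers
n_m <- n - n_c         # number lost to follow-up

# the decomposition reproduces the available-data mean change
yi  <- mean(x2[cc]) - mean(x1)
dec <- mean(x2[cc]) - mean(x1[cc])*n_c/n - mean(x1[!cc])*n_m/n
all.equal(yi, dec)

# pieces needed for the variance, separately for the two groups
w    <- n_c / n
sd1c <- sd(x1[cc]);  sd2c <- sd(x2[cc]);  r <- cor(x1[cc], x2[cc])
sd1m <- sd(x1[!cc])

# Var[mean(x2_c) - w*mean(x1_c)] + (1-w)^2 * Var[mean(x1_m)],
# assuming the two groups are independent with fixed sizes
vi <- (sd2c^2 + w^2*sd1c^2 - 2*w*r*sd1c*sd2c) / n_c + (1-w)^2 * sd1m^2 / n_m
vi
```

Note that sd1c and sd1m appear separately in the last step, which is exactly the information that is typically not reported.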

Basically, you cannot compute the mean change correctly using the first approach (since m1i and sd1i are based on the available data and not just the complete cases) and you cannot compute the sampling variance correctly using the second approach (since you do not have the mean and SD of x1 separately for the complete and incomplete cases).

In practice, this issue is often completely ignored. Instead, the means and SDs are used as given, we then pretend that m1i and sd1i are based only on the complete cases, and use the *follow-up* sample size for 'ni'. So, in your example:

dat1 <- escalc(measure = "MC", m1i = mean.3, m2i = mean.0,
               sd1i = sd.3, sd2i = sd.0,
               ni = n.3, ri = r, data = wide_use_c)

Again, this isn't quite correct, but using the follow-up sample size should be a bit conservative (the smaller sample size yields a somewhat larger sampling variance).
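To get a feeling for how much the choice of sample size matters here, the sampling variance of the raw mean change can be computed by hand for your five rows, once with the follow-up and once with the baseline sample size. This sketch uses the formula vi = (sd1i^2 + sd2i^2 - 2*ri*sd1i*sd2i)/ni, which, as far as I know, is what escalc() uses for measure="MC". The data frame 'dat' just retypes the relevant columns of your data, and the names vd, vi.n3, and vi.n0 are made up for illustration.

```r
dat <- data.frame(
   arm    = c("CBT_Educ", "CBT_MI", "MI", "TAU", "Educ"),
   n.0    = c(102, 103, 68, 71, 61),
   mean.0 = c(37.69, 40.23, 19, 15.3, 4.3),
   sd.0   = c(16.06, 14.23, 10.9, 10.1, 2.2),
   n.3    = c(100, 101, 41, 54, 53),
   mean.3 = c(34.53, 31.8, 14.2, 13.7, 4.1),
   sd.3   = c(19.78, 19.67, 10.8, 11.1, 2.5),
   r      = 0.9)

# raw mean change (follow-up minus baseline)
dat$yi <- dat$mean.3 - dat$mean.0

# variance of the change scores implied by the SDs and the correlation
vd <- with(dat, sd.3^2 + sd.0^2 - 2*r*sd.3*sd.0)

dat$vi.n3 <- vd / dat$n.3   # follow-up sample size (larger variance)
dat$vi.n0 <- vd / dat$n.0   # baseline sample size (smaller variance)

dat[c("arm", "yi", "vi.n3", "vi.n0")]
```

The difference between the two variances is small for the first two rows (little loss to follow-up) and more noticeable for the others.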

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Dale Steele
Sent: Saturday, 02 March, 2019 23:43
To: r-sig-meta-analysis using r-project.org
Subject: [R-meta] Loss to followup over time and calculation of raw mean change (metafor::escalc)

I have data from studies which evaluated patients in each treatment arm at
two time points and report a quantitative outcome.

Between the two time points, a variable number of subjects are lost to
followup.  I would like to calculate a raw mean change and/or standardized
mean change.  I found that metafor::escalc allows only one sample size to
be entered.

Any advice on how to address the issue of different sample sizes,
specifically when using metafor::escalc?

Example data below:

dat1 <- escalc(measure = "MC", m1i = mean.3, m2i = mean.0,
               sd1i = sd.3, sd2i = sd.0,
               n1i = n.3, n2i = n.0, ri = r, data = wide_use_c)

wide_use_c <- structure(list(id = c("172507", "172507", "172545", "172545",
"172619"), arm = c("CBT_Educ", "CBT_MI", "MI", "TAU", "Educ"),
    n.0 = c(102, 103, 68, 71, 61), mean.0 = c(37.69, 40.23, 19,
    15.3, 4.3), sd.0 = c(16.06, 14.23, 10.9, 10.1, 2.2), n.3 = c(100,
    101, 41, 54, 53), mean.3 = c(34.53, 31.8, 14.2, 13.7, 4.1
    ), sd.3 = c(19.78, 19.67, 10.8, 11.1, 2.5), r = c(0.9, 0.9,
    0.9, 0.9, 0.9)), class = c("tbl_df", "tbl", "data.frame"), row.names =
c(NA,
-5L), na.action = structure(c(`1` = 1L, `2` = 2L, `3` = 3L, `4` = 4L,
`7` = 7L, `9` = 9L, `11` = 11L, `12` = 12L, `13` = 13L, `14` = 14L,
`19` = 19L, `20` = 20L, `23` = 23L, `24` = 24L, `25` = 25L, `26` = 26L,
`27` = 27L, `28` = 28L, `29` = 29L, `30` = 30L, `35` = 35L, `36` = 36L,
`39` = 39L, `40` = 40L, `41` = 41L, `42` = 42L, `43` = 43L, `47` = 47L,
`48` = 48L, `52` = 52L, `53` = 53L, `54` = 54L, `55` = 55L, `56` = 56L,
`57` = 57L, `60` = 60L, `61` = 61L, `62` = 62L, `63` = 63L, `64` = 64L,
`65` = 65L, `66` = 66L, `69` = 69L, `70` = 70L, `73` = 73L, `74` = 74L
), class = "omit"))

Best.

--Dale