# [R-meta] Question on effect sizes

David Pedrosa
Sat Jan 22 22:50:36 CET 2022

```Hi Lukas and list,

first of all, thanks for your time and the suggestions, and apologies for
not making my point as clear as I should have. I have already contacted
all authors and the pharmaceutical companies, but the latter are rather
reluctant to disclose results for all kinds of reasons, and the former
sometimes no longer have access, since some results are 20 or more years
old. But let me delineate my problem:

We have a measure of disease severity, which consists of several items
and is summarised. In most of the studies I have a sum score
group_mean(x) - so x1+x2 - with group_sd(x). But it may happen that
authors provide group_mean(x1) and group_mean(x2) with their respective
SDs. It's of course easy to get group_mean(x), but I'm wondering what
the approach would be for sd(x). I thought about the "pooled_sd" with

pooled_sd <- sqrt(((n1-1)*sd_x1^2 + (n2-1)*sd_x2^2) / (n1+n2-2))

but I'm not sure whether that makes sense. So I tried to simulate data
to get a sense of how reliable the results are (code below), but the
mean difference between the "true" SD and the estimated SD is
considerable in a few cases. So I was wondering whether I am missing
something and whether this is a valid approach.

I would be delighted if you or someone else could guide me with some

All the best,

David

Code:

## Test for simulation of compound SD

# General
set.seed(1234)
# Draw n values whose sample mean and sample SD equal `mean` and `sd` exactly
rnorm2       <- function(n, mean, sd) { mean + sd * as.vector(scale(rnorm(n))) }
nsim         <- 500
group_size   <- c(100, 100)
n1           <- group_size[1] # rnorm() and matrix() need a single number
n2           <- group_size[2]

# Simulate two known datasets
means_x1     <- runif(nsim, 0, 5) # values are between 0 and 5
sd_x1        <- runif(nsim, 0, 4)

means_x2     <- runif(nsim, 0, 5)
sd_x2        <- runif(nsim, 0, 4)

x1           <- matrix(data = NA, nrow = n1, ncol = nsim)
x2           <- matrix(data = NA, nrow = n2, ncol = nsim)
for (i in seq_len(nsim)) {
  x1[, i] <- rnorm2(n1, means_x1[i], sd_x1[i])
  x2[, i] <- rnorm2(n2, means_x2[i], sd_x2[i])
}

# Check for estimation differences in the means
plot(apply(rbind(x1, x2), 2, mean) -
       apply(rbind(means_x1, means_x2), 2, mean))
# SD of the stacked x1 and x2 values, used here as "ground truth"
sd_sum       <- apply(rbind(x1, x2), 2, sd)

sd_estimate  <- rep(NA, nsim) # according to rnorm2
for (i in seq_len(nsim)) {
  sd_estimate[i] <- sqrt(((n1 - 1) * sd_x1[i]^2 + (n2 - 1) * sd_x2[i]^2) /
                           (n1 + n2 - 2))
}
results <- data.frame(x = sd_sum, y = sd_estimate, z = sd_sum - sd_estimate)
plot(results$z)
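
# A minimal sketch, assuming the target is the SD of the sum score
# x = x1 + x2 measured on the same participants. In that case
#   var(x) = sd_x1^2 + sd_x2^2 + 2 * r * sd_x1 * sd_x2,
# where r is the correlation between x1 and x2. r is usually not
# reported, so any value plugged in is an assumption, and sd_of_sum()
# is a hypothetical helper name, not a function from any package.
sd_of_sum <- function(sd_a, sd_b, r) {
  sqrt(sd_a^2 + sd_b^2 + 2 * r * sd_a * sd_b)
}

# Check against simulated item scores with a known correlation
set.seed(1)
r_true <- 0.5
a      <- rnorm(1e5)
b      <- r_true * a + sqrt(1 - r_true^2) * rnorm(1e5)
item1  <- 2 + 1.5 * a # item 1: population mean 2, SD 1.5
item2  <- 3 + 2.0 * b # item 2: population mean 3, SD 2
c(empirical = sd(item1 + item2),
  formula   = sd_of_sum(sd(item1), sd(item2), cor(item1, item2)))

# Sensitivity of the sum-score SD to the assumed correlation
sapply(c(0, 0.3, 0.5, 0.7), function(r) sd_of_sum(1.5, 2, r))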

On 21.01.22 at 17:25, Lukasz Stasielowicz wrote:
> Hi,
>
> a couple of ideas that may be obvious to you but the provided
> description is rather short, so I don't know whether you have thought
>
> 1. Did you try to contact the authors of the studies? Maybe they will
> be willing to provide the missing statistics or the data set. The
> willingness varies obviously between researchers (and research areas)
> but it is often worth the effort.
>
> One could contact the corresponding author and ask for the statistics
> or the data set (providing the choice can increase the success rate).
> If you don't receive an answer within several days (e.g. one week)
> then one can try to contact the other authors. Recently I used this
> strategy for two different meta-analyses and approximately 80% - 90%
> of the research teams wrote back. Obviously, not all of them could
> provide answers or data (hard drive failure etc.) but approximately
> 30% - 50% of the authors provided additional information.
>
> 2. If you have already explored the first strategy and the relevant
> information is still missing, then one could try to reconstruct it. It
> is something that you were referring to but the description is rather
> short, so I cannot infer what is meant by pooled SD etc.
> One could try to rearrange the formulas to compute the missing
> information manually but if there are two unknowns (e.g. SD and M for
> one group is missing) then it is not possible.
> Nevertheless, one could try to make some guesstimates (e.g. are the
> SDs for both groups in other studies similar? If yes, then one could
> make a respective guesstimate for the missing information) in order to
> impute the data.
> One could even make several guesstimates and test these different
> scenarios to test the robustness of the findings. Another sensitivity
> analysis would be to compare meta-analytic results based on studies
> without missing information and the scenarios with guesstimates.
>
> 3. It is probably obvious to you but dropping the studies with missing
> information is also a possibility. However, it could bias the results
> (if the dropped studies differ significantly from the included studies).
>
>
> Hope it helps!
>
> Best wishes,
--

<http://www.ukgm.de>

PD Dr. David Pedrosa
Senior Consultant, Department of Neurology,
Head of the Movement Disorders Section, Universitätsklinikum Gießen und
Marburg

Tel.: (+49) 6421-58 65299 Fax: (+49) 6421-58 67055