[R-meta] Question on effect sizes

Thu Jan 27 19:59:50 CET 2022

Dear David,

I haven't looked at your post in detail, but I think you might be after this:

# Suppose we have the mean, SD, and size of several subgroups, but we
# need the mean and SD of the total/combined groups. Code below shows
# what we need to compute to obtain this.

# simulate some data
n.total <- 100
grp <- sample(1:4, size=n.total, replace=TRUE)
y   <- rnorm(n.total, mean=grp, sd=2)

# means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

# want to get mean and SD of the total group
mean(y)
sd(y)

# mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

# SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

# check that we get the right values
m.total
sd.total
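The two formulas above can also be wrapped in a small helper that needs only the subgroup summary statistics, which is the situation in a meta-analysis. This is a minimal sketch. The function name `combine_groups` is hypothetical and is not part of metafor or base R.

```r
# Hypothetical helper (not a metafor function): combine k subgroups into one
# group, using only the subgroup sizes (ni), means (mi), and SDs (sdi).
combine_groups <- function(ni, mi, sdi) {
  stopifnot(length(ni) == length(mi), length(mi) == length(sdi))
  n          <- sum(ni)
  m          <- sum(ni * mi) / n             # weighted mean
  ss.within  <- sum((ni - 1) * sdi^2)        # within-group sum of squares
  ss.between <- sum(ni * (mi - m)^2)         # between-group sum of squares
  c(n = n, mean = m, sd = sqrt((ss.within + ss.between) / (n - 1)))
}

# check against the raw data
set.seed(42)
g <- sample(1:4, size = 100, replace = TRUE)
x <- rnorm(100, mean = g, sd = 2)
res <- combine_groups(c(by(x, g, length)), c(by(x, g, mean)), c(by(x, g, sd)))
res
all.equal(unname(res["sd"]), sd(x))  # TRUE
```

Dropping `ss.between` and dividing by `n - k` instead gives the usual pooled SD. That is the within-group SD, not the SD of the combined group.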

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>Behalf Of David Pedrosa
>Sent: Saturday, 22 January, 2022 22:51
>To: r-sig-meta-analysis using r-project.org
>Cc: Lukasz Stasielowicz
>Subject: Re: [R-meta] Question on effect sizes
>
>Hi Lukas and list,
>
>first of all, thanks for your time and the suggestions, and apologies for
>not making my point as clear as I should have. I have already contacted
>all authors and the pharmaceutical companies, but the latter are rather
>reluctant to disclose results for all kinds of reasons, and the former
>sometimes no longer have access, since some results are 20 or more years
>old. But let me outline my problem:
>
>We have a measure of disease severity that consists of several items
>which are summed. In most of the studies I have a sum score
>group_mean(x), i.e. x1+x2, with group_sd(x). But some authors instead
>provide group_mean(x1) and group_mean(x2) with their respective SDs.
>It's of course easy to get group_mean(x), but I'm wondering what the
>approach would be for sd(x). I thought about the "pooled_sd" with
>
>pooled_sd <- sqrt(((n1-1)*sd_x1^2 + (n2-1)*sd_x2^2) / (n1+n2-2))
>
>but I'm not sure whether that makes sense. So I tried to simulate data
>to get a sense of how reliable the results are (code below), but the
>difference between the "true" SD and the estimated SD is considerable
>in a few cases. So I was wondering whether I am missing something, or
>whether this is a valid approach.
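If x1 and x2 are two item scores from the same participants (so that x = x1 + x2 for each person), the SD of the sum also depends on the correlation r between the items: sd(x) = sqrt(sd_x1^2 + sd_x2^2 + 2*r*sd_x1*sd_x2). Neither the pooled SD nor a combined-group SD captures this. This is a minimal sketch. The function name `sd_of_sum` is hypothetical. In practice r would usually be unknown and would have to be guesstimated and varied in a sensitivity analysis.

```r
# Hypothetical helper: SD of the sum x1 + x2 of two items measured on the
# same participants, given the item SDs and their correlation r.
sd_of_sum <- function(sd1, sd2, r) {
  sqrt(sd1^2 + sd2^2 + 2 * r * sd1 * sd2)
}

# check with simulated, correlated item scores
set.seed(1)
n  <- 200
x1 <- rnorm(n, mean = 3, sd = 1.5)
x2 <- 0.6 * x1 + rnorm(n, mean = 1, sd = 1)   # correlated with x1
sd(x1 + x2)
sd_of_sum(sd(x1), sd(x2), cor(x1, x2))         # same value (up to rounding)

# with r unknown, try a range of plausible values
sd_of_sum(1.5, 1.4, r = c(0, 0.3, 0.5, 0.8))
```

The identity is exact for sample statistics, because var(x1 + x2) = var(x1) + var(x2) + 2*cov(x1, x2).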
>
>I would be delighted if you or someone else could guide me with some
>advice.
>
>All the best,
>
>David
>
>Code:
>
>## Test for simulation of compound SD
>
># General
>set.seed(1234)
>rnorm2       <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
>nsim         <- 500
>group_size   <- c(100, 100)
>
># Simulate two known datasets
>means_x1     <- runif(nsim, 0, 5) # values are between 0 and 5
>sd_x1        <- runif(nsim, 0, 4)
>
>means_x2     <- runif(nsim, 0, 5)
>sd_x2        <- runif(nsim, 0, 4)
>
>x1           <- matrix(data=NA,nrow=group_size[1],ncol=nsim)
>x2           <- matrix(data=NA,nrow=group_size[2],ncol=nsim)
>for (i in seq_len(nsim)){
>     x1[,i] <- rnorm2(group_size[1], means_x1[i], sd_x1[i])
>     x2[,i] <- rnorm2(group_size[2], means_x2[i], sd_x2[i])
>}
>
>mean_sum     <- apply(rbind(x1,x2), 2, mean) # trivial; see also
># plot(apply(rbind(x1,x2), 2, mean) - apply(rbind(means_x1, means_x2), 2, mean))
># for estimation differences
>sd_sum       <- apply(rbind(x1,x2), 2, sd) # "ground truth"
>
>sd_estimate <- rep(NA, nsim) # pooled SD; according to rnorm2
>for (i in seq_len(nsim)){
>     sd_estimate[i] <- sqrt(((group_size[1]-1)*sd_x1[i]^2 +
>                             (group_size[2]-1)*sd_x2[i]^2) /
>                            (group_size[1]+group_size[2]-2))
>}
>results <- data.frame(x=sd_sum, y=sd_estimate, z=sd_sum-sd_estimate)
>plot(results$z)
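The simulation above stacks x1 and x2 as two separate groups (rbind), so its "ground truth" sd_sum is the SD of the combined group. The pooled SD formula leaves out the between-group sum of squares, so the discrepancy grows with the difference between the two means. Adding that term, as in the combined-group formula at the top of this message, removes the discrepancy. Below is a minimal base-R sketch of the same simulation. It assumes the same group sizes and uniform ranges, and it shortens the variable names.

```r
# Same setup as the simulation above: two groups of 100, means in [0, 5],
# SDs in [0, 4]. rnorm2 makes the sample mean/SD match the targets exactly.
set.seed(1234)
rnorm2 <- function(n, mean, sd) mean + sd * c(scale(rnorm(n)))
nsim <- 500
n1 <- 100; n2 <- 100
m1 <- runif(nsim, 0, 5); s1 <- runif(nsim, 0, 4)
m2 <- runif(nsim, 0, 5); s2 <- runif(nsim, 0, 4)

sd_true <- sd_pooled <- sd_combined <- numeric(nsim)
for (i in seq_len(nsim)) {
  x1 <- rnorm2(n1, m1[i], s1[i])
  x2 <- rnorm2(n2, m2[i], s2[i])
  sd_true[i]   <- sd(c(x1, x2))                 # SD of the stacked data
  sd_pooled[i] <- sqrt(((n1 - 1) * s1[i]^2 + (n2 - 1) * s2[i]^2) / (n1 + n2 - 2))
  m <- (n1 * m1[i] + n2 * m2[i]) / (n1 + n2)    # combined mean
  sd_combined[i] <- sqrt(((n1 - 1) * s1[i]^2 + (n2 - 1) * s2[i]^2 +
                          n1 * (m1[i] - m)^2 + n2 * (m2[i] - m)^2) / (n1 + n2 - 1))
}
summary(sd_true - sd_pooled)    # can be large when the two means differ
summary(sd_true - sd_combined)  # essentially zero
plot(abs(m1 - m2), sd_true - sd_pooled,
     xlab = "|mean difference|", ylab = "true SD - pooled SD")
```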
>
>On 21.01.22 at 17:25, Lukasz Stasielowicz wrote:
>> Hi,
>>
>> a couple of ideas that may be obvious to you, but the provided
>> description is rather short, so I don't know whether you have thought
>> about the following points:
>>
>> 1. Did you try to contact the authors of the studies? Maybe they will
>> be willing to provide the missing statistics or the data set. The
>> willingness varies obviously between researchers (and research areas)
>> but it is often worth the effort.
>>
>> One could contact the corresponding author and ask for the statistics
>> or the data set (offering the choice can increase the success rate).
>> If you don't receive an answer within several days (e.g. one week),
>> then one can try to contact the other authors. Recently I used this
>> strategy for two different meta-analyses, and approximately 80% - 90%
>> of the research teams wrote back. Obviously, not all of them could
>> provide answers or data (hard drive failure etc.), but approximately
>> 30% - 50% of the authors provided additional information.
>>
>> 2. If you have already explored the first strategy and the relevant
>> information is still missing, then one could try to reconstruct it.
>> This is what you were referring to, but the description is rather
>> short, so I cannot infer what is meant by pooled SD etc.
>> One could try to rearrange the formulas to compute the missing
>> information manually, but if there are two unknowns (e.g. both the SD
>> and M for one group are missing), then it is not possible.
>> Nevertheless, one could try to make some guesstimates (e.g. are the
>> SDs for both groups similar in other studies? If yes, then one could
>> make a corresponding guesstimate for the missing information) in order
>> to impute the data.
>> One could even make several guesstimates and run these different
>> scenarios to assess the robustness of the findings. Another sensitivity
>> analysis would be to compare meta-analytic results based only on
>> studies without missing information with the scenarios that use
>> guesstimates.
>>
>> 3. It is probably obvious to you, but dropping the studies with missing
>> information is also a possibility. However, it could bias the results
>> (if the dropped studies differ systematically from the included studies).
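As an illustration of rearranging the formulas, suppose a study reports the total-sample n, mean, and SD plus the n, mean, and SD of one subgroup. The other subgroup's mean and SD then follow from the combined-group formulas (weighted mean plus within- and between-group sums of squares). This is a minimal sketch. The function name `recover_group2` is hypothetical. Rounded published values will make the recovered SD only approximate.

```r
# Hypothetical helper: recover the mean and SD of group 2 from the total
# sample (n, m, s) and group 1 (n1, m1, s1) by rearranging the
# combined-group formulas.
recover_group2 <- function(n, m, s, n1, m1, s1) {
  n2  <- n - n1
  m2  <- (n * m - n1 * m1) / n2                  # from the weighted mean
  ss.total <- (n - 1) * s^2                      # total sum of squares
  ss2 <- ss.total - (n1 - 1) * s1^2 - n1 * (m1 - m)^2 - n2 * (m2 - m)^2
  if (ss2 < 0) stop("inconsistent inputs: negative sum of squares")
  c(n2 = n2, m2 = m2, sd2 = sqrt(ss2 / (n2 - 1)))
}

# check with simulated data
set.seed(7)
g1   <- rnorm(40, mean = 10, sd = 3)
g2   <- rnorm(60, mean = 12, sd = 4)
both <- c(g1, g2)
rec  <- recover_group2(length(both), mean(both), sd(both),
                       length(g1), mean(g1), sd(g1))
rec
c(mean(g2), sd(g2))  # should match rec["m2"] and rec["sd2"]
```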
>>
>>
>> Hope it helps!
>>
>> Best wishes,
>--
>
>
>
>PD Dr. David Pedrosa
>Senior Consultant (Leitender Oberarzt), Department of Neurology,
>Head of the Movement Disorders Section, Universitätsklinikum Gießen und
>Marburg
>
>Tel.: (+49) 6421-58 65299 Fax: (+49) 6421-58 67055
>
>Address: Baldingerstr., 35043 Marburg
>Web: https://www.ukgm.de/ugm_2/deu/umr_neu/index.html
>
>
>_______________________________________________
>R-sig-meta-analysis mailing list
>R-sig-meta-analysis using r-project.org
>https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis