[R-meta] Collapsing a between subject factor

Viechtbauer Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Tue Jan 30 11:54:46 CET 2018


This still isn't quite right. You can compute the mean and SD for the combined sample exactly:

### simulate some data
n.total <- 100
grp <- sample(1:2, size=n.total, replace=TRUE, prob=c(.2,.8))
y   <- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total
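
If you want an explicit check, compare the pooled values against the full-sample statistics directly:

### pooled values match mean(y) and sd(y) (up to floating-point tolerance)
all.equal(m.total, mean(y))
all.equal(sd.total, sd(y))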

This also generalizes to any number of groups. Try with:

grp <- sample(1:3, size=n.total, replace=TRUE, prob=c(.2,.6,.3))
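
and then re-run the code above; everything works the same way. For instance (note that sample() rescales the prob values so that they sum to 1):

y   <- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

m.total  <- sum(ni*mi)/sum(ni)
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### the pooled values again agree with the full-sample statistics
all.equal(m.total, mean(y))
all.equal(sd.total, sd(y))

For repeated use, the same computations can be wrapped in a small helper function (pool_groups is just an illustrative name, not part of any package):

pool_groups <- function(ni, mi, sdi) {
  m <- sum(ni * mi) / sum(ni)
  s <- sqrt((sum((ni - 1) * sdi^2) + sum(ni * (mi - m)^2)) / (sum(ni) - 1))
  list(n = sum(ni), mean = m, sd = s)
}

pool_groups(ni, mi, sdi)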

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-
>project.org] On Behalf Of Oliver Clark
>Sent: Monday, 29 January, 2018 12:14
>To: Michael Dewey
>Cc: r-sig-meta-analysis at r-project.org; Oliver Clark
>Subject: Re: [R-meta] Collapsing a between subject factor
>
>Dear Michael,
>
>Many thanks for your response.  Indeed, the sample sizes are unequal,
>which is apparently why it was treated as two analyses.
>
>I’ve been playing with this example and others, and your example below
>overestimates the variance.  I think this is because the means are being
>squared rather than the deltas from the combined mean:
>
>S_c <- (n_1 * (v_1 + (m_1 - m_c)^2) + n_2 * (v_2 + (m_2 - m_c)^2)) /
>(n_1 + n_2)
>
>This still overestimates the known population variance of 4, so applying
>the Bessel correction:
>
>S_c_2 <- ((n_1 - 1) * (v_1 + (m_1 - m_c)^2) + (n_2 - 1) * (v_2 +
>(m_2 - m_c)^2)) / ((n_1 + n_2) - 1)
>
>leads to a good estimate of the combined variance.  Code:
>
>> M <- rnorm(34,5,3)
>> F <- rnorm(57,5,3)
>>
>> comb <- c(M,F)
>> n_1 = 34
>> n_2 = 57
>> m_1 = mean(M)
>> m_2 = mean(F)
>>
>> v_1 = sd(M)^2
>> v_2 = sd(F)^2
>>
>> m_c = (n_1 * m_1 + n_2 * m_2) / (n_1 + n_2)
>>
>> S_c_2 <- ((n_1 - 1) * (v_1 + (m_1 - m_c)^2) + (n_2 - 1) * (v_2 +
>(m_2 - m_c)^2)) / ((n_1 + n_2) - 1)
>>
>> sd(comb) - sqrt(S_c_2)
>[1] 0.001710072
>
>Many thanks for your advice - I’d have been stuck without your input!
>
>Best wishes,
>
>Oliver
>
>> On 29 Jan 2018, at 10:02, Michael Dewey <lists at dewey.myzen.co.uk>
>wrote:
>>
>> Dear Oliver
>>
>> You do not say whether the sample sizes are equal or not, so I give the
>procedure for the unequal case.
>>
>> For the means you need to weight by sample size
>>
>> (n_1 * m_1 + n_2 * m_2) / (n_1 + n_2)
>>
>> where n are sample sizes and m means
>>
>> For variance you need
>>
>> (n_1 * (m_1^2 + v_1) + n_2 * (m_2^2 + v_2) / (n_1 + n_2)) - m_c
>>
>> where v are variances and m_c is the combined mean you got above.
>>
>> I suggest double checking this with a few examples in case of
>transcription errors at my end or yours.
>>
>> Michael
>>
>> On 28/01/2018 21:49, Oliver Clark wrote:
>>> Hi all,
>>> I am currently coding studies for a meta-analysis and have come across
>a case in which all but one of a set of studies do not include sex as a
>between-subject factor.  The reasons given were unequal cell sizes,
>differences in visual stimuli (it is not clear what these differences
>are, so they are unlikely to be systematic, rather an artefact), and
>strength differences between men and women.
>>> With my limited experience, I don’t see the benefit in treating the two
>groups as separate cases and was wondering whether it would make sense to
>merge the means and SDs for both groups and use those with the total N to
>calculate an effect size.
>>> Combining the means seems relatively straightforward, but I am not sure
>how to handle the standard deviations.  I have tried averaging the
>variances in the simulation below to get there, but must admit that I am
>stabbing in the dark:
>>>> M <- rnorm(10,5,2)
>>>> F <- rnorm(10,5,2)
>>>>
>>>> comb <- c(M,F)
>>>>
>>>> (mean(M) + mean(F)) / 2 == mean(comb)
>>> [1] TRUE
>>>>
>>>> sqrt((sd(M)^2 + sd(F)^2)/2) == sd(comb)
>>> [1] FALSE
>>> Can anyone offer any advice on the best path for this? Should I treat
>them as different studies, attempt to merge the means and SDs, use a
>different aggregation method or omit this study?
>>> Many thanks,
>>> Oliver Clark
>>> PhD Student
>>> Manchester Metropolitan University

