[R-meta] rma with already available pre-post changes; yi=yi

Sun Oct 22 12:05:42 CEST 2017

The 'vi' value is (an estimate of) the *sampling variance* of a particular statistic, such as a log odds ratio, standardized mean difference, raw or r-to-z transformed correlation coefficient, or even just a simple mean. Accordingly, we typically call sqrt(vi) the standard error of that statistic. For example, var(x)/n is an estimate of the sampling variance of a mean and sd(x)/sqrt(n) is then the standard error of a mean. For other measures, the equation for the sampling variance can look quite different; for example, for the standardized mean difference, the (asymptotic) sampling variance is 1/n1 + 1/n2 + theta^2/(2*(n1+n2)) (where theta is the true SMD value), which we can estimate with 1/n1 + 1/n2 + d^2/(2*(n1+n2)) (where d is the observed SMD value). The square root thereof is then the standard error of a standardized mean difference. And so on.
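
A minimal sketch of the two formulas above in base R. The raw data are simulated and the values of d, n1, and n2 are made up purely for illustration:

```r
# Illustrative only: simulated raw measurements, not real data
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
n <- length(x)

vi_mean <- var(x) / n        # estimated sampling variance of the mean
se_mean <- sd(x) / sqrt(n)   # standard error of the mean (= sqrt(vi_mean))

# Estimated sampling variance of a standardized mean difference,
# using hypothetical values for d, n1, and n2
d  <- 0.5
n1 <- 40
n2 <- 45
vi_smd <- 1/n1 + 1/n2 + d^2 / (2 * (n1 + n2))
se_smd <- sqrt(vi_smd)       # standard error of the SMD
```

In both cases the standard error is just the square root of the corresponding vi value.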

Note that when we compute a sampling variance / standard error, we are doing so not based on a bunch of repeated observations of that statistic from the sampling distribution (i.e., if we could repeat the study over and over under identical circumstances -- just with different samples -- then we would have lots of d values from the sampling distribution and then could compute the variance of all those d values, which would give us the actual sampling variance), but based on statistical theory that tells us what the variance of a standardized mean difference is. The equation for the sampling variance of a particular statistic may involve some unknown parameter (e.g., theta above), but we can then substitute an estimate thereof (i.e., the observed SMD value) to obtain an estimate of the sampling variance (and hence standard error). So, based on a single draw of the statistic from its sampling distribution, we can actually obtain an estimate of the variance of the entire sampling distribution of that statistic under the circumstances of that study (i.e., if we had repeated the study over and over under identical circumstances). I think that's pretty cool (but admittedly, I might have a pretty skewed perspective on what 'cool' is).
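
The thought experiment above can be sketched as a small simulation. This is only an illustration under assumed conditions (normal data, equal SDs, hypothetical values n1 = n2 = 30 and theta = 0.5); the helper one_d() is made up for this example. Since the formula is asymptotic, the agreement is expected to be approximate, not exact:

```r
set.seed(42)
n1 <- 30; n2 <- 30; theta <- 0.5   # hypothetical study conditions

# Simulate one study under these conditions and return the observed d
one_d <- function() {
  x1 <- rnorm(n1, mean = theta, sd = 1)
  x2 <- rnorm(n2, mean = 0,     sd = 1)
  sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / sp
}

# "Repeat the study over and over": variance of many d values
d_rep <- replicate(10000, one_d())
var_empirical <- var(d_rep)

# Theory-based (asymptotic) sampling variance using the true theta
var_theory <- 1/n1 + 1/n2 + theta^2 / (2 * (n1 + n2))

# Estimate from a single study: substitute the observed d for theta
d_single   <- d_rep[1]
var_single <- 1/n1 + 1/n2 + d_single^2 / (2 * (n1 + n2))

round(c(empirical = var_empirical, theory = var_theory, single = var_single), 4)
```

The last value uses only one simulated study, yet it should land in the same neighborhood as the variance computed from all 10000 replications.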

As a side note: For statistics that are based on a variance stabilizing transformation (e.g., Fisher's r-to-z transformed correlation coefficient), this works even better, because the sampling variance (e.g., 1/(n-3) for an r-to-z transformed correlation) doesn't involve any unknown parameters, so we don't even need the step where we substitute estimates for unknown parameters.
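
A similar sketch for the r-to-z case, again under assumed conditions (bivariate normal data, hypothetical n = 50 and rho = 0.6; one_z() is a made-up helper). Note that the theoretical value needs nothing but n:

```r
set.seed(7)
n <- 50; rho <- 0.6   # hypothetical sample size and true correlation

# Simulate one sample and return the r-to-z transformed correlation
one_z <- function() {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  atanh(cor(x, y))    # Fisher's r-to-z transformation
}

z_rep <- replicate(10000, one_z())
var_empirical <- var(z_rep)     # variance across repeated samples
var_theory    <- 1 / (n - 3)    # involves no unknown parameter

round(c(empirical = var_empirical, theory = var_theory), 4)
```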

On the other hand, we can take the *sample variance* (or standard deviation) of a bunch of raw measurements. In fact, that's what var(x) and sd(x) are in the equation for the sampling variance and standard error of a mean (and these are actually substitutes for sigma^2 and sigma, that is, the true variance / SD of the raw measurements). But given what I wrote above, var(x) is something rather different than a sampling variance. The variance of a bunch of raw measurements is actually based on having a bunch of observations from the distribution underlying the measurements. A statistic is some kind of attribute that summarizes a bunch of such measurements (e.g., in terms of a mean) and the sampling variance thereof describes how much such a statistic would vary from sample to sample.
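
To make the distinction concrete, here is a small sketch with simulated measurements (sigma = 2 and the sample sizes are arbitrary choices for illustration). The sample variance var(x) keeps estimating sigma^2 no matter how large n is, while the sampling variance of the mean, var(x)/n, shrinks as n grows:

```r
set.seed(123)
sigma <- 2                 # hypothetical true SD of the raw measurements
ns <- c(20, 200, 2000)     # arbitrary sample sizes

res <- t(sapply(ns, function(n) {
  x <- rnorm(n, mean = 0, sd = sigma)
  c(n                 = n,
    sample_var        = var(x),       # estimates sigma^2
    sampling_var_mean = var(x) / n)   # estimates the variance of the mean
}))

res
```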

Finally, I actually think it would be perfectly fine to denote sd(x)/sqrt(n) as the standard deviation (not standard error) of a mean. In the end, that *is* what we are actually computing/estimating. But conventionally we use the term 'standard error' when describing the standard deviation of a statistic; so be it. In the end, it's not so important what we call these things, but what they actually mean (pun intended) and where they come from.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of P. Roberto Bakker
Sent: Sunday, 22 October, 2017 7:54
To: Michael Dewey
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] rma with already available pre-post changes; yi=yi

Dear Michael,

Thank you for your explanations.

Yes, I use vi=se^2; my apologies for not being clear.
Now that we are talking about this: I learned in statistics that vi=sd^2, but
everything I read about meta-analysis says vi=se^2. How should I understand this?

About posting in plain text: you mean text like in Notepad? So, I copy/paste
text from Notepad into the mail?

Bw,
Roberto

2017-10-21 14:21 GMT+02:00 Michael Dewey <lists at dewey.myzen.co.uk>:

> Dear Roberto
>
> On 21/10/2017 05:28, P. Roberto Bakker wrote:
>
>> Hi,
>>
>> I want to meta-analyze pre-post change measures for treatment vs
>> placebo.
>> I have already been given the Hedges' g between the two arms and its
>> SE.
>> So I used res <- rma(measure="SMCC", yi=yi, vi=vi, data=datsub, digits=2,
>> method = "REML")
>>
> That is fine if your vi is indeed the variance, but you said you have the
> standard error, in which case you would need vi = se^2 or sei = se
>
>> My questions:
>> Do I need to use SMCC in rma()? I suppose 'measure=' is necessary in
>> escalc()
>>
> Yes, that is correct since you have got the general case here where you
> specify yi and vi (or sei) and rma neither knows nor cares where on earth
> they came from.
>
>> I see no difference in SMCC/SMCR etc. So I suppose it is not necessary. I
>> only use it for the forest() scale title.
>>
>> Do I use yi=yi or just yi? I ask this because I was advised in a former
>> mail to use sei=sei instead of sei, and the same for vi.
>>
> It is always safest to use the explicit parameter names in case you forget
> their order.
>
>> Bw
>> Roberto
>>
> It would be good to post in plain text in case future posts get mangled,
> being in HTML. This one got through OK.