# [R-meta] Do we assume multi-stage sampling of effect sizes in multi-level models?

Viechtbauer, Wolfgang (SP)
Wed Jul 21 21:50:16 CEST 2021

```
Let me chime in here. I am going to focus on the 'multilevel sampling' that underlies the standard random-effects model (which can actually be regarded as a two-level model) - not some further complications that arise if there are multiple outcomes / effects for the same study.

The 'sampling' that happens at the level of the observed effects is the same as what we assume in essentially all of statistics. All statistics have a sampling distribution, whether they are a simple mean, the test statistic of a t-test, or a standardized mean difference. The sampling distributions of these statistics can be assumed to arise through random sampling, but hardly ever has such random sampling taken place in practice (leaving aside people doing surveys where there is a lot of emphasis on doing proper sampling). We just assume that the people who participated in our study (or whatever the unit of analysis is) are like a random sample from some population of people and that if we were to run the same study all over again under identical circumstances, the same random processes would lead to a new sample that again is like a random sample from that same population. That doesn't imply that it's a random sample from a population we care about, but it can be assumed to be a random sample from *some* population. Again, all of this applies generally, not just to meta-analysis (where we just happen to have sampling distributions for the various effect size or outcome measures).

Sidenote: There is another way one can motivate the concept of a sampling distribution which does not involve random sampling, but relies on the idea of randomization. For some discussion of this, see here:

https://stats.stackexchange.com/questions/13607/can-non-random-samples-be-analyzed-using-standard-statistical-tests/13616#13616

What gets a bit more philosophical is the concept of a 'population of studies' and that the studies we have are a random sample from that (usually purely hypothetical) population. I used to joke that believing in that hypothetical population is like believing in UFOs -- which coincidentally also look a bit like a normal distribution:

https://www.closeup.de/media/oart_0/oart_i/oart_60621/thumbs/816944_2109747.jpg

(yes, I used to be a big X-Files fan ...). But one could motivate this idea on the following grounds: Imagine that the way a study is run depends on many factors, each of which could just as well be decided one way or another. Now imagine there is a large number of such factors, whose 'sum' essentially leads to the specific way a study is run, which in turn determines what the true outcome/effect is for such a study. Then, based on the central limit theorem (not in terms of the number of studies, but in terms of the number of these small factors that get added up), the true effects would be approximately normally distributed, and the true effects of the studies we have are a 'sample' from that distribution. So the sampling here is not so much that we, as meta-analysts, are really sampling studies, but that the studies themselves have 'sampled' the values of these various factors.

This thinking may in fact be related to another way that some have motivated the idea of a random-effects model, which involves the concept of exchangeability:

https://en.wikipedia.org/wiki/Exchangeable_random_variables

so we don't have to assume random sampling, 'just' that the true effects are exchangeable. I can't find the reference(s) right now where this is discussed and I can't really say whether this really helps.

Note that similar discussions arise in other contexts, for example whether it makes sense to use inferential statistics when one has actually sampled an entire population. Does it make sense to run a t-test then? Use of inferential statistics in such a context is often motivated on grounds that the specific population we have is just one of many that could have arisen. Now it seems like we are back to UFOs again ... I'll stop here. Also, discussions about this stuff get a lot more fun when you've had a couple of beers.

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>Sent: Wednesday, 21 July, 2021 20:41
>To: James Pustejovsky
>Cc: R meta
>Subject: Re: [R-meta] Do we assume multi-stage sampling of effect sizes in multi-level models?
>
>I have (and had) no issue with the first stage of sampling (simple random
>sampling of studies). I also have no problem assuming (only
>epistemologically) that there could be a "literature [that] has measured a
>very large number of outcomes" (whatever 'outcome' means which may
>introduce another stage of [random] sampling).
>
>My problem is with the second stage of the sampling. Particularly, to think
>that knowledge at this stage can be formed on a simple random
>sampling basis of some universe and not a non-probability (and perhaps
>thoughtfully biased like purposive sampling) one is a bit counter-intuitive.
>
>In the end, compared to primary data, simple random sampling of effect
>sizes (at any stage of sampling) seems a bit restricted.
>
>Once again many thanks for sharing your expertise (a question about your
>expanding range paper to follow)
>Fred
>
>On Wed, Jul 21, 2021 at 10:53 AM James Pustejovsky <jepusto using gmail.com>
>wrote:
>
>> I'm not sure I agree about the theoretical impossibility of MLMA.
>>
>> Consider that the regular old random effects model also posits that we are
>> sampling studies from some population. Usually that population is
>> hypothetical (the set of possible studies that could conceivably be
>> conducted on the topic). But sometimes we may identify a very large body of
>> literature and then literally draw a random sample of records for purposes
>> of coding (because coding is expensive and we have limited resources).
>>
>> One could imagine doing the same in a multi-stage setting, where every
>> study in the literature has measured a very large number of outcomes. We
>> first sample studies, then (again due to resource constraints), sample only
>> a few of the outcomes from each study for purposes of effect size
>> calculation. This is less plausible as a physical process, admittedly. But
>> we could imagine that the primary investigators are engaging in something
>> akin to this when they design their primary studies. Ideally, they would
>> measure many different outcomes using many different
>> instruments/scales/whatnot. But due to resource constraints, they can
>> actually only collect a few measurements. Perhaps they choose
>> instruments/scales more or less at random?
>>
>> On Wed, Jul 21, 2021 at 10:43 AM Farzad Keyhan <f.keyhaniha using gmail.com>
>> wrote:
>>
>>> Thanks, James. You are right about what I meant by epistemological vs.
>>> ontological. But the problem is that in the case of "primary data", it is
>>> "theoretically possible" to follow a multi-stage plan but in many cases we
>>> may not "afford" to do it, and so it doesn't happen (always
>>> epistemologically plausible, but at times ontologically implausible).
>>>
>>> But in the case of multilevel meta-regression, it's "not even
>>> theoretically possible" to assume so. Of course, I fully understand that
>>> there is no remedy and it is what it is. But I just wanted to make sure I'm
>>> not way off on this as a non-stats person.
>>>
>>> Thank you again, for your expertise and dedication,
>>> Fred
>>>
>>> ps. My colleagues and I have run into a question reading your
>>> expanding range paper (and applying it to our ongoing meta project) but
>>> I'll, if you don't mind, ask that on this forum later.
>>>
>>> On Wed, Jul 21, 2021 at 10:07 AM James Pustejovsky <jepusto using gmail.com>
>>> wrote:
>>>
>>>> Hi Fred,
>>>>
>>>> This is an interesting question, for sure, and I would love to hear how
>>>>
>>>> My own perspective: I agree with your interpretation in that the
>>>> assumptions of the multi-level meta-analysis (MLMA) model posit a two-stage
>>>> sampling process, where we first sample studies from some population of
>>>> possible studies and then sample effect sizes from the population of effect
>>>> sizes that *could have been measured* within each of those studies. The
>>>> overall average effect size parameter in the MLMA is then the average of
>>>> study-specific average effect size parameters, which in turn are averages
>>>> over a (hypothetical) set of effects that could have been assessed.
>>>>
>>>> An implication of this assumption is that the MLMA model attributes
>>>> additional uncertainty to studies that measure only a single outcome. This
>>>> happens because it treats those studies as having measured just one of many
>>>> possible outcomes, rather than (for instance) as having measured the single
>>>> gold-standard outcome given the constructs/question under investigation. I
>>>> do worry about whether this assumption is reasonable, but at the moment I
>>>> don't have any great ideas about how to probe it.
>>>>
>>>> Of course, just as with multi-level modeling of primary data, the
>>>> assumptions of the model don't---and needn't---necessarily match up with
>>>> the actual physical process used to collect the data. (I think this is what
>>>> you were getting at in differentiating between the epistemology and the
>>>> ontology?) Multi-level models are very commonly used with data collected
>>>> through means other than multi-stage random sampling, and I've never heard
>>>> of a meta-analytic dataset being assembled through a multi-stage sampling
>>>> of effect size information. Whether using MLMA is a reasonable statistical
>>>> strategy depends on a) whether the model's assumptions are a reasonable,
>>>> stylized approximation of the process you're investigating and b) the
>>>> robustness of the approach to violations of its assumptions.
>>>>
>>>> James
>>>>
>>>> On Tue, Jul 20, 2021 at 11:23 AM Farzad Keyhan <f.keyhaniha using gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> Applying multi-level models to "raw data" assumes that the data have been
>>>>> collected via a multi-stage sampling plan (e.g., first randomly selecting
>>>>> schools, then randomly selecting students from within those selected
>>>>> schools), which means the student data from within each school are not
>>>>> iid (hierarchical dependence).
>>>>>
>>>>> But in meta-analysis, do we need to assume that a multi-stage sampling of
>>>>> "effect sizes" (first randomly selecting some studies, then selecting some
>>>>> effect sizes from within those studies) has occurred to justify the use of
>>>>> multilevel meta-regression models?
>>>>>
>>>>> I would say, epistemologically yes (but ontologically no), but I wonder
>>>>> what meta-analysis experts think?
>>>>>
>>>>> Thank you,
>>>>> Fred

```
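
The central limit theorem argument in the reply above (true effects as the 'sum' of many small design decisions) can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the thread: each factor is a coin flip that shifts the true effect by a fixed hypothetical amount (`step`), the factors are independent, and the function names are made up for illustration.

```python
import random
import statistics


def simulate_true_effects(n_studies=2000, n_factors=100, step=0.05, seed=1):
    """Return one true effect per study as the sum of many small design choices."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_studies):
        # Assumption: each factor is decided "one way or another" at random
        # and shifts the study's true effect by +step or -step.
        effects.append(sum(rng.choice((-step, step)) for _ in range(n_factors)))
    return effects


def skewness(values):
    """Third standardized moment; close to 0 for a normal distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; close to 0 for a normal distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 4 for v in values]) - 3.0


if __name__ == "__main__":
    effects = simulate_true_effects()
    print("mean of true effects:", round(statistics.fmean(effects), 3))
    print("SD of true effects (tau):", round(statistics.pstdev(effects), 3))
    print("skewness:", round(skewness(effects), 3))
    print("excess kurtosis:", round(excess_kurtosis(effects), 3))
```

With many factors, the skewness and excess kurtosis of the simulated true effects should both be near zero, which is what an approximately normal distribution of true effects would look like. No study is sampled by a meta-analyst here; the 'sampling' is entirely in how each study's factors happen to fall.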
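
The two-stage sampling assumption described in the quoted thread (studies sampled first, then effect sizes within studies) can be written out as a model. The notation below is an assumption for illustration and does not appear in the thread: `y_{ij}` is the j-th observed effect in study i, `\mu` the overall average effect, `u_i` the study-level deviation, `v_{ij}` the within-study deviation of that particular outcome, and `e_{ij}` the sampling error with known variance `s_{ij}^2`. All terms are assumed mutually independent.

```latex
y_{ij} = \mu + u_i + v_{ij} + e_{ij}, \qquad
u_i \sim N(0, \tau^2), \quad
v_{ij} \sim N(0, \omega^2), \quad
e_{ij} \sim N(0, s_{ij}^2)

% Marginal variance of the observed effect for a study that reports a single effect:
\operatorname{Var}(y_{i1}) = \tau^2 + \omega^2 + s_{i1}^2
```

Under this sketch, the `\omega^2` term is the additional uncertainty attributed to a study with a single outcome: the model treats that effect as one draw from the outcomes that could have been measured. A two-level model that regarded it as the study's only possible outcome would give `\tau^2 + s_{i1}^2` instead.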