[R-meta] Confusion about how to use UN structure

Wed Sep 1 09:04:20 CEST 2021

To add to this:

The correlation coefficient(s) can also drift towards 1 when particular variance components are very small. For example, taking a simpler structure, we have the standard multilevel model:

dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res

Now imagine the sampling variances had actually been 4 times larger:

res <- rma.mv(yi, vi*4, random = ~ 1 | district/school, data=dat)
res

We find that the estimate of sigma^2.2 is now equal to 0, so no within-district heterogeneity. And as discussed on numerous occasions on this mailing list, the model above can also be parameterized as:

res <- rma.mv(yi, vi*4, random = ~ factor(school) | district, data=dat)
res

And we find that the estimate of rho is 1. If there is no within-district heterogeneity, then the true effects within districts are all the same, that is, they correlate perfectly.

Now let's take an example with an UN structure:

dat <- dat.berkey1998
V <- lapply(split(dat[c("v1i", "v2i")], dat$trial), as.matrix)
V <- bldiag(V)
res1 <- rma.mv(yi, V, mods = ~ outcome - 1, random = ~ outcome | trial, struct="UN", data=dat)
res1

Now suppose the variances (and covariances) had again been larger:

res2 <- rma.mv(yi, V*4, mods = ~ outcome - 1, random = ~ outcome | trial, struct="UN", data=dat)
res2

Here, the estimate of tau^2.2 is starting to drift towards 0 and the estimate of rho is now again 1. In essence, if the true effects for the second outcome have little to no heterogeneity, then it becomes very difficult to estimate the correlation. Let's plot the two likelihood profiles for rho:

profile(res1, rho=1, xlim=c(0,1))
sav <- profile(res2, rho=1, xlim=c(0,1), plot=FALSE)
lines(sav$rho, sav$ll, type="o")

Note that the one for res2 is quite flat, so estimating rho is very difficult (that's also true in res1 with just 5 studies, but not as badly as in res2).

In an extreme scenario, if tau^2.2 was estimated to be zero, then in essence the estimate of rho becomes arbitrary.

I have also seen cases where the estimate of rho drifts towards -1. See this example:

https://wviechtb.github.io/metadat/reference/dat.kalaian1996.html

Sidenote: It's a bit sad that this happens in this example, since this did not occur in Kalaian and Raudenbush (1996) (from which these data were extracted) and in principle I am fitting the exact model described in the paper. But I am also not able to exactly reproduce the dataset - a comparison between the data printed in the article with some of the information / figures reveals some discrepancies, so there must be some printing errors in Table 1 :(

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>Behalf Of Reza Norouzian
>Sent: Wednesday, 01 September, 2021 7:54
>To: Simon Harmel
>Cc: R meta
>Subject: Re: [R-meta] Confusion about how to use UN structure
>
>Dear Simon,
>
>Without the data, it may be difficult to exactly know what the problem
>with your moderator or model is. But here is one issue to look for (I
>would say in a good number of cases, "the issue" to look for).
>
>In terms of random-effects' correlational structure, UN is the most
>demanding structure. Specifically, this structure aims to estimate the
>correlations among all unique pairs of the levels of a moderator
>across the levels of an ID variable (hereafter "study"). But to do so
>reliably, each pair of a moderator's levels should co-occur in a good
>number of studies. Otherwise, the UN-model will be estimating
>correlations among the pairs of levels of a moderator that have rarely
>or almost never occurred in the studies.
>
>To better see this, consider two categorical moderators each believed
>to have three correlated levels; "outcome" (with levels: A,B,C) and
>"UN_MOD" (with levels: type1,type2,type3) coded across 30 studies
>(data is shown below).
>
>Because in rma.mv(), one would place the target moderator on the
>"left" hand-side and the ID variable on the "right" hand-side of the
>vertical bar, let's input these variables to a little helper function
>to see which one of these moderators is technically more suitable for
>use with the "UN" structure:
>
>mod_for_un <- function(data,left,right) crossprod(table(data[c(right,left)])>0)
>
>For your model (random = ~ UN_MOD | study), we can do:
>
>mod_for_un(dat, left ="UN_MOD", right="study")
>
>       UN_MOD
>UN_MOD  type1 type2 type3
>  type1      1        1        1
>  type2      1        18      18
>  type3      1        18      30
>
>For the other model (random = ~ outcome | study), we can do:
>
>mod_for_un(dat, left ="outcome", right="study")
>
>       outcome
>outcome  A  B  C
>      A     30 30 30
>      B     30 30 30
>      C     30 30 30
>
>In both tables, the diagonal elements show the number of studies in
>which a particular level of the moderator has occurred. The
>off-diagonal elements show the number of studies in which a particular
>pair of levels of the moderator has co-occurred. Comparing the two
>tables, you see the problem with the UN_MOD moderator.
>
>With UN_MOD, "type1" has barely co-occurred with any other levels
>(type2 and type3) in any study. Thus, estimating correlations between
>pairs in UN_MOD that have to do with "type1" is essentially not
>possible.
>
>Given that, if you use UN_MOD for the UN structure, you may get an
>overparameterized model, and perhaps 2 rho estimates that are drifted
>toward unity. You can check this in the profile likelihood function
>shown as a flat curve for the rho estimates based on so few
>co-occurrences.
>
>So, some takeaways. First, a good UN_MOD is one whose levels vary
>within each study for increased co-occurrences of the levels. Second
>and related to the first point, the UN structure requires a relatively
>large number of studies. Third, if your categories are kind of fluid
>by nature, you may be able to slightly modify them to increase the
>chances of co-occurrences. Fourth, there is so much that the
>multilevel model can do to compensate for the lack of co-occurrences.
>Thus, if the UN structure was not really possible, you can adopt a
>simpler correlational structure like "HCS".
>
>Best regards,
>Reza
>
>#--- The data:
>set.seed(33)
>dat <- expand.grid(study = 1:30, outcome = rep(LETTERS[1:3],2))
>dat$UN_MOD <- sample(c("type1","type2","type3"),nrow(dat),replace =
>TRUE, prob = c(.01,.1,.8))
>with(dat, dat[order(study),])
>
>ps. note that rma.mv() does output co-occurrences, if you fit a UN
>model. But it may be helpful to check your moderators before fitting
>the model, in which case mod_for_un() may be useful.
>
>On Tue, Aug 31, 2021 at 7:17 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>>
>> Hello All,
>>
>> I want to use the UN structure offered by metafor (rma.mv) and the
>> literature also supports correlating my moderator's categories in the
>> studies. But I get rho estimates of 1.00 which makes me think if there
>> is a problem with (a) my moderator itself, or (b) the way I fit the
>> model?
>>
>> For example, a typical model I use is as follows:
>>
>> rma.mv(eff_size ~ UN_MOD + x1 + x2 -1, V = V, struct = "UN",
>>                     random = ~ UN_MOD | study, data = data)
>>
>> where UN_MOD has three categories (ex. type 1, type 2 and type 3).
>>
>> I have had similar issues with the UN structure in the past, so any
>> thoughts are highly appreciated.
>>
>> Simon