[R-meta] Prediction Intervals for estimates of random slope rma.mv model with predict.rma()

Fri Mar 22 15:25:35 CET 2024

Hey James,

Thanks again for the follow up - I was pretty sure I was missing something.
To take this one more step further, let's say I had multiple random slopes
on the same level - would the covariance between them also need to be taken
into the account? How would the equation look in that case?

Thank you very much for your time and expertise,
Zac

On Fri, Mar 22, 2024 at 10:03 AM James Pustejovsky <jepusto using gmail.com>
wrote:

> This doesn't look correct to me. The step of centering the predictor is
> essential for the code that I posted to work. If you want to be able to
> predict at arbitrary points, then you'll need to take into account the
> covariance between the random effects, so calculating
> Var(u_0j + u_1j * (a) + v_ij) = Var(u_0j) + a^2 * Var(u_1j) + 2 * a *
> Cov(u_0j, u_1j) + Var(v_ij)
>
> James
>
>
> On Thu, Mar 21, 2024 at 6:04 PM Zac Robinson <zacrobinson2015 using gmail.com>
> wrote:
>
>> James,
>>
>> Thank you very much for your reply - it was extremely helpful!
>>
>> I have taken your code, modified it a bit, and attempted to create a
>> function that works alongside emmprep / emmeans. From my tests it seems to
>> get the same results as your manual calculations but let me know if I'm
>> missing something - only thing I wondered is if "rho" and "phi" need to be
>> incorporated directly, as well as the covariances you mentioned
>>
>> Thanks again,
>> Zac
>>
>> # load packages and data
>> library(metafor)
>> library(tidyverse)
>> library(emmeans)
>>
>> data("dat.konstantopoulos2011")
>> df <-
>>   dat.konstantopoulos2011 %>%
>>   mutate(
>>     factor=sample(c("A","B"),nrow(.),replace=TRUE)
>>   )
>>
>> # create function
>> get.pi.emmeans.rma.mv <- function(model, mm){
>>
>>   tmp <- summary(mm)
>>   tmp <- tmp[ , ]
>>   test.stat <- stats::qt(mm using misc$level+(1-mm using misc$level)/2, tmp$df[[1]])
>>
>>   n_g <- colnames(model$mf.g)
>>   n_g2 <- ifelse(is.null(n_g),0,n_g[n_g!="intrcpt"&n_g!="outer"])
>>   l_g <- ifelse(is.null(n_g),0,mm using linfct[,n_g2])
>>   v_g <- ifelse(is.null(n_g),0,diag(model$G)%>%as.data.frame()%>%.[n_g2,])
>>
>>   n_h <- colnames(model$mf.h)
>>   n_h2 <- ifelse(is.null(n_h),0,n_h[n_h!="intrcpt"&n_h!="outer"])
>>   l_h <- ifelse(is.null(n_h),0,mm using linfct[,n_h2])
>>   v_h <- ifelse(is.null(n_h),0,diag(model$H)%>%as.data.frame()%>%.[n_h2,])
>>
>>   V_b <- tmp$SE^2
>>   V_ri <- sum(model$sigma2 + model$tau2[1] + model$gamma2[1])
>>   V_g <- l_g*v_g
>>   V_h <- l_h*v_h
>>
>>   PI <- test.stat * sqrt(V_b + V_ri + V_g + V_h)
>>
>>   tmp$lower.PI <- tmp$emmean - PI
>>   tmp$upper.PI <- tmp$emmean + PI
>>
>>   return(tmp)
>> }
>>
>>
>> # RANDOM INTERCEPT
>> --------------------------------------------------------
>> # fit random intercept model
>> m1 <- rma.mv(
>>   yi ~ year + factor, V = vi,
>>   random = list(~1|district, ~1|interaction(district,school)),
>>   data = df, method = "REML"
>> )
>>
>> # Prediction interval at year = 1989 (for Taylor) and factor = A
>> predict(m1, newmods = cbind(1989, 0), level = 95)
>>
>> # Using function
>> emmprep(m1,
>>         at=list(year=1989,
>>                 factor="A"))%>%
>>   emmeans(~year+factor,
>>           weights = "prop")%>%
>>   get.pi.emmeans.rma.mv(model=m1) #same
>>
>> # RANDOM SLOPE (x1)
>> -------------------------------------------------------
>> # fit single random slope model
>> m2 <- rma.mv(
>>   yi ~ year + factor, vi,
>>   random = list(~year|district, ~1|interaction(district,school)),
>>   data = df, method = "REML", struct = "GEN",control =
>> list(optimizer="optim")
>> )
>>
>> # Calculate prediction interval for year = 1989, factor = A
>> vec <- c(1, 1989, 0)
>> pred <- sum(m2$beta * vec)
>> Vb <- as.numeric(t(vec) %*% vcov(m2) %*% vec)
>> V_RE <- sum(m2$sigma2) + m2$tau2[1] + m2$tau2[2]*1989
>> S <- sqrt(Vb + V_RE)
>> z_crit <- qnorm((1 + 0.95) / 2)
>> pred + c(-1, 1) * z_crit * S
>>
>> # Using function
>> emmprep(m2,
>>         at=list(year=1989,
>>                 factor="A"))%>%
>>   emmeans(~year+factor,
>>           weights = "prop")%>%
>>   get.pi.emmeans.rma.mv(model=m2) # same
>>
>>
>>
>> On Thu, Mar 21, 2024 at 1:00 PM James Pustejovsky <jepusto using gmail.com>
>> wrote:
>>
>>> It looks like you are trying to generate a prediction interval for a new
>>> effect size given a specific set of predictors, based on a model with
>>> random slopes terms. A model like look like the following:
>>>
>>> Y_ij = b0 + b1 X_ij + u_0j + u_1j X_ij + v_ij + e_ij,
>>>
>>> where (u_0j, u_1j) are the study-level random effects with
>>> variance-covariance matrix T, v_ij is a within-study random effect with
>>> variance omega^2, and e_ij is the sampling error with known variance V_ij.
>>> Your goal might be to predict a new effect size for a study with X_ij = a,
>>> or the distribution of
>>>
>>> delta_ij = b0 + b1 * (a) + u_0j + u_1j * (a) + v_ij
>>>
>>> The tricky bit with these models (and probably the reason that PIs are
>>> not currently implemented in metafor) is that the study-level random
>>> effects terms can be correlated and you need to take the variances and
>>> covariances into account to get a prediction interval.
>>>
>>> My understanding is that metafor calculates prediction intervals by
>>> taking the predicted value given the fixed effects terms, plus or minus a
>>> critical value (call it z_p) times a quantity S, where S^2 is the sum of
>>> the variance of the predicted value and the variance of the random effects
>>> at the predicted value. For the example above:
>>>
>>> PI = [bhat0 + bhat1 * (a)] +/- z_p * S, where S = sqrt[ Var(bhat0 +
>>> bhat1 * (a)) + Var(u_0j + u_1j * (a) + v_ij) ]
>>>
>>> You can calculate the first term directly from the beta estimates that
>>> come out of metafor. You can calculate the first component of S by using
>>> the vcov matrix that comes out of metafor. And you can calculate the third
>>> term from the estimated variance components.
>>>
>>> To make the calculations simpler, it would be convenient to re-center
>>> the predictor at the value for which you're trying to get a prediction
>>> interval. In other words, if you set X'_ij = X_ij - a, then you're
>>> predicting at the value X'_ij = 0, and the prediction interval simplifies to
>>>
>>> PI = bhat0 +/- z_p * sqrt[ Var(bhat0) + Var(u_0j + v_ij)]
>>>
>>> Example code using Konstantopoulos2011 below.
>>>
>>> James
>>>
>>>
>>> # load packages and data
>>> library(metafor)
>>> library(tidyverse)
>>>
>>> data("dat.konstantopoulos2011")
>>> df <-
>>>   dat.konstantopoulos2011 %>%
>>>   mutate(
>>>     factor=sample(c("A","B"),nrow(.),replace=TRUE)
>>>   )
>>>
>>> # RANDOM INTERCEPT
>>> --------------------------------------------------------
>>> # fit random intercept model
>>> m1 <- rma.mv(
>>>   yi ~ year + factor, V = vi,
>>>   random = list(~1|study, ~1|district, ~1|school),
>>>   data = df, method = "REML"
>>> )
>>>
>>> # Prediction interval at year = 1989 (for Taylor) and factor = A
>>> predict(m1, newmods = cbind(1989, 0), level = 80)
>>>
>>> # Same thing, manual calcualtion
>>> vec <- c(1, 1989, 0)
>>> pred <- sum(m1$beta * vec)
>>> Vb <- as.numeric(t(vec) %*% vcov(m1) %*% vec)
>>> V_RE <- sum(m1$sigma2)
>>> S <- sqrt(Vb + V_RE)
>>> z_crit <- qnorm((1 + 0.8) / 2)
>>> pred + c(-1, 1) * z_crit * S
>>>
>>> # RANDOM SLOPE (x1)
>>> -------------------------------------------------------
>>> # fit single random slope model
>>> # Re-center the year predictor
>>> df$year_c <- df$year - 1989
>>> m2 <- rma.mv(
>>>   yi ~ year_c + factor, vi,
>>>   random = list(~ year_c | study, ~1|district, ~1|school),
>>>   data = df, method = "REML", struct = "GEN"
>>> )
>>>
>>> # Does not return prediction interval
>>> predict(m2, newmods = cbind(0, 0), level = 80)
>>>
>>> # Calculate prediction interval for year_c = 0, factor = A
>>> vec <- c(1, 0, 0)
>>> pred <- sum(m2$beta * vec)
>>> Vb <- as.numeric(t(vec) %*% vcov(m2) %*% vec)
>>> V_RE <- sum(m2$sigma2) + m2$tau2[1] # Using only the random intercept at
>>> the study level
>>> S <- sqrt(Vb + V_RE)
>>> z_crit <- qnorm((1 + 0.8) / 2)
>>> pred + c(-1, 1) * z_crit * S
>>>
>>>
>>> On Thu, Mar 21, 2024 at 11:19 AM Zac Robinson via R-sig-meta-analysis <
>>> r-sig-meta-analysis using r-project.org> wrote:
>>>
>>>> Wolfgang,
>>>>
>>>> Apologies for the incorrect method to reply - hopefully got it right
>>>> this
>>>> time. If I'm understanding things correctly - the coding can
>>>> (potentially)
>>>> end up giving me what I'm after, but I'm just specifying the sigma / sd
>>>> for
>>>> the prediction interval incorrectly (i.e., the variances can't just be
>>>> summed).
>>>>
>>>> Thus, I guess my follow up question is - how would you recommend going
>>>> about this calculation? The closest thing I can find in the literature
>>>> with
>>>> a similar variance calculation is in the context of R2 from Johnson et
>>>> al.,
>>>> where they calculate the "mean random effects variance" which seems
>>>> like it
>>>> could be useful here:
>>>>
>>>> https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12225
>>>>
>>>> Let me know what you think and thank you for your time and expertise,
>>>>
>>>> Zac
>>>>
>>>> On Thu, Mar 14, 2024 at 9:23 AM Viechtbauer, Wolfgang (NP) <
>>>> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>>>>
>>>> > Dear Zac,
>>>> >
>>>> > Please respond to the mailing list, not just to me personally.
>>>> >
>>>> > Just summing up all the variances isn't correct - the correct
>>>> computation
>>>> > of the prediction interval for such a random-effects structure is
>>>> > unfortunately more complex.
>>>> >
>>>> > Best,
>>>> > Wolfgang
>>>> >
>>>> > > -----Original Message-----
>>>> > > From: Zac Robinson <zacrobinson2015 using gmail.com>
>>>> > > Sent: Monday, March 11, 2024 16:07
>>>> > > To: Viechtbauer, Wolfgang (NP) <
>>>> > wolfgang.viechtbauer using maastrichtuniversity.nl>
>>>> > > Subject: Re: [R-meta] Prediction Intervals for estimates of random
>>>> slope
>>>> > rma.mv
>>>> > > model with predict.rma()
>>>> > >
>>>> > > Hey Wolfgang,
>>>> > >
>>>> > > Thank you very much for the reply. I actually have played around
>>>> with
>>>> > things a
>>>> > > bit more and think I've found a solution. I may not be specifying
>>>> the
>>>> > sigma
>>>> > > argument perfectly for a random slope model - but I think in concept
>>>> > this works!
>>>> > > Looks like emmeans / predict() have cross compatibility that allows
>>>> for
>>>> > > prediction intervals when sigma is specified
>>>> > >
>>>> > > Let me know what you think,
>>>> > > Zac
>>>> > >
>>>> > > # SETUP
>>>> > -------------------------------------------------------------------
>>>> > > #load packages and data
>>>> > > library(metafor)
>>>> > > library(tidyverse)
>>>> > > library(emmeans)
>>>> > >
>>>> > > data("dat.konstantopoulos2011")
>>>> > > df=dat.konstantopoulos2011%>%
>>>> > >   as.data.frame()%>%
>>>> > >   mutate(factor=sample(c("A","B"),56,replace=TRUE))
>>>> > >
>>>> > > # RANDOM SLOPE (x1)
>>>> > -------------------------------------------------------
>>>> > > #fit single random slope model
>>>> > > m2= rma.mv(yi ~ year+factor, vi,
>>>> > >           random = list(~year|study, ~1|district, ~1|school),
>>>> > >           data = df, method = "REML",struct = "GEN",
>>>> > >           control = list(optimizer="bobyqa"))
>>>> > >
>>>> > > #extract estimates with emmmprep + emmeans + predict
>>>> > > est.ci=emmprep(m2,
>>>> > >         at=list(year=1976:2000))%>%
>>>> > >   emmeans(~year,
>>>> > >           weights = "prop",
>>>> > >           sigma=sqrt(sum(
>>>> > >             m2$sigma2,
>>>> > >             m2$tau2,
>>>> > >             m2$gamma2
>>>> > >           )))
>>>> > >
>>>> > > pi= est.ci%>%
>>>> > >   predict(interval="prediction")%>%
>>>> > >   select(lower.PL,upper.PL)
>>>> > >
>>>> > > preds=cbind(est.ci,pi)
>>>> > > preds # prediction intervals are now added
>>>> > >
>>>> > > ggplot(data = preds,
>>>> > >        aes(x=year,
>>>> > >            y=emmean))+theme_classic()+
>>>> > >   geom_point(data=df,
>>>> > >              aes(y=yi,
>>>> > >                  size=1/vi))+
>>>> > >   geom_ribbon(aes(ymin=lower.PL,
>>>> > >                   ymax=upper.PL),
>>>> > >               linetype="dashed",
>>>> > >               alpha=0,
>>>> > >               color="black")+
>>>> > >   geom_ribbon(aes(ymin=asymp.LCL,
>>>> > >                   ymax=asymp.UCL),
>>>> > >               alpha=0.25)+
>>>> > >   geom_line()
>>>> > >
>>>> > > On Mon, Mar 11, 2024 at 2:06 AM Viechtbauer, Wolfgang (NP)
>>>> > > <mailto:wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>>>> > > Dear Zac,
>>>> > >
>>>> > > predict() currently does not provide predicition intervals for
>>>> > http://rma.mv
>>>> > > models with a GEN random effects structure. Not sure when I will get
>>>> > around to
>>>> > > implementing that.
>>>> > >
>>>> > > Best,
>>>> > > Wolfgang
>>>> > >
>>>> > > > -----Original Message-----
>>>> > > > From: R-sig-meta-analysis <mailto:
>>>> > r-sig-meta-analysis-bounces using r-project.org>
>>>> > > On Behalf
>>>> > > > Of Zac Robinson via R-sig-meta-analysis
>>>> > > > Sent: Sunday, March 10, 2024 16:26
>>>> > > > To: mailto:r-sig-meta-analysis using r-project.org
>>>> > > > Cc: Zac Robinson <mailto:zacrobinson2015 using gmail.com>
>>>> > > > Subject: [R-meta] Prediction Intervals for estimates of random
>>>> slope
>>>> > > http://rma.mv
>>>> > > > model with predict.rma()
>>>> > > >
>>>> > > > Dear All,
>>>> > > >
>>>> > > > I am working on extracting the adjusted effects from a
>>>> dose-response
>>>> > > > multilevel meta-regression. One issue I've run into is getting the
>>>> > > > appropriate prediction intervals on my estimates when the model
>>>> > > > includes random slopes (specifically when struct = "GEN" is
>>>> included).
>>>> > From
>>>> > > > what I gather from the help files, this comes down to
>>>> appropriately
>>>> > > > specifying the `tau2.levels` and `gamma2.levels` arguments within
>>>> > > > predict.rma() - however, I am unsure how to get this to work.
>>>> > > >
>>>> > > > In the code below I've tried to demonstrate my issue and explore
>>>> the
>>>> > other
>>>> > > > solutions I'm aware of through the orchaRd package). My end goal
>>>> > requires
>>>> > > > adjusting for both continuous and categorical moderators so
>>>> ideally the
>>>> > > > solution stays within the emmprep() / emmeans() framework (which
>>>> makes
>>>> > > > proportional weights and setting non-focal continuous variables
>>>> at the
>>>> > mean
>>>> > > > easy), but I realize this may not be possible.
>>>> > > >
>>>> > > > Thank you very much in advance!
>>>> > > >
>>>> > > > Zac
>>>> > > >
>>>> > > > # SETUP
>>>> > -------------------------------------------------------------------
>>>> > > > #load packages and data
>>>> > > > library(metafor)
>>>> > > > library(tidyverse)
>>>> > > > library(orchaRd)
>>>> > > > library(emmeans)
>>>> > > >
>>>> > > > data("dat.konstantopoulos2011")
>>>> > > > df=dat.konstantopoulos2011%>%
>>>> > > >   as.data.frame()%>%
>>>> > > >   mutate(factor=sample(c("A","B"),nrow(df),replace=TRUE))
>>>> > > >
>>>> > > > # RANDOM INTERCEPT
>>>> > --------------------------------------------------------
>>>> > > > #fit random intercept model
>>>> > > > m1=http://rma.mv(yi ~ year+factor, vi,
>>>> > > >           random = list(~1|study, ~1|district, ~1|school),
>>>> > > >           data = df, method = "REML")
>>>> > > >
>>>> > > > #extract estimates with predict.rma
>>>> > > > predict(m1,
>>>> > > >         newmods =cbind(rep(c(1976:2000),2),
>>>> > > >                        rep(c(0,1),each=25)),
>>>> > > >         addx = TRUE) # works fine
>>>> > > >
>>>> > > > #extract estimates with emmprep
>>>> > > > emmprep(m1,
>>>> > > >         at=list(year=1976:2000))%>%
>>>> > > >   emmeans(~year,
>>>> > > >           weights = "prop")%>%
>>>> > > >   pred_interval_esmeans(model=m1,
>>>> > > >                         mm=.,
>>>> > > >                         mod="year") # works fine
>>>> > > >
>>>> > > > #extract estimates with orchaRd
>>>> > > > mod_results(m1,
>>>> > > >             mod = "year",
>>>> > > >             group = "study") # works fine
>>>> > > >
>>>> > > > # RANDOM SLOPE (x1)
>>>> > -------------------------------------------------------
>>>> > > > #fit single random slope model
>>>> > > > m2=http://rma.mv(yi ~ year+factor, vi,
>>>> > > >           random = list(~year|study, ~1|district, ~1|school),
>>>> > > >           data = df, method = "REML",struct = "GEN",
>>>> > > >           control = list(optimizer="bobyqa"))
>>>> > > >
>>>> > > > #extract estimates with predict.rma
>>>> > > > predict(m2, newmods =cbind(rep(c(1976:2000),2),
>>>> > > >                            rep(c(0,1),each=25)),
>>>> > > >         addx = TRUE) # no prediction intervals
>>>> > > >
>>>> > > > predict(m2, newmods =cbind(rep(c(1976:2000),2),
>>>> > > >                            rep(c(0,1),each=25)),
>>>> > > >         addx = TRUE,
>>>> > > >         tau2.levels = levels(year), # try adding tau2.levels
>>>> > > >         ) # still no prediction intervals
>>>> > > >
>>>> > > > #extract estimates with emmprep
>>>> > > > emmprep(m2,
>>>> > > >         at=list(year=1976:2000))%>%
>>>> > > >   emmeans(~year,
>>>> > > >           weights = "prop")%>%
>>>> > > >   pred_interval_esmeans(model=m2,
>>>> > > >                         mm=.,
>>>> > > >                         mod="year") # works but gives an error
>>>> > > >
>>>> > > > #extract estimates with orchaRd
>>>> > > > mod_results(m2,
>>>> > > >             mod = "year",
>>>> > > >             group = "study") # works but gives an error
>>>> > > >
>>>> > > >
>>>> > > > # RANDOM SLOPE (x2)
>>>> > -------------------------------------------------------
>>>> > > > #fit double random slope model
>>>> > > > m3=http://rma.mv(yi ~ year+factor, vi,
>>>> > > >           random = list(~year|study, ~year|district, ~1|school),
>>>> > > >           data = df, method = "REML",struct = c("GEN","GEN"),
>>>> > > >           control = list(optimizer="bobyqa"))
>>>> > > >
>>>> > > > #extract estimates with predict.rma
>>>> > > > predict(m3, newmods =cbind(rep(c(1976:2000),2),
>>>> > > >                            rep(c(0,1),each=25)),
>>>> > > >         addx = TRUE) # no prediction intervals
>>>> > > >
>>>> > > > predict(m3, newmods =cbind(rep(c(1976:2000),2),
>>>> > > >                             rep(c(0,1),each=25)),
>>>> > > >         addx = TRUE,
>>>> > > >         tau2.levels = levels(year),# try adding tau2.levels
>>>> > > >         gamma2.levels = levels(year)# try adding gamma2.levels
>>>> > > >         ) # still no prediction intervals
>>>> > > >
>>>> > > > #extract estimates with emmprep
>>>> > > > emmprep(m3,
>>>> > > >              at=list(year=1976:2000))%>%
>>>> > > >   emmeans(~year,
>>>> > > >           weights = "prop")%>%
>>>> > > >   pred_interval_esmeans(model=m3,
>>>> > > >                         mm=.,
>>>> > > >                         mod="year") # works but gives an error
>>>> > > >
>>>> > > > #extract estimates with orchaRd
>>>> > > > mod_results(m3,
>>>> > > >             mod = "year",
>>>> > > >             group = "study") # works but gives an error
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-meta-analysis mailing list @ R-sig-meta-analysis using r-project.org
>>>> To manage your subscription to this mailing list, go to:
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>>>>
>>>

	[[alternative HTML version deleted]]