[R-meta] Accounting for uncertainty in a mixed-effects regression

Sun Feb 4 13:57:45 CET 2018

Yes, I just wanted to tie up those loose ends before getting to the more recent discussion.

As James indicated, you cannot predict for a level that was not included in the original dataset. Also, factors are turned into a bunch of dummy variables, so if you want to use predict(), you have to specify the values of the dummy variables instead of including a factor/character variable in 'newmods'. See:

http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms

As for the actual problem:

In levels of 'Biome' with a small number of studies, the SE should tend to be larger (all else equal, a larger k for a level will lead to a decrease in the SE for the level). So in that sense, the model should reflect the larger uncertainty in levels of 'Biome' with little information.

For example:

library(metafor)
dat <- get(data(dat.bcg))
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, mods = ~ alloc - 1, data=dat)
res
plot(c(table(dat$alloc)), res$se, pch=19)

So, before thinking about other methods, how does that pan out in your data?

Best,
Wolfgang

>-----Original Message-----
>From: Cesar Terrer Moreno [mailto:cesar.terrer at me.com]
>Sent: Sunday, 04 February, 2018 13:52
>To: Viechtbauer Wolfgang (SP)
>Cc: r-sig-meta-analysis at r-project.org
>Subject: Re: [R-meta] Accounting for uncertainty in a mixed-effects
>regression
>
>Hi Wolfgang,
>
>I re-formulated this question more specifically and focused after some
>people helped with some initial issues. Please, see this newer question
>labeled as: “Accounting for uncertainty in a mixed-effects regression”.
>
>In this question, I’ve been initially helped by James, and I have learnt
>that the variable I need as a driver of uncertainty needs to be included
>as moderator. However, an issue with the predict question using
>categorical moderators persists.
>
>Thanks
>César
>
>On 2 Feb 2018, at 19:19, Cesar Terrer Moreno <cesar.terrer at me.com> wrote:
>
>Dear James,
>
>Thanks, this is very helpful.
>
>However, I can’t manage to make the predict function work. My entire
>model is a bit more complicated than in my original question, also
>including an interaction as moderator and Biome as you just suggested :
>
>meta.mv <- rma.mv(es, var, data=df, mods= ~ 1 + MAP + MAT*CO2dif +
>Biome))
>Now, when I use predict:
># correspondence of variables between df (meta-analysis database) and
>s.df (global data of the predictors to predict the effect globally) is:
># MAT == temperature, MAP == precipitation, Biome == Biome, CO2inc = 300
>
>pred.mv <- predict(meta.mv,  newmods = cbind(s.df$precipitation,
>s.df$temperature, CO2inc, s.df$temperature*CO2inc,s.df$Biome))
>Error in predict.rma(ECMmv, newmods = cbind(s.df$precipitation,
>s.df$temperature,  :
>  Dimensions of 'newmods' do not match dimensions of the model.
>
>I guess this error is because i) Biome is categorical, and ii) number of
>factor levels in df$Biome and s.df$Biome is different:
>
>> levels(df$Biome)
>[1]
>"Boreal_Forest"    "Cropland"         "Grassland"        "Shrubland"
>
>[5] "Temperate_Forest" "Tropical_Forest”
>
>> levels(s.df$Biome)
> [1]
>""                 "Boreal_Forest"    "Cropland"         "Grassland"
>
> [5]
>"Mixed"            "Nodata"           "Other"            "Shrubland"
>
> [9] "Temperate_Forest" "Tropical_Forest"
>
>This is because there are “biomes” in the world (s.df) that are not
>included in the dataset of experiments of the meta-analysis (s), although
>all Biomes in s are included in s.df with the same name.
>
>How can I make the predict function work? Thanks
>César
>
>On 2 Feb 2018, at 16:38, James Pustejovsky <jepusto at gmail.com> wrote:
>
>Cesar,
>
>I think you will need to include ecosystem as a predictor (i.e., in the
>fixed effect part of the model, not just in the random effects structure)
>in order to account for the greater uncertainty about tropical
>ecosystems. Without doing so, the predictions for the tropics will be
>generated based on ALL of the ecosystems in the model (controlling for
>precipitation and temperature), and so their uncertainty will be driven
>by the total sample size. (It would even be possible to exclude all of
>the data from tropical ecosystems, fit the model, and yet still generate
>predictions for the tropics.)
>
>So I would recommend putting it in, as something like:
>meta.mv <- rma.mv(es, var, data=df, random= ~1|Biome, mods= ~ 1 + MAP +
>MAT + Biome)
>(Or perhaps it's not necessary to put it in the random effects structure,
>if the variability in true effects appears to be unrelated to ecosystem.)
>
>James
>
>On Fri, Feb 2, 2018 at 5:48 AM, Cesar Terrer Moreno <cesar.terrer at me.com>
>wrote:
>I have calculated an effect size along a dataset of experiments
>distributed worldwide through a mixed-effects meta-regression. The effect
>size in the dataset depends on climate (y ~ temperature + precipitation):
>
>        meta <- rma(es, var, data=df, mods= ~ 1 + precip + temp)
>
>As I have data for temperature and precipitation for virtually all points
>on Earth, I have upscaled this effect and standard error (SE) globally,
>that is, applying the model to predict the effect for all points on
>Earth, thus creating a gridded map:
>
>        s <- stack(temperature, precipitation) # Raster stack of the maps
>of temperature and precipitation
>        s.df <- as.data.frame(s,xy=TRUE) # Transform raster stack to
>data.frame with coordinates
>
>        # predict ES and SE for all points with temperature and
>precipitation data
>        pred <- predict(meta, newmods = cbind(s.df$precipitation,
>s.df$temperature))
>The resulting "upscaled" predictions of SE thus represent the meta-
>analysis derived uncertainty. What this SE does not reflect is the
>uncertainty derived from the fact that the number of experiments in the
>dataset used to run the meta-regression is obviously limited, yet I
>predicted the effect globally, even in areas not covered by experiments.
>In particular, it is very obvious in the dataset that tropical ecosystems
>are not well represented, with just a couple of experiments in the meta-
>analysis. Therefore, the upscaled effect should be more uncertain in
>tropical regions. In the current version, however, SE is not particularly
>higher in the tropics than in other areas, because the model just depends
>on climate, not ecosystem representativeness. I would like to add a
>further level of uncertainty to SE based on the number of studies per
>type of ecosystem in the dataset. The idea is that ecosystems that are
>poorly sampled should have a higher SE than ecosystems with plenty of
>measurements in the dataset.
>
>Ecosystem type does not appear to explain variation in the effect, reason
>I haven’t included it as a moderator. This could just be the consequence
>of the low sample size of some ecosystem types. Regardless, ecosystem
>type should be forced in the model somehow to account for the uncertainty
>associated with the low sample size of some types of ecosystem.
>
>I have tried this:
>
>        meta.mv <- rma.mv(es, var, data=df, random= ~1|Biome, mods= ~ 1 +
>MAP + MAT)
>        pred.mv <- predict(meta.mv,
>                      newmods = cbind(s.df$precipitation,
>s.df$temperature),
>                      random= ~1|s.df$Biome)
>
>But the predictions of SE in pred.mv still show rather low SE values in
>poorly sampled areas.
>
>How can I force the model to predict larger errors in ecosystem types
>with low sample size? Should I include ecosystem type as a random or
>fixed effect? How should the “predict" function be called?