[R-meta] Accounting for uncertainty in a mixed-effects regression

Mon Feb 5 13:26:19 CET 2018

Hi Wolfgang,

Thanks for your response. 

Yes, including “Biome” as a moderator seems like a reasonable approach to force the model to predict larger SE values when the sample size in a certain biome is low.

My model would thus be:

ECMmv <- rma.mv(es, var, data=ecm.df, mods= ~ 1 + MAP + MAT*CO2dif + Biome)

Biome has six levels:
> levels(s.df$Biome)
[1] "Grassland"        "Boreal_Forest"    "Cropland"         "Shrubland"       
[5] "Temperate_Forest" "Tropical_Forest" 

Now I don’t know how to use the predict function using a dataset that includes data for all predictors on a per pixel basis:

> head(s.df)
         x      y temperature precipitation     Biome CNr
1 -179.875 89.875          NA            NA Grassland  NA
2 -179.625 89.875          NA            NA Grassland  NA
3 -179.375 89.875          NA            NA Grassland  NA
4 -179.125 89.875          NA            NA Grassland  NA
5 -178.875 89.875          NA            NA Grassland  NA
6 -178.625 89.875          NA            NA Grassland  NA

CO2inc <- 300 # ppm

ECMpred.mv <- predict(ECMmv, 
                      newmods = cbind(s.df$precipitation, s.df$temperature, CO2inc, s.df$temperature*CO2inc,s.df$Biome))

How would you configure “predict” to predict ES and SE per row as a function of Biome type? 
Thanks a lot
César

> On 4 Feb 2018, at 13:57, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> 
> Yes, I just wanted to tie up those loose ends before getting to the more recent discussion.
> 
> As James indicated, you cannot predict for a level that was not included in the original dataset. Also, factors are turned into a bunch of dummy variables, so if you want to use predict(), you have to specify the values of the dummy variables instead of including a factor/character variable in 'newmods'. See:
> 
> http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms
> 
> As for the actual problem:
> 
> In levels of 'Biome' with a small number of studies, the SE should tend to be larger (all else equal, a larger k for a level will lead to a decrease in the SE for the level). So in that sense, the model should reflect the larger uncertainty in levels of 'Biome' with little information.
> 
> For example:
> 
> library(metafor)
> dat <- get(data(dat.bcg))
> dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
> res <- rma(yi, vi, mods = ~ alloc - 1, data=dat)
> res
> plot(c(table(dat$alloc)), res$se, pch=19)
> 
> So, before thinking about other methods, how does that pan out in your data?
> 
> Best,
> Wolfgang
> 
>> -----Original Message-----
>> From: Cesar Terrer Moreno [mailto:cesar.terrer at me.com]
>> Sent: Sunday, 04 February, 2018 13:52
>> To: Viechtbauer Wolfgang (SP)
>> Cc: r-sig-meta-analysis at r-project.org
>> Subject: Re: [R-meta] Accounting for uncertainty in a mixed-effects
>> regression
>> 
>> Hi Wolfgang,
>> 
>> I re-formulated this question more specifically and focused after some
>> people helped with some initial issues. Please, see this newer question
>> labeled as: “Accounting for uncertainty in a mixed-effects regression”.
>> 
>> In this question, I’ve been initially helped by James, and I have learnt
>> that the variable I need as a driver of uncertainty needs to be included
>> as moderator. However, an issue with the predict question using
>> categorical moderators persists.
>> 
>> Thanks
>> César
>> 
>> On 2 Feb 2018, at 19:19, Cesar Terrer Moreno <cesar.terrer at me.com> wrote:
>> 
>> Dear James,
>> 
>> Thanks, this is very helpful.
>> 
>> However, I can’t manage to make the predict function work. My entire
>> model is a bit more complicated than in my original question, also
>> including an interaction as moderator and Biome as you just suggested :
>> 
>> meta.mv <- rma.mv(es, var, data=df, mods= ~ 1 + MAP + MAT*CO2dif +
>> Biome))
>> Now, when I use predict:
>> # correspondence of variables between df (meta-analysis database) and
>> s.df (global data of the predictors to predict the effect globally) is:
>> # MAT == temperature, MAP == precipitation, Biome == Biome, CO2inc = 300
>> 
>> pred.mv <- predict(meta.mv,  newmods = cbind(s.df$precipitation,
>> s.df$temperature, CO2inc, s.df$temperature*CO2inc,s.df$Biome))
>> Error in predict.rma(ECMmv, newmods = cbind(s.df$precipitation,
>> s.df$temperature,  :
>>   Dimensions of 'newmods' do not match dimensions of the model.
>> 
>> I guess this error is because i) Biome is categorical, and ii) number of
>> factor levels in df$Biome and s.df$Biome is different:
>> 
>>> levels(df$Biome)
>> [1]
>> "Boreal_Forest"    "Cropland"         "Grassland"        "Shrubland"
>> 
>> [5] "Temperate_Forest" "Tropical_Forest”
>> 
>>> levels(s.df$Biome)
>>  [1]
>> ""                 "Boreal_Forest"    "Cropland"         "Grassland"
>> 
>>  [5]
>> "Mixed"            "Nodata"           "Other"            "Shrubland"
>> 
>>  [9] "Temperate_Forest" "Tropical_Forest"
>> 
>> This is because there are “biomes” in the world (s.df) that are not
>> included in the dataset of experiments of the meta-analysis (s), although
>> all Biomes in s are included in s.df with the same name.
>> 
>> How can I make the predict function work? Thanks
>> César
>> 
>> On 2 Feb 2018, at 16:38, James Pustejovsky <jepusto at gmail.com> wrote:
>> 
>> Cesar,
>> 
>> I think you will need to include ecosystem as a predictor (i.e., in the
>> fixed effect part of the model, not just in the random effects structure)
>> in order to account for the greater uncertainty about tropical
>> ecosystems. Without doing so, the predictions for the tropics will be
>> generated based on ALL of the ecosystems in the model (controlling for
>> precipitation and temperature), and so their uncertainty will be driven
>> by the total sample size. (It would even be possible to exclude all of
>> the data from tropical ecosystems, fit the model, and yet still generate
>> predictions for the tropics.)
>> 
>> So I would recommend putting it in, as something like:
>> meta.mv <- rma.mv(es, var, data=df, random= ~1|Biome, mods= ~ 1 + MAP +
>> MAT + Biome)
>> (Or perhaps it's not necessary to put it in the random effects structure,
>> if the variability in true effects appears to be unrelated to ecosystem.)
>> 
>> James
>> 
>> On Fri, Feb 2, 2018 at 5:48 AM, Cesar Terrer Moreno <cesar.terrer at me.com>
>> wrote:
>> I have calculated an effect size along a dataset of experiments
>> distributed worldwide through a mixed-effects meta-regression. The effect
>> size in the dataset depends on climate (y ~ temperature + precipitation):
>> 
>>         meta <- rma(es, var, data=df, mods= ~ 1 + precip + temp)
>> 
>> As I have data for temperature and precipitation for virtually all points
>> on Earth, I have upscaled this effect and standard error (SE) globally,
>> that is, applying the model to predict the effect for all points on
>> Earth, thus creating a gridded map:
>> 
>>         s <- stack(temperature, precipitation) # Raster stack of the maps
>> of temperature and precipitation
>>         s.df <- as.data.frame(s,xy=TRUE) # Transform raster stack to
>> data.frame with coordinates
>> 
>>         # predict ES and SE for all points with temperature and
>> precipitation data
>>         pred <- predict(meta, newmods = cbind(s.df$precipitation,
>> s.df$temperature))
>> The resulting "upscaled" predictions of SE thus represent the meta-
>> analysis derived uncertainty. What this SE does not reflect is the
>> uncertainty derived from the fact that the number of experiments in the
>> dataset used to run the meta-regression is obviously limited, yet I
>> predicted the effect globally, even in areas not covered by experiments.
>> In particular, it is very obvious in the dataset that tropical ecosystems
>> are not well represented, with just a couple of experiments in the meta-
>> analysis. Therefore, the upscaled effect should be more uncertain in
>> tropical regions. In the current version, however, SE is not particularly
>> higher in the tropics than in other areas, because the model just depends
>> on climate, not ecosystem representativeness. I would like to add a
>> further level of uncertainty to SE based on the number of studies per
>> type of ecosystem in the dataset. The idea is that ecosystems that are
>> poorly sampled should have a higher SE than ecosystems with plenty of
>> measurements in the dataset.
>> 
>> Ecosystem type does not appear to explain variation in the effect, reason
>> I haven’t included it as a moderator. This could just be the consequence
>> of the low sample size of some ecosystem types. Regardless, ecosystem
>> type should be forced in the model somehow to account for the uncertainty
>> associated with the low sample size of some types of ecosystem.
>> 
>> I have tried this:
>> 
>>         meta.mv <- rma.mv(es, var, data=df, random= ~1|Biome, mods= ~ 1 +
>> MAP + MAT)
>>         pred.mv <- predict(meta.mv,
>>                       newmods = cbind(s.df$precipitation,
>> s.df$temperature),
>>                       random= ~1|s.df$Biome)
>> 
>> But the predictions of SE in pred.mv still show rather low SE values in
>> poorly sampled areas.
>> 
>> How can I force the model to predict larger errors in ecosystem types
>> with low sample size? Should I include ecosystem type as a random or
>> fixed effect? How should the “predict" function be called?