[R-meta] Accounting for uncertainty in a mixed-effects regression

Sun Feb 4 13:52:14 CET 2018

Hi Wolfgang,

I re-formulated this question more specifically and focused after some people helped with some initial issues. Please, see this newer question labeled as: “Accounting for uncertainty in a mixed-effects regression”.

In this question, I’ve been initially helped by James, and I have learnt that the variable I need as a driver of uncertainty needs to be included as moderator. However, an issue with the predict question using categorical moderators persists.

Thanks
César

> On 2 Feb 2018, at 19:19, Cesar Terrer Moreno <cesar.terrer at me.com> wrote:
> 
> Dear James,
> 
> Thanks, this is very helpful. 
> 
> However, I can’t manage to make the predict function work. My entire model is a bit more complicated than in my original question, also including an interaction as moderator and Biome as you just suggested :
> 
> meta.mv <- rma.mv(es, var, data=df, mods= ~ 1 + MAP + MAT*CO2dif + Biome))
> 
> Now, when I use predict:
> # correspondence of variables between df (meta-analysis database) and s.df (global data of the predictors to predict the effect globally) is:
> # MAT == temperature, MAP == precipitation, Biome == Biome, CO2inc = 300
> 
> pred.mv <- predict(meta.mv,  newmods = cbind(s.df$precipitation, s.df$temperature, CO2inc, s.df$temperature*CO2inc,s.df$Biome))
> 
> Error in predict.rma(ECMmv, newmods = cbind(s.df$precipitation, s.df$temperature,  : 
>   Dimensions of 'newmods' do not match dimensions of the model.
> 
> I guess this error is because i) Biome is categorical, and ii) number of factor levels in df$Biome and s.df$Biome is different:
> 
> > levels(df$Biome)
> [1] "Boreal_Forest"    "Cropland"         "Grassland"        "Shrubland"       
> [5] "Temperate_Forest" "Tropical_Forest” 
> 
> > levels(s.df$Biome)
>  [1] ""                 "Boreal_Forest"    "Cropland"         "Grassland"       
>  [5] "Mixed"            "Nodata"           "Other"            "Shrubland"       
>  [9] "Temperate_Forest" "Tropical_Forest" 
> 
> This is because there are “biomes” in the world (s.df) that are not included in the dataset of experiments of the meta-analysis (s), although all Biomes in s are included in s.df with the same name.
> 
> How can I make the predict function work? Thanks
> César
>> On 2 Feb 2018, at 16:38, James Pustejovsky <jepusto at gmail.com <mailto:jepusto at gmail.com>> wrote:
>> 
>> Cesar,
>> 
>> I think you will need to include ecosystem as a predictor (i.e., in the fixed effect part of the model, not just in the random effects structure) in order to account for the greater uncertainty about tropical ecosystems. Without doing so, the predictions for the tropics will be generated based on ALL of the ecosystems in the model (controlling for precipitation and temperature), and so their uncertainty will be driven by the total sample size. (It would even be possible to exclude all of the data from tropical ecosystems, fit the model, and yet still generate predictions for the tropics.)
>> 
>> So I would recommend putting it in, as something like:
>> meta.mv <http://meta.mv/> <- rma.mv <http://rma.mv/>(es, var, data=df, random= ~1|Biome, mods= ~ 1 + MAP + MAT + Biome) 
>> (Or perhaps it's not necessary to put it in the random effects structure, if the variability in true effects appears to be unrelated to ecosystem.)
>> 
>> James
>> 
>> 
>> On Fri, Feb 2, 2018 at 5:48 AM, Cesar Terrer Moreno <cesar.terrer at me.com <mailto:cesar.terrer at me.com>> wrote:
>> I have calculated an effect size along a dataset of experiments distributed worldwide through a mixed-effects meta-regression. The effect size in the dataset depends on climate (y ~ temperature + precipitation):
>> 
>>         meta <- rma(es, var, data=df, mods= ~ 1 + precip + temp)
>> 
>> As I have data for temperature and precipitation for virtually all points on Earth, I have upscaled this effect and standard error (SE) globally, that is, applying the model to predict the effect for all points on Earth, thus creating a gridded map:
>> 
>>         s <- stack(temperature, precipitation) # Raster stack of the maps of temperature and precipitation
>>         s.df <- as.data.frame(s,xy=TRUE) # Transform raster stack to data.frame with coordinates
>> 
>>         # predict ES and SE for all points with temperature and precipitation data
>>         pred <- predict(meta, newmods = cbind(s.df$precipitation, s.df$temperature))
>> The resulting "upscaled" predictions of SE thus represent the meta-analysis derived uncertainty. What this SE does not reflect is the uncertainty derived from the fact that the number of experiments in the dataset used to run the meta-regression is obviously limited, yet I predicted the effect globally, even in areas not covered by experiments. In particular, it is very obvious in the dataset that tropical ecosystems are not well represented, with just a couple of experiments in the meta-analysis. Therefore, the upscaled effect should be more uncertain in tropical regions. In the current version, however, SE is not particularly higher in the tropics than in other areas, because the model just depends on climate, not ecosystem representativeness. I would like to add a further level of uncertainty to SE based on the number of studies per type of ecosystem in the dataset. The idea is that ecosystems that are poorly sampled should have a higher SE than ecosystems with plenty of measurements in the dataset.
>> 
>> Ecosystem type does not appear to explain variation in the effect, reason I haven’t included it as a moderator. This could just be the consequence of the low sample size of some ecosystem types. Regardless, ecosystem type should be forced in the model somehow to account for the uncertainty associated with the low sample size of some types of ecosystem.
>> 
>> I have tried this:
>> 
>>         meta.mv <http://meta.mv/> <- rma.mv <http://rma.mv/>(es, var, data=df, random= ~1|Biome, mods= ~ 1 + MAP + MAT)
>>         pred.mv <http://pred.mv/> <- predict(meta.mv <http://meta.mv/>,
>>                       newmods = cbind(s.df$precipitation, s.df$temperature),
>>                       random= ~1|s.df$Biome)
>> 
>> But the predictions of SE in pred.mv <http://pred.mv/> still show rather low SE values in poorly sampled areas.
>> 
>> How can I force the model to predict larger errors in ecosystem types with low sample size? Should I include ecosystem type as a random or fixed effect? How should the “predict" function be called?
>>         [[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-meta-analysis mailing list
>> R-sig-meta-analysis at r-project.org <mailto:R-sig-meta-analysis at r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis <https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis>
> 

	[[alternative HTML version deleted]]