[R-meta] Post-hoc weighted analysis based on number of observations

Tue Jan 30 18:43:31 CET 2018

Dear Wolfgang,

I have the feeling that spatial uncertainty would help define uncertainty based on the geographical distance among the coordinates of the individual study locations in the dataset.

However, in this case I think a simpler approach could suffice. For this particular matter, we could assume that a good representation of the different “behaviours” of the system can be achieved by intensively sampling all types of biomes on Earth (e.g. grasslands, tropical forests, temperate forests, boreal forests), thus treating biomes as the unit of variability among studies.

In this case, ‘biome’ is not a significant predictor, but this could simply be the result of the low sample size in some biomes (or not). In any case, we have to somehow account for the low sample size in some biomes, so that we can still report the effect size in poorly sampled biomes, but with a very large uncertainty. This distinction between geographical uncertainty and biome representation is important: with biome as the driver of uncertainty, we can assume that, e.g., uncertainty for a grassland in China should be low despite there being no samples from Chinese grasslands, simply because there are many other grassland studies from Europe, Australia and the US. By contrast, uncertainty for a tropical forest in Brazil should be large because there are very few tropical forests in the dataset, even if the dataset contains many grassland studies from Brazil. This is the type of biome-driven uncertainty we need.
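
To make the imbalance concrete, the number of studies per biome can be tabulated. This is only a minimal sketch in base R: the data frame `studies`, the biome labels and the counts are invented for illustration and are not the real dataset.

```r
# Hypothetical study-level data: one row per study, labelled by biome.
# Biome names and counts are made up for illustration only.
studies <- data.frame(
  Biome = c(rep("grassland", 40), rep("temperate forest", 25),
            rep("boreal forest", 8), rep("tropical forest", 2))
)

# Number of studies per biome and each biome's share of the dataset
n_biome <- table(studies$Biome)
share   <- prop.table(n_biome)

# Collect both in one small summary table
biome_summary <- data.frame(
  n     = as.vector(n_biome),
  share = round(as.vector(share), 3),
  row.names = names(n_biome)
)
print(biome_summary)
```

In such a table, location plays no role: a grassland pixel in China and one in Europe would both inherit the (large) grassland count, whereas any tropical forest pixel would inherit the small tropical forest count.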

Having said that, I don’t know how to account for this biome-driven uncertainty. 

I have tried to include ‘Biome’ as a random effect in the model:

meta <- rma.mv(es, var, data=df, method="ML", random= ~1|Biome, mods= ~ 1 + precipitation + temperature)

As I have data for temperature, precipitation, and biome type for virtually all points on Earth, I have upscaled this effect and standard error (SE) globally, creating a gridded map of the effect and SE:

pred <- predict(meta, newmods = cbind(s.df$precipitation, s.df$temperature), 
random= ~1|s.df$Biome)

SEraster <- rasterFromXYZ(pred[,c("x", "y", "se")],crs="+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") # x and y are the coordinates in each cell

However, the resulting raster of the SE of the effect is quite similar to the raster obtained with the model without the random effect, thus with low SE even in biomes that are poorly sampled (e.g. tropical forests). Why? How can I create a model where SEs are higher in regions with low biome representation?
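
One possible reading, offered as an assumption and not as a confirmed diagnosis: the `se` returned by predict() describes the fixed-effects part of the model (precipitation and temperature), so it does not change with how many studies each biome contributed. The behaviour asked for here is that of the uncertainty of a biome-level random effect. The sketch below illustrates that behaviour in base R for a deliberately simplified random-intercept model in which the between-biome variance `sigma2`, the per-study sampling variance `v` and the overall mean are treated as known. All numbers are hypothetical, and this is not the metafor computation itself.

```r
# Simplified model: y_ij = mu + u_j + e_ij, with u_j ~ N(0, sigma2) for biome j
# and e_ij ~ N(0, v) for study i. All values below are hypothetical.
sigma2 <- 0.04   # assumed between-biome variance
v      <- 0.02   # assumed sampling variance of a single study

# Assumed number of studies per biome (0 = biome absent from the dataset)
n <- c(grassland = 40, temperate_forest = 25, boreal_forest = 8,
       tropical_forest = 2, unsampled = 0)

# Conditional variance of the biome effect u_j given n_j studies:
#   1 / (1/sigma2 + n_j/v)
# It shrinks towards 0 as n_j grows and equals sigma2 when n_j = 0.
se_biome <- sqrt(1 / (1 / sigma2 + n / v))

print(round(se_biome, 3))
```

Under these assumptions the biome-level SE grows as the number of studies in a biome falls, and reaches sqrt(sigma2) for a biome with no studies at all. In metafor, ranef() on an rma.mv object reports estimated random effects together with their standard errors, which may be a starting point for combining a biome-level component with the fixed-effects SE; how best to combine the two is left open here.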

Thanks

> On 25 Jan 2018, at 11:08, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> 
> I will need to mull over this for a bit, but I think this falls under 'spatial uncertainty' (a term worth googling in the meantime).
> 
> Best,
> Wolfgang
> 
>> -----Original Message-----
>> From: Cesar Terrer Moreno [mailto:cesar.terrer at me.com]
>> Sent: Thursday, 25 January, 2018 7:57
>> To: Viechtbauer Wolfgang (SP)
>> Cc: r-sig-meta-analysis at r-project.org
>> Subject: Re: [R-meta] Post-hoc weighted analysis based on number of
>> observations
>> 
>> Dear Wolfgang,
>> 
>> Thanks so much for your reply. You have captured the essence of the
>> question perfectly.
>> 
>> I have successfully scaled the meta-analysis-derived SE, so I have
>> basically produced a global map of the SE of the effect:
>> 
>> SE <- predict(meta,
>>                     newmods = cbind(s.df$precipitation, s.df$temperature,
>> CO2inc, s.df$temperature*CO2inc))$se
>> 
>> However, as you said, some locations, in this case ecosystems (e.g.
>> tropical forests) are poorly represented in the dataset. Therefore, a
>> proper assessment of the uncertainties of the approach should account for
>> the uncertainty associated with the sampling effort (or the lack of) in
>> some regions. Reviewers will check this for sure.
>> 
>> It turns out that ecosystem type, per se, is not a good predictor, thus
>> including it in the meta-regression probably does not make much sense (or
>> maybe yes). I was thus thinking more of a post-hoc solution, not
>> necessarily in a meta-analytic context, so maybe this distribution list
>> is not the right place to ask this question. The idea is to increase SE
>> in pixels dominated by ecosystems that are poorly sampled. The final
>> quantification of uncertainties would thus be an aggregation of the SEs
>> and some sort of multiplier that adds uncertainty in a particular pixel
>> as a function of the representativeness of the type of ecosystem in that
>> pixel.
>> 
>> For example:
>> 
>> group_by(ecosystem_type) %>% summarise(n = n()) %>% mutate (weight =
>> n/sum(n))
>> 
>> SEw= max(SE,na.rm=T) - max(SE,na.rm=T)*weight,
>> 
>> SEsum = SE + SEw
>> 
>> SEsum would thus be the sum of SE and another level of error driven by
>> the sample size of the type of ecosystem, and constrained to fall within
>> the range of observed SE from the dataset.
>> 
>> But I think this approach is not very elegant. Any other ideas?
>> Thanks
>> César
>> 
>> On 24 Jan 2018, at 23:56, Viechtbauer Wolfgang (SP)
>> <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>> 
>> Dear Cesar,
>> 
>> Let me try to understand the essence of your question/issue and abstract
>> it a bit from the specifics of your data. So, if I understand things
>> correctly, you have data from various places on Earth. Let's pretend
>> those places are on a 2d surface, so something like this (where *
>> indicates a place where you have data):
>> 
>> +------------------------+
>> |     *                  |
>> |  *                     |
>> |     *                  |
>> |                     *  |
>> |                 *  *   |
>> |                        |
>> +------------------------+
>> 
>> You have fitted a model that relates an outcome to some predictor
>> variables based on the data for these places. Now you actually have the
>> values of the predictor variables for *all* places on that surface and
>> you have computed the corresponding predicted values. But there are
>> locations for which there were no data to begin with (e.g., upper right
>> and lower left) and hence you want the SEs of the predicted values to
>> reflect this lack of information in those areas and you are wondering how
>> to do that. Does that capture the essence of your question?
>> 
>> Best,
>> Wolfgang
>> 
>> 
>> -----Original Message-----
>> From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-
>> project.org] On Behalf Of Cesar Terrer Moreno
>> Sent: Monday, 22 January, 2018 18:52
>> To: r-sig-meta-analysis at r-project.org
>> Subject: [R-meta] Post-hoc weighted analysis based on number of
>> observations
>> 
>> I have a gridded dataset representing the standard error (SE) of an
>> effect. This SE was calculated through a meta-analysis and subsequent
>> predictive model applied on a grid:
>> 
>> ECMmeta <- rma(es, var, data=ecm.df ,control=list(stepadj=.5), mods= ~ 1
>> + MAP + MAT*CO2dif, knha=TRUE)
>> options(na.action = "na.pass")
>> ECMpred <- predict(ECMmeta,
>>                   newmods = cbind(s.df$precipitation, s.df$temperature,
>> CO2inc, s.df$temperature*CO2inc))
>> ECMrelSE <- rasterFromXYZ(ECMpred[,c("x", "y", "se")],crs="+proj=longlat
>> +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
>> 
>> I would like to add a further level of uncertainty to SE based on the
>> number of measurements (observations) per type of ecosystem in the
>> dataset. The idea is that ecosystems that are poorly represented by
>> experiments in the dataset should have a higher SE than ecosystems with
>> plenty of measurements in the dataset.
>> 
>> I thought I could, for example, calculate an ecosystem-based weight as:
>> 
>> weight = n/sum(n)
>> 
>> That is, number of observations in a particular ecosystem divided by the
>> total of observations.
>> 
>> The next step would be to apply a weighting approach to each pixel. First
>> approach I've come up with is to simply multiply SE and the inverse of
>> the weight:
>> 
>> SEw=SE*(1/weight)
>> 
>> But the values are extremely high.
>> 
>> An approach like this would be more like a post-hoc patch. I am sure
>> something like this can be done within the meta-analysis at the
>> beginning. Alternatively, a better post-hoc approach or ideas to
>> investigate further would be welcome. Any recommendation or basic
>> approach commonly used to add further uncertainty to areas with low
>> representativeness?
>> 
>> Thanks