[R-meta] Moderator analysis with missing values (Methods and interpretations)

Tue Sep 11 18:55:34 CEST 2018

Hi Wolfgang and Michael,

Thank you for your quick responses and help.

Regarding Michael's request for clarification: I have a total of 51 comparisons. Each moderator is tested individually on all studies for which that moderator is available. The model with all moderators is indeed fitted to a subset of these 51 comparisons (k=27), because each moderator is missing for a different set of studies, and a study is excluded as soon as any one of its moderators is missing.
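
A minimal sketch of how this happens, in Python with made-up data (the moderator names `year`, `blinded`, and `dose` and all values are hypothetical, not the actual 51 comparisons). Each single-moderator model keeps every row where that one moderator is observed, while the full model keeps only rows where all moderators are observed:

```python
# Hypothetical data: one row per comparison; None marks a missing moderator value.
comparisons = [
    {"id": 1, "year": 2005, "blinded": "yes", "dose": 10.0},
    {"id": 2, "year": 2008, "blinded": None, "dose": 20.0},
    {"id": 3, "year": None, "blinded": "no", "dose": 15.0},
    {"id": 4, "year": 2012, "blinded": "yes", "dose": None},
    {"id": 5, "year": 2015, "blinded": "no", "dose": 30.0},
    {"id": 6, "year": 2016, "blinded": None, "dose": None},
]
moderators = ["year", "blinded", "dose"]


def complete_cases(rows, names):
    """Return the ids of rows with no missing value on any moderator in `names`."""
    return [r["id"] for r in rows if all(r[m] is not None for m in names)]


# k for each single-moderator model: only that moderator must be observed.
k_single = {m: len(complete_cases(comparisons, [m])) for m in moderators}

# k for the full model: every moderator must be observed (listwise deletion).
full_ids = complete_cases(comparisons, moderators)

print("k per single-moderator model:", k_single)
print("k for the full model:", len(full_ids), "ids:", full_ids)
```

Because the single-moderator models and the full model are fitted to different sets of rows, differences between their results can reflect the change in sample as well as the added moderators.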

When I first brought this issue up in this mailing list a few months ago, Wolfgang suggested a side-by-side comparison of the moderator tests, so that I present both the individual moderator tests, as well as the results when all moderators are included in the same model. I think this makes sense, but I want to make sure I interpret/present these findings correctly (which, it seems, I am not doing so far).

From both of your answers, it seems my interpretation/suggested reporting was too simplistic/strong.

For regression models with primary data, I think I would run the different models and discuss their results separately, or (assuming less missing data than in the current situation) just focus on the model that includes all variables.

I see that interpreting the moderators is not as straightforward as I imagined, and I definitely agree that any conclusion should avoid causal language. Thank you for the clear explanation. What would be a sensible way to interpret these findings?

Regarding multiple imputation: I have not looked into this, but I wonder whether it can help when the missing values are spread across several moderators? (I expect that the existing moderator/ES data are somehow used to impute the missing values?)

Best wishes,
Tommy
> On 11 Sep 2018, at 16:51, Michael Dewey <lists using dewey.myzen.co.uk> wrote:
> 
> Dear Tommy
> 
> Thinking about this a bit more, have you considered multiple imputation of the moderators? The main issue with MI is that if you have hardly any missing values it is not worth it, and if you have a lot then the results are very imprecise, which of course reflects the lack of data.
> 
> Michael
> 
> On 11/09/2018 14:53, Viechtbauer, Wolfgang (SP) wrote:
>> Hi Tommy,
>> Some additional thoughts:
>> - The same questions arise in the context of primary research, so how would you answer these questions if you were running regression models with primary data?
>> - Michael raises an important point: When fitting larger models, it might happen that some studies/estimates are dropped due to listwise deletion. In that case, the comparison between results becomes a bit more problematic.
>> - Even for moderator A, the association might be confounded by other moderators that are not included in the larger model. So even moderator A might not really have an effect. But I would avoid wording such as 'moderator A has an effect' anyway, as this sounds a bit 'causal'. In any case, moderator A certainly leads to the simplest story, so this might make this finding most convincing to some.
>> - Power might be low to detect moderator B in the larger model. Or it might be that B was confounded with some 'real' moderators and fitting the larger model eliminated/reduced that confounding.
>> - For C, it could be that power is low when tested individually due to a large amount of residual heterogeneity. When fitting the larger model, residual heterogeneity might be reduced, making it easier to detect the relevance of C.
>> Of course, it is impossible to say for sure what is going on in any particular case.
>> Best,
>> Wolfgang
>> -----Original Message-----
>> From: Michael Dewey [mailto:lists using dewey.myzen.co.uk]
>> Sent: Tuesday, 11 September, 2018 15:43
>> To: Tommy van Steen; Viechtbauer, Wolfgang (SP)
>> Cc: r-sig-meta-analysis using r-project.org
>> Subject: Re: [R-meta] Moderator analysis with missing values (Methods and interpretations)
>> Just to clarify Tommy, are you fitting all three models to the same set
>> of studies or, as it seems from the exchange with Wolfgang below, are
>> they being fitted to different subsets? If the latter then I think any
>> conclusions comparing them must be very tentative.
>> Michael
>> On 11/09/2018 14:04, Tommy van Steen wrote:
>>> Dear Wolfgang,
>>> 
>>> I have a follow-up question regarding the point of doing a side-by-side comparison of moderator analysis (testing moderators both individually and as part of a model that includes all moderators). Looking at the significant moderators, there are three types of outcomes in my meta-analysis:
>>> 
>>> Moderator A: Significant effect when tested both individually, and as part of larger model.
>>> Moderator B: Significant effect when tested individually, but not when tested as part of larger model.
>>> Moderator C: Significant effect when tested in a larger model, but not when tested individually.
>>> 
>>> Am I correct in saying that:
>>> Moderator A has an effect, as the moderator is significant in both models.
>>> Moderator B probably doesn’t have an effect, as the effect disappears when other factors are considered.
>>> Moderator C has an effect, but only in interaction with other factors.
>>> 
>>> I am especially unsure about my interpretation of Moderator C.
>>> 
>>> Best wishes,
>>> Tommy
>>> 
>>>> On 6 Jul 2018, at 14:11, Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>>>> 
>>>> Hi Tommy,
>>>> 
>>>> 1) This is a tricky (and common) issue. I suspect this is one of the reasons why moderators are still often tested one at a time (to 'maximize' the number of studies included in an analysis when testing each moderator). But this makes it impossible to sort out the unique contributions of correlated moderators, so this isn't ideal. One could consider imputation techniques, although this isn't common practice in the meta-analysis context. So, as a more pragmatic approach, why not do both? If a moderator is found to be relevant when tested individually and also when other moderators are included, then this should give us more confidence in the finding.
>>>> 
>>>> 2) Possible, sure. Is it useful, maybe. Consider the following scatterplot of the effect sizes against some moderator (ignore the *'s for now):
>>>> 
>>>> |      *  ..   .
>>>> |      *.. . .
>>>> |    . *. .
>>>> |   . .*.
>>>> |  ..  *
>>>> |      *
>>>> +------*--------
>>>> 
>>>> Now suppose the moderator value is missing for all studies where the moderator is below *. This shouldn't bias the estimated slope for the moderator, but studies where the moderator is known will on average have a higher effect size than studies where the moderator is unknown. So what will the conclusion then be once we find this?
>>>> 
>>>> 3) Again, how about both? Make a side-by-side table of the results.
>>>> 
>>>> 4) Yes (on average).
>>>> 
>>>> 5) Yes. If you see a coefficient for "Yes", then "No" is the reference level. So the coefficient for "Yes" tells you how much lower/higher the effect is on average for "Yes" compared to "No".
>>>> 
>>>> Best,
>>>> Wolfgang
>>>> 
>>>>> -----Original Message-----
>>>>> From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Tommy van Steen
>>>>> Sent: Friday, 06 July, 2018 14:37
>>>>> To: r-sig-meta-analysis using r-project.org
>>>>> Subject: [R-meta] Moderator analysis with missing values (Methods and
>>>>> interpretations)
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I’m running a meta-analysis using Cohen’s d in the metafor-package for R.
>>>>> I’m doubting my method/interpretation of results at various stages. As I
>>>>> want to make sure I’m doing it right, rather than doing what is
>>>>> convenient, I hope you could provide me with some advice regarding the
>>>>> following questions:
>>>>> 
>>>>> 1. Heterogeneity is high in my data, and I want to add a list of
>>>>> moderators to test their influence. However, many of these moderators
>>>>> have missing values because not all studies have measured these
>>>>> variables. If I run a model that includes all moderators, the number of
>>>>> comparisons drops from 51 to 27. I’d prefer to include all moderators at
>>>>> once, but is this the right thing to do, or should I test each moderator
>>>>> separately?
>>>>> 2. Following 1: if I can run the model as a whole, is it possible and
>>>>> useful to in some way compare the overall effect size of the studies with
>>>>> no missing moderator data with those that are excluded in the model
>>>>> because of these missing datapoints?
>>>>> 3. Some moderators that are significant when including all moderators at
>>>>> once, are not significant when tested individually on the same subset of
>>>>> 27 studies. Which of the two statistics (as part of the larger model, or
>>>>> the individual moderator) should I report?
>>>>> 
>>>>> And two questions about interpretation:
>>>>> 4. I added publication year as a moderator and the estimate is 0.0360.
>>>>> Am I interpreting this result correctly when I say that every increase in
>>>>> the moderator year by 1, increases the effect size by 0.0360?
>>>>> 5. I also added a dichotomous moderator with options yes/no. In the
>>>>> moderator list, this moderator is listed with the ‘yes’ option, with an
>>>>> estimate of 0.5739, does this mean the effect size is 0.5739 higher than
>>>>> when the moderator value is ‘no’?
>>>>> 
>>>>> Thank you in advance for your thoughts and advice.
>>>>> 
>>>>> Best wishes,
>>>>> Tommy
>> _______________________________________________
>> R-sig-meta-analysis mailing list
>> R-sig-meta-analysis using r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> 
> -- 
> Michael
> http://www.dewey.myzen.co.uk/home.html