[R-meta] Dear Wolfgang

James Pustejovsky jepusto at gmail.com
Tue Dec 15 23:00:31 CET 2020


Hi JU,

Responses to your follow-up questions below.

James


On Thu, Dec 10, 2020 at 4:44 PM Ju Lee <juhyung2 at stanford.edu> wrote:

> Dear James,
>
> Thank you so much for these detailed thoughts and suggestions. My
> co-authors and I find this input extremely helpful.
>
> Currently, we are analyzing fishery data across the US, trying to
> understand how habitat or environmental contexts influence fish
> productivity by combining both peer-reviewed and agency data. We think
> our current issue is most related to your first point. We have >80
> peer-reviewed papers and 3-4 agency datasets.
>
> One major difference between peer-reviewed articles and agency data is
> that agency data are collected more frequently, across larger areas and
> different sites, and over a longer time frame (> 20 yr long data
> accumulated). Also, the agencies record all catch indiscriminately,
> whereas peer-reviewed papers often report only the catch most relevant
> to their research question (a subset that also tends to be more aligned
> with our own research question).
>
> So, after reading your comments, I envision us applying a stricter rule
> for the agency data. We have 1000 or so effect sizes combined across
> the many peer-reviewed articles, whereas each agency dataset generates
> a whopping 9000-13000 effect sizes if we apply the same inclusion
> criteria.
>
> Our follow-up questions are:
>
> *1. Given that we think there is still some value in including the agency
> data in our analysis, is it reasonable to apply different, stricter criteria
> or rules just for the agency data?*
>
> *For example, in our peer-reviewed data, we treat samplings conducted in
> different areas and years as independent studies, and these effects are
> matched with discrete measurements of our moderator of interest (say,
> temperature or depth). I am wondering whether it is justifiable to apply
> a different protocol just for the agency data (e.g., pooling across all
> years and sites to generate a single effect size for each fish species,
> OR including only a randomly chosen year or site from each dataset), so
> that the agency data do not take over the entire dataset.*
>
>
As a statistician who doesn't know anything about the subject-matter, I'm
afraid I don't really feel qualified to answer this question. This requires
making judgements about how relevant the different types of sites, species,
measures, time points, etc. included in the agency data and in the
peer-reviewed data are to your research questions. I would not recommend
using a different protocol for agency data than for peer-reviewed data just for
the sake of shrinking down the dataset. I think the thing to do is focus on
which data are relevant and suitable for answering the questions you have.


> 2. One option we were considering was to run our models with and without
> the agency data and report both. However, you pointed out that the model
> output (when it includes such an abnormally large study) may not be
> reliable at all if there are such huge differences in study size to begin
> with. So, my understanding is that this should not be one of our options
> unless we can significantly reduce the amount of agency data being
> incorporated?
>

It doesn't seem unreasonable to me to run your analyses with and without
the agency data and report both sets of results. The problem with this
approach would be figuring out how to interpret everything and draw
bottom-line conclusions if the results aren't consistent. That's why I
suggested running the analyses separately for the peer-reviewed data and
the agency data. I think that would let you get more of a purchase on
what's going on.
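
To make that concrete, here is a minimal sketch in metafor of what fitting
the data with and without the agency studies, and each subset on its own,
might look like. It assumes a data frame dat with hypothetical columns yi
(effect size), vi (sampling variance), esid (effect-size ID), study (study
ID), and source ("peer" vs. "agency"); none of these names come from your
actual data, so adjust to whatever structure you have.

library(metafor)

# all data combined (the "with agency data" analysis)
res_all <- rma.mv(yi, vi, random = ~ 1 | study/esid, data = dat)

# peer-reviewed studies only, effect sizes nested within studies
res_peer <- rma.mv(yi, vi, random = ~ 1 | study/esid,
                   data = subset(dat, source == "peer"))

# agency data only (its random-effects structure may need to differ;
# see the sketch further below)
res_agency <- rma.mv(yi, vi, random = ~ 1 | study/esid,
                     data = subset(dat, source == "agency"))

summary(res_all)
summary(res_peer)
summary(res_agency)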


> 3. Finally, you mentioned analyzing the agency data separately. I had not
> considered this option, but say we have two agency datasets that we combine
> to run our models. There would be plenty of effect sizes (coming from
> different sites, years, observers, and fish species) but only two levels
> of study (dataset 1 and dataset 2). I am unsure whether this approach would
> make sense, but do you have any additional thoughts on the validity of such
> models or approaches?
>

If you model the agency data on its own, I don't think it would make sense
to include study as a random effect. The results would be conditional on
the available agency studies, so you'd have to interpret them accordingly.
But, with so much available data, there would still be many sources of
variation that could be investigated and modeled (as Wolfgang noted in his
reply).
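
As a rough illustration of that last point, and again using hypothetical
variable names (site, year, species) rather than ones from your actual
datasets, a model for the agency data alone might drop study as a random
effect (too few levels) and instead use crossed random effects for the
other sources of variation:

library(metafor)

# agency data only: no 'study' random effect; model variation across
# sites, years, and fish species as crossed random effects instead
res_agency <- rma.mv(yi, vi,
                     random = list(~ 1 | site, ~ 1 | year, ~ 1 | species),
                     data = subset(dat, source == "agency"))
summary(res_agency)

Whether those are the right variance components depends entirely on how
your agency datasets are structured, so treat this only as a starting
point.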
