[R-meta] AICc or variance components, which one matters more?

Sat Nov 13 21:09:58 CET 2021

Interesting! To make sure I'm following you, your suggested g3 model
both considers 'study' to be nested in the 'lab', and at the same time
it considers 'study' to have its own independent crossed effect. Can
we consider the same variable (e.g., study) to be both nested and
crossed at the same time?

If so, I can then suggest the following model as well:

(g4=rma.mv(yi, vi, random = list(~1|lab/study, ~ 1 | lab), data = dd))

Doesn't this denote that one is uncertain about whether to take a
variable as nested or crossed or there are other justifications?

Thank you,
Luke

On Sat, Nov 13, 2021 at 1:11 PM Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>
> >-----Original Message-----
> >From: Luke Martinez [mailto:martinezlukerm using gmail.com]
> >Sent: Saturday, 13 November, 2021 18:44
> >To: Viechtbauer, Wolfgang (SP)
> >Cc: Philippe Tadger; R meta
> >Subject: Re: [R-meta] AICc or variance components, which one matters more?
> >
> >Hi Wolfgang,
> >
> >I'm fully with you, however, in my data only once 2 labs (labs 1 and
> >2) have collaborated on study 2. Specifically, part of study 2 has
> >been carried out by lab 1 (one row) and part of it by lab 2 (one row).
> >Except in this case, no such between-lab collaborations have ever
> >occurred in the data.
> >
> >If such a between-lab collaboration didn't exist, I could directly go
> >for g1 (hierarchical model). But with this collaboration, there is
> >just a tiny possibility for g2 (crossed model) as well.
> >
> >So, do you think AICc should be the basis of the comparison between g1
> >vs. g2 or the dominant data structure (ignoring the one exception)?
>
> Using information criteria *could* be the basis. But I might be inclined to just ignore the issue you describe above though if this only affects one study.
>
> Just as a note: It's not necessarily an either-or choice. This model is also possible:
>
> (g3=rma.mv(yi, vi, random = list(~1|lab/study, ~ 1 | study), data = dd))
>
> and profile(g3) suggests that all variance components are identifiable - although of course this is quite overfitted with so little data.
>
> >(g1=rma.mv(yi, vi, random = ~1|lab/study, data = dd))
> >(g2=rma.mv(yi, vi, random = list(~1|lab, ~1|study), data = dd))
> >
> >Thanks,
> >Luke
> >
> >On Sat, Nov 13, 2021 at 9:16 AM Viechtbauer, Wolfgang (SP)
> ><wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> >>
> >> >>> I don't know why you think a model is better if the estimates of its
> >variance
> >> >components are non-zero.
> >> >
> >> >Because of two things. 1) It maybe more desirable/efficient for a random
> >factor
> >> >to explain some heterogeneity (as in g1) rather than not (as in g2). 2) The CI
> >> >for the variance component (using the full data) is narrower for lab in g1
> >> >relative to that in g2.
> >>
> >> One can debate 1) For example, if I fit a random-effects model and tau^2 ends
> >up being estimated to be 0, then I would be happy, because this suggests that the
> >true effects are homogeneous, which *greatly* simplifies the conclusions. In
> >fact, my estimate of the pooled effect is then *more* efficient than if there was
> >heterogeneity.
> >>
> >> As for 2) To me, that would not be a reason for picking one model over the
> >other, at least not in this case where the two models are making fundamentally
> >different assumptions about the structure of the data.
> >>
> >> >>> Personally, I would rather think about the meaning of these models and use
> >the
> >> >one that makes more sense in the given context.
> >> >
> >> >Sure, but that's not always as straightforward. For instance, imagine in your
> >> >current data, the dominant structure is the hierarchy of A over B with a few
> >> >exceptions where this hierarchy can break (meaning A and B can also be
> >considered
> >> >crossed).
> >> >
> >> >In such a situation, one may think that if my data was much larger, then the
> >few
> >> >exceptions in the current data may actually turn out to be the dominant
> >structure
> >> >(meaning A and B being crossed may be a more representative model of the
> >> >population structure of A and B) or may be that's just overthinking.
> >> >
> >> >So when meaning alone is not that conclusive, you may a bit rely on the
> >> >comparison across a set of candidate models. Here, AICc and how much
> >> >heterogeneity is explained by each model may at least help you pick the one
> >that
> >> >more fully describes your current data.
> >> >
> >> >Curious to know your thoughts?
> >>
> >> If you want to compare models, then fit indices would be a useful approach (and
> >if the models are nested, then LRTs could be used).
> >>
> >> >All the best,
> >> >Luke
> >> >
> >> >Sat, Nov 13, 2021, 6:56 AM Viechtbauer, Wolfgang (SP)
> >> ><wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> >> >Please see my comments below.
> >> >
> >> >Best,
> >> >Wolfgang
> >> >
> >> >>-----Original Message-----
> >> >>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org]
> >On
> >> >>Behalf Of Philippe Tadger
> >> >>Sent: Saturday, 13 November, 2021 11:03
> >> >>To: Luke Martinez; R meta
> >> >>Subject: Re: [R-meta] AICc or variance components, which one matters more?
> >> >>
> >> >>Dear Luke
> >> >>
> >> >>I will suggest to choose the model that can provide a reliable
> >> >>estimation of the variances. Usually a variance estimation equal to zero
> >> >>or a correlation estimation  equal to 1 or -1 are potential warning
> >> >>signal, your model is in the limit of the parameter space: so you can
> >> >>having an underestimation and/or a non-identifiable model. How can you
> >> >>be sure? Using the profile() function in each model. In your case model
> >> >>g1 is identifiable, but g2 is not identifiable. I'll choose g1.
> >> >
> >> >This is not correct. Both variance componments are identifiable in both
> >models,
> >> >as the profile plots indicate. The estimate of the lab variance component in
> >g2
> >> >just happens to be 0 (the peak of the likelihood profile is at 0).
> >> >
> >> >>On 13/11/2021 05:24, Luke Martinez wrote:
> >> >>> Hello Colleagues,
> >> >>>
> >> >>> I've fit two candidate models (g1 and g2).
> >> >>>
> >> >>> With g1, I get two non-zero variance components for 'lab' and 'study'.
> >> >>>
> >> >>> With g2, I get a zero variance component for 'lab' and a non-zero one
> >> >>> for 'study'.
> >> >>>
> >> >>> So, from the perspective of variance components, g1 seems like a better
> >model.
> >> >
> >> >I don't know why you think a model is better if the estimates of its variance
> >> >components are non-zero.
> >> >
> >> >>> But when I compare the models using AICc, g2 seems like a better model.
> >> >
> >> >Personally, I would rather think about the meaning of these models and use the
> >> >one that makes more sense in the given context.
> >> >
> >> >>> I wonder, then, which criterion should I use to choose the model (AICc
> >> >>> or the variance components)?
> >> >>>
> >> >>> Thanks,
> >> >>> Luke
> >> >>> For reproducibility, I'm showing a somewhat similar situation with a
> >> >>> small section of my data below.
> >> >>>
> >> >>> m="
> >> >>>        lab study   yi         vi es_id
> >> >>>           1     1 1.04 0.48503768 1
> >> >>>           1     1 0.96 0.51076604 2
> >> >>>           1     2 1.71 0.05767389 3
> >> >>>           2     2 1.52 0.07539841 4
> >> >>>           1     3 1.91 0.31349510 5
> >> >>>           2     4 3.01 0.67910095 6
> >> >>>           2     4 3.62 0.50670360 7
> >> >>>           9     5 0.99 0.18297170 8
> >> >>>           9     5 0.43 0.37225851 9
> >> >>>           9     6 1.68 0.39072390 10
> >> >>>           9     6 1.25 0.02879550 11"
> >> >>> dd <- read.table(text=m,h=T)
> >> >>>
> >> >>> (g1=rma.mv(yi, vi, random = ~1|lab/study, data = dd))
> >> >>>                      estim    sqrt  nlvls  fixed     factor
> >> >>> sigma^2.1  0.1245  0.3529      3     no        lab
> >> >>> sigma^2.2  0.2535  0.5035      7     no  lab/study
> >> >>>
> >> >>> (g2=rma.mv(yi, vi, random = list(~1|lab, ~1|study), data = dd))
> >> >>>                       estim    sqrt  nlvls  fixed  factor
> >> >>> sigma^2.1  0.0000  0.0000      3     no     lab
> >> >>> sigma^2.2  0.5127  0.7160      6     no   study
> >> >>>
> >> >>> fitstats(g1,g2)[5,]
> >> >>>              g1          g2
> >> >>> AICc: 30.85992 29.73897
> >> >>> (for full data, AICc of g2 is more noticeably smaller than that of g1)