[R-meta] AICc or variance components, which one matters more?

Sat Nov 13 20:11:14 CET 2021

>-----Original Message-----
>From: Luke Martinez [mailto:martinezlukerm using gmail.com]
>Sent: Saturday, 13 November, 2021 18:44
>To: Viechtbauer, Wolfgang (SP)
>Cc: Philippe Tadger; R meta
>Subject: Re: [R-meta] AICc or variance components, which one matters more?
>
>Hi Wolfgang,
>
>I'm fully with you, however, in my data only once 2 labs (labs 1 and
>2) have collaborated on study 2. Specifically, part of study 2 has
>been carried out by lab 1 (one row) and part of it by lab 2 (one row).
>Except in this case, no such between-lab collaborations have ever
>occurred in the data.
>
>If such a between-lab collaboration didn't exist, I could directly go
>for g1 (hierarchical model). But with this collaboration, there is
>just a tiny possibility for g2 (crossed model) as well.
>
>So, do you think AICc should be the basis of the comparison between g1
>vs. g2 or the dominant data structure (ignoring the one exception)?

Using information criteria *could* be the basis. But I might be inclined to just ignore the issue you describe above though if this only affects one study.

Just as a note: It's not necessarily an either-or choice. This model is also possible:

(g3=rma.mv(yi, vi, random = list(~1|lab/study, ~ 1 | study), data = dd))

and profile(g3) suggests that all variance components are identifiable - although of course this is quite overfitted with so little data.

>(g1=rma.mv(yi, vi, random = ~1|lab/study, data = dd))
>(g2=rma.mv(yi, vi, random = list(~1|lab, ~1|study), data = dd))
>
>Thanks,
>Luke
>
>On Sat, Nov 13, 2021 at 9:16 AM Viechtbauer, Wolfgang (SP)
><wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>>
>> >>> I don't know why you think a model is better if the estimates of its
>variance
>> >components are non-zero.
>> >
>> >Because of two things. 1) It maybe more desirable/efficient for a random
>factor
>> >to explain some heterogeneity (as in g1) rather than not (as in g2). 2) The CI
>> >for the variance component (using the full data) is narrower for lab in g1
>> >relative to that in g2.
>>
>> One can debate 1) For example, if I fit a random-effects model and tau^2 ends
>up being estimated to be 0, then I would be happy, because this suggests that the
>true effects are homogeneous, which *greatly* simplifies the conclusions. In
>fact, my estimate of the pooled effect is then *more* efficient than if there was
>heterogeneity.
>>
>> As for 2) To me, that would not be a reason for picking one model over the
>other, at least not in this case where the two models are making fundamentally
>different assumptions about the structure of the data.
>>
>> >>> Personally, I would rather think about the meaning of these models and use
>the
>> >one that makes more sense in the given context.
>> >
>> >Sure, but that's not always as straightforward. For instance, imagine in your
>> >current data, the dominant structure is the hierarchy of A over B with a few
>> >exceptions where this hierarchy can break (meaning A and B can also be
>considered
>> >crossed).
>> >
>> >In such a situation, one may think that if my data was much larger, then the
>few
>> >exceptions in the current data may actually turn out to be the dominant
>structure
>> >(meaning A and B being crossed may be a more representative model of the
>> >population structure of A and B) or may be that's just overthinking.
>> >
>> >So when meaning alone is not that conclusive, you may a bit rely on the
>> >comparison across a set of candidate models. Here, AICc and how much
>> >heterogeneity is explained by each model may at least help you pick the one
>that
>> >more fully describes your current data.
>> >
>> >Curious to know your thoughts?
>>
>> If you want to compare models, then fit indices would be a useful approach (and
>if the models are nested, then LRTs could be used).
>>
>> >All the best,
>> >Luke
>> >
>> >Sat, Nov 13, 2021, 6:56 AM Viechtbauer, Wolfgang (SP)
>> ><wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>> >Please see my comments below.
>> >
>> >Best,
>> >Wolfgang
>> >
>> >>-----Original Message-----
>> >>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org]
>On
>> >>Behalf Of Philippe Tadger
>> >>Sent: Saturday, 13 November, 2021 11:03
>> >>To: Luke Martinez; R meta
>> >>Subject: Re: [R-meta] AICc or variance components, which one matters more?
>> >>
>> >>Dear Luke
>> >>
>> >>I will suggest to choose the model that can provide a reliable
>> >>estimation of the variances. Usually a variance estimation equal to zero
>> >>or a correlation estimation  equal to 1 or -1 are potential warning
>> >>signal, your model is in the limit of the parameter space: so you can
>> >>having an underestimation and/or a non-identifiable model. How can you
>> >>be sure? Using the profile() function in each model. In your case model
>> >>g1 is identifiable, but g2 is not identifiable. I'll choose g1.
>> >
>> >This is not correct. Both variance componments are identifiable in both
>models,
>> >as the profile plots indicate. The estimate of the lab variance component in
>g2
>> >just happens to be 0 (the peak of the likelihood profile is at 0).
>> >
>> >>On 13/11/2021 05:24, Luke Martinez wrote:
>> >>> Hello Colleagues,
>> >>>
>> >>> I've fit two candidate models (g1 and g2).
>> >>>
>> >>> With g1, I get two non-zero variance components for 'lab' and 'study'.
>> >>>
>> >>> With g2, I get a zero variance component for 'lab' and a non-zero one
>> >>> for 'study'.
>> >>>
>> >>> So, from the perspective of variance components, g1 seems like a better
>model.
>> >
>> >I don't know why you think a model is better if the estimates of its variance
>> >components are non-zero.
>> >
>> >>> But when I compare the models using AICc, g2 seems like a better model.
>> >
>> >Personally, I would rather think about the meaning of these models and use the
>> >one that makes more sense in the given context.
>> >
>> >>> I wonder, then, which criterion should I use to choose the model (AICc
>> >>> or the variance components)?
>> >>>
>> >>> Thanks,
>> >>> Luke
>> >>> For reproducibility, I'm showing a somewhat similar situation with a
>> >>> small section of my data below.
>> >>>
>> >>> m="
>> >>>        lab study   yi         vi es_id
>> >>>           1     1 1.04 0.48503768 1
>> >>>           1     1 0.96 0.51076604 2
>> >>>           1     2 1.71 0.05767389 3
>> >>>           2     2 1.52 0.07539841 4
>> >>>           1     3 1.91 0.31349510 5
>> >>>           2     4 3.01 0.67910095 6
>> >>>           2     4 3.62 0.50670360 7
>> >>>           9     5 0.99 0.18297170 8
>> >>>           9     5 0.43 0.37225851 9
>> >>>           9     6 1.68 0.39072390 10
>> >>>           9     6 1.25 0.02879550 11"
>> >>> dd <- read.table(text=m,h=T)
>> >>>
>> >>> (g1=rma.mv(yi, vi, random = ~1|lab/study, data = dd))
>> >>>                      estim    sqrt  nlvls  fixed     factor
>> >>> sigma^2.1  0.1245  0.3529      3     no        lab
>> >>> sigma^2.2  0.2535  0.5035      7     no  lab/study
>> >>>
>> >>> (g2=rma.mv(yi, vi, random = list(~1|lab, ~1|study), data = dd))
>> >>>                       estim    sqrt  nlvls  fixed  factor
>> >>> sigma^2.1  0.0000  0.0000      3     no     lab
>> >>> sigma^2.2  0.5127  0.7160      6     no   study
>> >>>
>> >>> fitstats(g1,g2)[5,]
>> >>>              g1          g2
>> >>> AICc: 30.85992 29.73897
>> >>> (for full data, AICc of g2 is more noticeably smaller than that of g1)