[R-meta] sandwhich estimator with geeglm object

Yefeng Yang ye|eng@y@ng1 @end|ng |rom un@w@edu@@u
Wed May 3 13:16:53 CEST 2023

Dear both,

Comment 1 from Wolfgang is exactly what I want to ask about within-study correlation.

The reason why I ask about the GEE and the within-study correlation is that I did not find a solution for dependent 2×2 table data (or more generally,  non-normal outcome with correlated error). metafor::rma.glmm works only for independent data. I am thinking whether we can use weighted GEE to resolve this issue. I acknolwdge the beta coefficient from GEE is different from GLMM (the former is a marginal model giving a population-average estimate and the latter is a conditional model giving a subjective-specific estimate). Using GEE + small sample adjustment or GLM + small sample RVE seems the perfect solution for this. Any comment on this?

From: Viechtbauer, Wolfgang (NP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
Sent: Wednesday, 3 May 2023 17:34
To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-project.org>; Yefeng Yang <yefeng.yang1 using unsw.edu.au>
Cc: James Pustejovsky <jepusto using gmail.com>
Subject: RE: [R-meta] sandwhich estimator with geeglm object

Just a few comments from my side:

1) While rma.mv() is not designed to estimate a parameterized V matrix, one can easily do this manually. I think this is what Yefeng meant with "iterative way to find the maximum likelihood". For example, if we assume a constant correlation between the sampling errors within studies, one can just repeatedly construct the corresponding V matrix (e.g., with vcalc()), fit the model with rma.mv(), extract the log likelihood (with logLik()), and find the value of the correlation that maximizes the log likelihood.

A special case of this is the so-called 'Riley model' where the within- and between-study correlations are assumed to be identical. This can be fitted in the same manner, since rma.mv() also allows fixing variance/correlation components (so in the iterative steps above, the same correlation is used to construct V and in rma.mv(..., rho=<value>).

Indeed, I have played around with this for a couple examples. When allowing the within- and between-study correlations to differ, then one often runs into the issue that one of the two correlations wants to drift towards a bound (+-1). With enough data, it is possible though to fit such a model.

2) As for using moment estimators versus (RE)ML: Likelihood-based methods are often quite robust to violations of the distributional assumptions, especially when it comes to estimating the fixed effects and their SEs. For constructing prediction intervals, distributional assumptions are more crucial, but this is a step after the model fitting and will affect both methods equally unless one uses non-parametric methods for constructing such intervals.


>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>Behalf Of James Pustejovsky via R-sig-meta-analysis
>Sent: Tuesday, 02 May, 2023 21:25
>To: Yefeng Yang
>Cc: James Pustejovsky; R Special Interest Group for Meta-Analysis
>Subject: Re: [R-meta] sandwhich estimator with geeglm object
>Responses below.
>On Mon, May 1, 2023 at 10:27 PM Yefeng Yang <yefeng.yang1 using unsw.edu.au>
>> BTW, what is your opinion of using this interactive way (moment estimator)
>> to estimate within-group correlations (e.g., auto-correlation terms). I
>> mean likelihood approach won't work if the distribution is misspecified
>> (because we do not know the within-group correlations and we need to
>> estimate it in geepack::geeglm()). This is a sort of dilemma.
>I don't have any strong prior. I would guess that the performance of these
>variance component estimators (or more generally, dependence structure
>estimators) could be rather specific to the context you're looking at. For
>instance, it might depend on whether the data are balanced repeated
>measures or imbalance with unequal numbers of repeated measures, on the
>number of independent clusters, etc. Hard to say. Doing some monte carlo
>simulations could help sort out what method to use.
>> If we put this in the context of meta-analysis, metafor::rma.mv does not
>> allow for estimating within-group correlations from data, although it can
>> by using an iterative way to find the maximum likelihood. What do you think?
>If I follow what you mean, I think you're talking about trying to estimate
>rho (the correlation between sampling errors) instead of just imputing a
>value of rho.  If I recall, Wolfgang and I have both toyed with this idea a
>bit. It's feasible to do this using nlme::lme(). The problem is that the
>REML estimation method in nlme when you have a fixed variance structure at
>level 1 is still a bit suspect---might have a bug in it, since it doesn't
>agree with metafor::rma.mv().
>> Cheers,
>> Yefeng
>> ------------------------------
>> *From:* James Pustejovsky <jepusto using gmail.com>
>> *Sent:* Tuesday, 2 May 2023 12:39
>> *To:* Yefeng Yang <yefeng.yang1 using unsw.edu.au>
>> *Cc:* R Special Interest Group for Meta-Analysis <
>> r-sig-meta-analysis using r-project.org>
>> *Subject:* Re: [R-meta] sandwhich estimator with geeglm object
>> Hi Yefeng,
>> Yes, this is correct if we're speaking on a very broad level. Broadly,
>> geepack::geeglm() with an identity link function fits the same type of
>> model as nlme::gls(), but geepack::geeglm() uses sandwich variance
>> estimation whereas nlme::gls() uses model-based variance estimation.
>> Running nlme::gls() with clubSandwich::vcovCR() will give you sandwich
>> variance estimation. The CR1 and CR3 sandwich estimators can be calculated
>> from both packages for comparison purposes.
>> There is one other difference that should be noted though:
>> geepack::geeglm() uses moment estimators for the parameters of the working
>> model (such as variance coefficients, auto-correlation terms, etc.). In
>> contrast, nlme::gls() uses maximum likelihood or restricted maximum
>> likelihood estimation for these parameters. This difference in estimation
>> strategies can lead to numerical differences in the model coefficient
>> estimates calculated by the two packages  (and therefore also differences
>> in the sandwich standard errors generated by geepack::geeglm versus by
>> clubSandwich::vcovCR.gls).
>> James
>> On Mon, May 1, 2023 at 7:19 PM Yefeng Yang <yefeng.yang1 using unsw.edu.au>
>> wrote:
>> Dear James,
>> Probably, it is more clear if I ask "whether glm + RVE is equal to gee?".
>> gee is doing RVE on the fly, while glm + RVE uses a post-hoc way to
>> calculate robust errors when dealing with correlated errors (e.g., repeated
>> measurements).
>> Best,
>> Yefeng

	[[alternative HTML version deleted]]

More information about the R-sig-meta-analysis mailing list