[R-meta] IPD meta analysis / complex survey design
GOSLING Corentin
corent|n@go@||ng @end|ng |rom gm@||@com
Thu Mar 4 11:29:17 CET 2021
Dear all,
I am coming back to you about the IPD meta-analysis we are conducting to
explore the effect of month of birth on the persistence of ADHD. I asked
for your help a few months ago when I was writing the protocol. We have
since completed our systematic review and started to include data from
different cohorts. As month of birth is sensitive data, we do not ask the
authors to send us the raw data: we have written an R script that we send
to the authors, which runs the analyses automatically and returns only
anonymised summary results. We then carry out a classic two-stage
meta-analysis based on these summary results.
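To make the second stage concrete, here is a minimal sketch of inverse-variance pooling of per-cohort log odds ratios in base R. The cohort labels and numbers are hypothetical placeholders, not our results, and the DerSimonian-Laird estimator is only one possible choice; a real analysis would more likely rely on a dedicated meta-analysis package.

```r
# Hypothetical per-cohort summaries returned by the script:
# log odds ratio for month of birth and its standard error.
cohorts <- data.frame(
  cohort = c("A", "B", "C"),
  logor  = c(0.04, 0.02, 0.05),
  se     = c(0.02, 0.02, 0.03)
)

# Fixed-effect (inverse-variance) pooling.
w         <- 1 / cohorts$se^2
pooled    <- sum(w * cohorts$logor) / sum(w)
pooled_se <- sqrt(1 / sum(w))

# Cochran's Q as a rough heterogeneity check.
Q <- sum(w * (cohorts$logor - pooled)^2)

# DerSimonian-Laird between-study variance and random-effects estimate.
k    <- nrow(cohorts)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
w_re <- 1 / (cohorts$se^2 + tau2)
pooled_re    <- sum(w_re * cohorts$logor) / sum(w_re)
pooled_re_se <- sqrt(1 / sum(w_re))

round(c(fixed = pooled, se = pooled_se, random = pooled_re, tau2 = tau2), 4)
```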
We are facing a challenge that we did not anticipate. Several studies
have a complex survey design. Some studies have clusters (e.g., twin
cohorts or assessments of several regular siblings per family), while
others have more complex sampling (including, for example, sampling
weights, strata or a finite population correction (fpc)). Some studies
include both (clusters + strata/weights/fpc).
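As an illustration of how such a design can be declared, here is a minimal sketch using the api example data shipped with the survey package (not one of our cohorts). The variables stype, pw, fpc, sch.wide and meals belong to that example data set; the corresponding names in a real cohort are assumptions each site would have to adapt.

```r
library(survey)
data(api)  # example data shipped with the survey package

# Stratified design with sampling weights and a finite population correction.
# For clustered data, ids = ~1 would be replaced by the cluster identifier
# (for example ids = ~ID).
des <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, fpc = ~fpc,
                 data = apistrat)

# Design-based logistic regression on a binary outcome.
fit <- svyglm(sch.wide ~ meals, design = des, family = quasibinomial())

# Only the coefficient and its standard error need to leave the site.
out <- summary(fit)$coefficients["meals", c("Estimate", "Std. Error")]
out
```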
To analyse the data with clustering, we naturally thought of using mixed
models via the glmer function of lme4 (our DV is binary: ADHD persistence
yes/no). However, lme4 does not currently handle sampling weights or
stratification. Therefore, for all data with clustering and/or weights
and/or strata and/or fpc, our idea was to use only the svyglm function of
the survey package in order to have a coherent set of analyses (we know
that glmer and svyglm do not estimate the same kind of coefficients:
conditional vs. marginal, respectively).
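To illustrate the conditional/marginal distinction: for a random-intercept logistic model, a well-known approximation (Zeger, Liang and Albert, 1988) is that the marginal coefficient is roughly the conditional one divided by sqrt(1 + c^2 * sigma^2), with c = 16 * sqrt(3) / (15 * pi) and sigma^2 the random-intercept variance. The sketch below applies it to the glmer slope from our tests; the variance values are hypothetical, since we do not report ours here.

```r
# Approximate attenuation from conditional (glmer) to marginal
# (population-averaged) log odds ratios in a random-intercept logistic model.
cond_to_marginal <- function(beta_cond, sigma2) {
  c2 <- (16 * sqrt(3) / (15 * pi))^2   # about 0.346
  beta_cond / sqrt(1 + c2 * sigma2)
}

beta_glmer <- 0.04034            # conditional slope from the glmer fit
sigma2     <- c(0, 0.5, 1, 2)    # hypothetical random-intercept variances

data.frame(sigma2 = sigma2,
           beta_marginal = cond_to_marginal(beta_glmer, sigma2))
```

With a small random-intercept variance the two kinds of coefficients are nearly identical, and the gap widens as the variance grows.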
Our question is the following: can we group within the same meta-analysis
coefficients that come from standard logistic regressions and coefficients
that come from generalised mixed models fitted using glmer or generalised
linear models adapted to complex designs fitted using svyglm?
To support our question, we performed some tests on a dataset including
clusters and sampling weights. Here are the results:
######################################################################
*On the raw dataset* (df_raw is a dataset containing clusters)
*# regular logistic regressions on the raw data (we ignore clustering / we
ignore sampling weights):*
summary(glm(DV ~ IV, family = "binomial", data = df_raw))
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07497    0.12907  -8.328   <2e-16 ***
IV (month)   0.03916    0.01732   2.261   0.0238 *
*# generalized mixed model via lme4 (we take into account the clustering
(ID variable) / we ignore sampling weights):*
summary(lme4::glmer(DV ~ IV + (1 | ID), family = "binomial", data =
df_raw))
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.10949    0.14571  -7.614 2.65e-14 ***
IV (month)   0.04034    0.01793   2.250   0.0245 *
*# generalized linear model via survey package (we take into account the
clustering (ID variable) / we ignore sampling weights):*
dclus1<- survey::svydesign(id= ~ID, data = df_raw)
summary(survey::svyglm(DV ~ IV, design = dclus1, family = quasibinomial()))
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.07497    0.12927  -8.316 2.31e-16 ***
IV (month)   0.03916    0.01729   2.265   0.0237 *
*# generalized linear model via survey package (we take into account the
clustering (ID variable) / we take into account sampling weights (WEIGHT
variable)):*
dclus2<- survey::svydesign(id=~ID, weights = ~WEIGHT, data = df_raw)
summary(survey::svyglm(DV ~ IV, design = dclus2, family = quasibinomial()))
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.98952    0.15475  -6.394 2.25e-10 ***
IV (month)   0.02195    0.02069   1.061    0.289
######################################################################
*On an aggregated dataset* (df_agg is the same dataset as df_raw but
without any clustering: we randomly selected one child per cluster, so
that length(unique(df_agg$ID)) is equal to nrow(df_agg)).
*# regular logistic regressions on the aggregated data (we ignore sampling
weights):*
summary(glm(DV ~ IV, family = "binomial", data = df_agg))
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07309    0.13328  -8.051  8.2e-16 ***
IV (month)   0.04327    0.01782   2.428   0.0152 *
*# generalized mixed model via lme4 (we ignore sampling weights):*
summary(lme4::glmer(DV ~ IV + (1 | ID), family = "binomial", data =
df_agg))
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07309    0.13328  -8.051  8.2e-16 ***
IV (month)   0.04327    0.01782   2.428   0.0152 *
*# generalized linear model adapted to complex design via survey (we ignore
sampling weights):*
dclus4<- survey::svydesign(id= ~ID, data = df_agg)
summary(survey::svyglm(DV ~ IV, design = dclus4, family = quasibinomial()))
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.07309    0.13351  -8.037 2.12e-15 ***
IV (month)   0.04327    0.01785   2.424   0.0155 *
*# generalized linear model adapted to complex design via survey (we take
into account sampling weights):*
dclus5 <- survey::svydesign(id = ~ID, weights = ~WEIGHT, data = df_agg)
summary(survey::svyglm(DV ~ IV, design = dclus5, family = quasibinomial()))
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.95961    0.15957  -6.014 2.38e-09 ***
IV (month)   0.02471    0.02133   1.159    0.247
As you can see, the models give almost identical results, except when we
take sampling weights into account. I hope that I have explained our
problem clearly.
Thank you very much in advance for your help!
Corentin J Gosling