1. Introduction
Causal mediation analysis is a statistical technique for investigating
and identifying relationships in a causal mechanism that involves one or
more intermediate variables (i.e., mediators) between an independent
variable and an outcome. Besides giving a better understanding of the
causal pathways in proposed theoretical mechanisms, mediation analysis
can help to confirm and refine treatments when it is not possible or
ethical to intervene on the independent variable.
However, mediation analysis becomes challenging when a mediator has an
excessive number of zero values, which is common for count data and
other non-negative measurements. Standard mediation analysis approaches
may not be valid in this setting because their distributional
assumptions are violated. Moreover, the zero mediator values may include
both true and false zeros. A true zero means that the true value is
zero, whereas a false zero means that the true value is positive but too
small to be detected given the accuracy of the measuring device.
Therefore, there is an unmet need for mediation analysis approaches that
account for the zero-inflated structure of such mediators.
To address these difficulties, we proposed a novel mediation analysis
approach for estimating and testing direct and indirect effects with
non-negative zero-inflated mediators. Zero-inflated log-normal (ZILoN),
zero-inflated negative binomial (ZINB), and zero-inflated Poisson (ZIP)
distributions are considered as candidate distributions for the
mediator.
The R package MAZE implements the proposed causal mediation analysis
approach for zero-inflated mediators to estimate and test the natural
indirect effect (NIE), natural direct effect (NDE), and controlled
direct effect (CDE). Because of the zero-inflated nature of the
mediator, the mediation effect (i.e., NIE) can be decomposed into two
components, NIE\(_1\) and NIE\(_2\), with NIE = NIE\(_1\) + NIE\(_2\).
2. Model
For simplicity, the subject index is suppressed and confounders are
omitted from the equations below, but confounders are incorporated into
MAZE.
For an independent variable \(X\), a zero-inflated mediator \(M\), and a
continuous outcome variable \(Y\), the following regression equation is
used to model the association between \(Y\) and \((X,M)\):
\[\begin{equation}\label{ymodel}
Y_{xm1_{(m>0)}}=\beta_0+\beta_1m+\beta_2 1_{(m>0)}+\beta_3x+\beta_4x1_{(m>0)}+\beta_5xm+\epsilon,
\end{equation}\]
where \(Y_{xm1_{(m>0)}}\) is the potential outcome of \(Y\) when
\((X, M, 1_{(M>0)})\) take the values \((x, m, 1_{(m>0)})\), and
\(1_{(\cdot)}\) is the indicator function.
Equation (\(\ref{ymodel}\)) is a linear regression model in which
\(\beta_0,\beta_1,\beta_2,\beta_3,\beta_4,\beta_5\) are regression
coefficients and \(\epsilon\) is a random error following the normal
distribution \(N(0,\delta^2)\). Note that interactions between \(X\)
and the two mediators \(M\) and \(1_{(M>0)}\) are accommodated by the
product terms \(\beta_4X1_{(M>0)}\) and \(\beta_5XM\); allowing such
exposure-mediator interactions is an advantage of potential-outcomes
mediation analysis approaches.
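Users can choose to include one, both, or neither of the two interaction
terms with the argument XMint. In the example in Section 5,
XMint = c(TRUE, FALSE) includes \(\beta_4X1_{(M>0)}\) but not
\(\beta_5XM\), so no beta5 appears among the estimated parameters.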
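A minimal R sketch of Eq. (\(\ref{ymodel}\)): it simulates data from the outcome model and recovers \(\beta_0,\dots,\beta_5\) and \(\delta\) by ordinary least squares with lm(). The zero-inflated Poisson mediator and all parameter values are hypothetical, and the sketch uses the true mediator values. It therefore ignores false zeros and does not reproduce MAZE's estimation procedure; it only illustrates how the terms \(M\), \(1_{(M>0)}\), \(X1_{(M>0)}\), and \(XM\) enter the regression.

```r
# Simulate from the outcome model (ymodel) and refit it by OLS.
# The mediator distribution and all parameter values are hypothetical.
set.seed(2023)
n <- 20000
X <- rnorm(n)

# Hypothetical zero-inflated Poisson mediator: a structural zero with
# probability 0.2, otherwise Poisson with mean exp(1 + 0.3 * X)
structural_zero <- rbinom(n, 1, 0.2) == 1
M <- ifelse(structural_zero, 0, rpois(n, exp(1 + 0.3 * X)))
I_pos <- as.numeric(M > 0)   # indicator 1_(M > 0)

# True coefficients beta0, ..., beta5 and error SD delta
beta_true <- c(1, 0.2, 0.5, 0.3, 0.4, 0.1)
delta <- 1

# Interaction terms X * 1_(M > 0) and X * M (both included here)
XI <- X * I_pos
XM <- X * M
Y <- beta_true[1] + beta_true[2] * M + beta_true[3] * I_pos +
  beta_true[4] * X + beta_true[5] * XI + beta_true[6] * XM +
  rnorm(n, sd = delta)

# OLS fit with the same six terms as Eq. (ymodel)
fit <- lm(Y ~ M + I_pos + X + XI + XM)
beta_hat <- unname(coef(fit))
round(cbind(true = beta_true, estimate = beta_hat), 3)
```

Dropping XI or XM from the lm() formula mirrors excluding the corresponding interaction term through XMint.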
2.2 Probability mechanism for observing false zeros
Two types of zeros are common for \(M\) in data with excessive zeros:
true zeros and false zeros. We use \(M\) to denote the true value of the
mediator and \(M^*\) to denote its observed value. When the observed
value of the mediator is positive (i.e., \(M^*>0\)), we assume
\(M^*=M\). When \(M^*=0\), however, it is unknown whether \(M\) is truly
zero or \(M\) is positive but was observed as zero. We consider the
following mechanism for observing a zero:
\[\begin{equation}\label{zeroMecha}
P(M^*=0|M)=\begin{cases}
\exp(-\eta^2 M), & M\le B\\
0, &M>B
\end{cases},
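\end{equation}\] where \(\eta\) is a parameter to be estimated and
\(B>0\) is a known constant. Under this mechanism, a true zero
(\(M=0\)) is always observed as zero, a positive value \(M\le B\) is
observed as zero with probability \(\exp(-\eta^2M)\), which decreases as
\(M\) grows, and a value \(M>B\) is never observed as zero. The value of
\(B\) can be chosen based on expert knowledge in the field from which
the data arise.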
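The mechanism in Eq. (\(\ref{zeroMecha}\)) can be simulated directly. In the R sketch below, p_obs_zero() evaluates \(P(M^*=0|M)\) and observe_mediator() turns true mediator values into observed ones. Both function names, the value \(\eta=0.5\), and the distribution of the true mediator are hypothetical choices for illustration; \(B=20\) matches the value used in the example in Section 5.

```r
# P(M* = 0 | M = m) under the false-zero mechanism (zeroMecha)
p_obs_zero <- function(m, eta, B) {
  ifelse(m <= B, exp(-eta^2 * m), 0)
}

# Draw observed values M* given true values M: each value is replaced
# by 0 with probability p_obs_zero(m, eta, B); otherwise M* = M
observe_mediator <- function(m, eta, B) {
  to_zero <- rbinom(length(m), size = 1, prob = p_obs_zero(m, eta, B)) == 1
  ifelse(to_zero, 0, m)
}

p_obs_zero(c(0, 1, 5, 25), eta = 0.5, B = 20)
#> [1] 1.0000000 0.7788008 0.2865048 0.0000000

# Hypothetical true mediator: 5% structural zeros, otherwise
# negative binomial with mean 4 and size 5
set.seed(1)
M_true <- ifelse(rbinom(1000, 1, 0.05) == 1, 0,
                 rnbinom(1000, size = 5, mu = 4))
M_obs <- observe_mediator(M_true, eta = 0.5, B = 20)
c(true_zeros = sum(M_true == 0), observed_zeros = sum(M_obs == 0))
```

Comparing the two counts shows how false zeros inflate the number of observed zeros, while every positive observed value still equals the true value.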
4. Main function MAZE()
The R function MAZE() implements the proposed mediation analysis
approach for zero-inflated mediators, estimating and testing NIE,
NIE\(_1\), NIE\(_2\), NDE, and CDE.
4.2 Outputs
MAZE() returns a list with the following components:
results_effects: a data frame of the estimated effects (NIE1, NIE2, NIE, NDE, and CDE)
results_parameters: a data frame of the estimated model parameters
selected_model_name: a string giving the mediator distribution and the number of components in the final mediation model
BIC: a numeric value for the BIC of the final mediation model
AIC: a numeric value for the AIC of the final mediation model
models: a list of all fitted models
analysis2_out: a list with the output of the analysis2() function (used for internal checks)
5. Example
The MAZE package contains an example dataset, zinb10, generated from the
proposed model with a zero-inflated negative binomial mediator
(\(K=1\)). It is a data frame with 100 observations and 3 variables: a
continuous independent variable X, a continuous outcome Y, and a count
mediator Mobs. Of the mediator values, 10% are zero, and half of these
zeros are false zeros.
library(MAZE)
#> Loading required package: flexmix
#> Loading required package: lattice
#> Loading required package: numDeriv
#> Loading required package: pracma
#>
#> Attaching package: 'pracma'
#> The following objects are masked from 'package:numDeriv':
#>
#> grad, hessian, jacobian
# load the example dataset "zinb10"
data(zinb10)
# call MAZE() to perform mediation analysis
maze_out <- MAZE(data = zinb10,
                 distM = c('zilonm', 'zinbm', 'zipm'), K = 1,
                 selection = 'AIC',
                 X = 'X', M = 'Mobs', Y = 'Y', Z = NULL,
                 XMint = c(TRUE, FALSE),
                 x1 = 0, x2 = 1, zval = NULL, mval = 0,
                 B = 20, seed = 1)
## results of selected mediation model
maze_out$results_effects # indirect and direct effects
#> Estimate SE CI_lower CI_upper Pvalue
#> NIE1 0.05848985 0.04004098 -0.01998904 0.1369687 1.440842e-01
#> NIE2 0.03049883 0.02884347 -0.02603334 0.0870310 2.903332e-01
#> NIE 0.08898868 0.05507322 -0.01895285 0.1969302 1.061322e-01
#> NDE 0.78215096 0.10561449 0.57515037 0.9891515 1.303402e-13
#> CDE 0.15330724 0.44685305 -0.72250865 1.0291231 7.315368e-01
maze_out$selected_model_name # distribution of the mediator and number of components K in the selected mediation model
#> [1] "zinbm_K1"
maze_out$results_parameters # model parameters
#> Initials Estimate SE CI_lower CI_upper Pvalue
#> beta0 1.78517270 1.3032717 0.30847936 0.69866330 1.9078802 2.391251e-05
#> beta1 0.12015440 0.1200885 0.04070186 0.04031429 0.1998626 3.173198e-03
#> beta2 0.00000000 0.4823095 0.36486363 -0.23281005 1.1974291 1.862048e-01
#> beta3 0.85104549 0.1533072 0.44685305 -0.72250865 1.0291231 7.315368e-01
#> beta4 0.00000000 0.6979238 0.45876473 -0.20123859 1.5970861 1.281818e-01
#> delta 0.95403971 0.9333399 0.06599795 0.80398635 1.0626935 0.000000e+00
#> alpha10 1.41269766 1.3645683 0.07253324 1.22240580 1.5067309 0.000000e+00
#> alpha11 0.09257633 0.1069317 0.06954884 -0.02938148 0.2432450 1.241695e-01
#> r 5.68623319 6.7790122 3.36654452 0.18070618 13.3773182 4.404723e-02
#> gamma0 -2.21212075 -2.8245948 0.67815014 -4.15374470 -1.4954450 3.111524e-05
#> gamma1 -0.35269450 -0.3554505 0.51607592 -1.36694068 0.6560398 4.909767e-01
#> eta 0.01000000 2.9869217 14.54116089 -25.51322999 31.4870733 8.372506e-01
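The reported effects can be cross-checked by hand. Assuming the usual definitions NIE \(=E[Y_{x_2M_{x_2}}]-E[Y_{x_2M_{x_1}}]\), NDE \(=E[Y_{x_2M_{x_1}}]-E[Y_{x_1M_{x_1}}]\), and CDE \(=E[Y_{x_2m}]-E[Y_{x_1m}]\), Eq. (\(\ref{ymodel}\)) gives NIE\(_1=(\beta_1+\beta_5x_2)(E[M_{x_2}]-E[M_{x_1}])\), NIE\(_2=(\beta_2+\beta_4x_2)(P(M_{x_2}>0)-P(M_{x_1}>0))\), NDE \(=(\beta_3+\beta_4P(M_{x_1}>0)+\beta_5E[M_{x_1}])(x_2-x_1)\), and CDE \(=(\beta_3+\beta_41_{(m>0)}+\beta_5m)(x_2-x_1)\) at \(m=\) mval. The R sketch below is our own check, not MAZE's internal code. It assumes that, for the ZINB mediator, gamma0 and gamma1 give the logit of the structural-zero probability, alpha10 and alpha11 give the log of the negative binomial mean, and r is the size parameter; this reading of the parameter names is an assumption. The helper names zinb_moments() and mediation_effects() are hypothetical.

```r
# Mean and P(M > 0) of a ZINB mediator at exposure level x.
# Assumed parameterization (our reading of the parameter names):
#   P(structural zero) = plogis(gamma0 + gamma1 * x)
#   negative binomial mean = exp(alpha10 + alpha11 * x), size = r
zinb_moments <- function(x, alpha10, alpha11, gamma0, gamma1, r) {
  pi0 <- plogis(gamma0 + gamma1 * x)
  mu <- exp(alpha10 + alpha11 * x)
  p_nb_zero <- dnbinom(0, size = r, mu = mu)
  c(mean = (1 - pi0) * mu, p_pos = (1 - pi0) * (1 - p_nb_zero))
}

# Effects implied by the outcome model (ymodel); b = c(b0, ..., b5),
# with b4 or b5 set to 0 when that interaction is excluded via XMint
mediation_effects <- function(b, m_x1, m_x2, x1, x2, mval) {
  nie1 <- (b[2] + b[6] * x2) * (m_x2[["mean"]] - m_x1[["mean"]])
  nie2 <- (b[3] + b[5] * x2) * (m_x2[["p_pos"]] - m_x1[["p_pos"]])
  nde <- (b[4] + b[5] * m_x1[["p_pos"]] + b[6] * m_x1[["mean"]]) * (x2 - x1)
  cde <- (b[4] + b[5] * (mval > 0) + b[6] * mval) * (x2 - x1)
  c(NIE1 = nie1, NIE2 = nie2, NIE = nie1 + nie2, NDE = nde, CDE = cde)
}

# Estimates copied from results_parameters above; beta5 = 0 because
# XMint = c(TRUE, FALSE) excludes the X * M term
b <- c(1.3032717, 0.1200885, 0.4823095, 0.1533072, 0.6979238, 0)
med <- list(alpha10 = 1.3645683, alpha11 = 0.1069317,
            gamma0 = -2.8245948, gamma1 = -0.3554505, r = 6.7790122)
m_x1 <- do.call(zinb_moments, c(list(x = 0), med))
m_x2 <- do.call(zinb_moments, c(list(x = 1), med))
round(mediation_effects(b, m_x1, m_x2, x1 = 0, x2 = 1, mval = 0), 4)
```

Under these assumptions the computed values agree with results_effects to about three decimal places, which supports the reading of the parameter names; in particular, with mval = 0 and no \(XM\) term, CDE reduces to \(\beta_3\).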
maze_out$BIC; maze_out$AIC # BIC and AIC of the selected mediation model
#> zinbm_K1
#> 781.7195
#> zinbm_K1
#> 750.4574
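For a model with \(k\) free parameters fitted to \(n\) observations, AIC \(=2k-2\log L\) and BIC \(=k\log n-2\log L\), so BIC \(-\) AIC \(=k(\log n-2)\) and the two reported criteria determine \(k\). A short R check, assuming MAZE uses these standard definitions:

```r
# BIC - AIC = k * (log(n) - 2) under the standard definitions
aic <- 750.4574
bic <- 781.7195
n <- 100  # observations in zinb10
k <- (bic - aic) / (log(n) - 2)
round(k, 3)
```

The result, about 12, matches the 12 parameters listed in results_parameters.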
Session Info
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MAZE_0.0.2 pracma_2.4.2 numDeriv_2016.8-1.1
#> [4] flexmix_2.3-18 lattice_0.20-45
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 bslib_0.4.2 compiler_4.2.2 jquerylib_0.1.4
#> [5] iterators_1.0.14 tools_4.2.2 digest_0.6.31 jsonlite_1.8.4
#> [9] evaluate_0.19 lifecycle_1.0.3 rlang_1.0.6 foreach_1.5.2
#> [13] cli_3.5.0 rstudioapi_0.14 yaml_2.3.6 parallel_4.2.2
#> [17] xfun_0.36 fastmap_1.1.0 stringr_1.5.0 knitr_1.41
#> [21] vctrs_0.5.1 sass_0.4.4 stats4_4.2.2 grid_4.2.2
#> [25] nnet_7.3-18 glue_1.6.2 R6_2.5.1 rmarkdown_2.19
#> [29] magrittr_2.0.3 codetools_0.2-18 htmltools_0.5.4 modeltools_0.2-23
#> [33] MASS_7.3-58.1 stringi_1.7.8 doParallel_1.0.17 cachem_1.0.6