Title: | Parametric G-Formula |
Version: | 1.1.1 |
Description: | Implements the non-iterative conditional expectation (NICE) algorithm of the g-formula algorithm (Robins (1986) <doi:10.1016/0270-0255(86)90088-6>, Hernán and Robins (2024, ISBN:9781420076165)). The g-formula can estimate an outcome's counterfactual mean or risk under hypothetical treatment strategies (interventions) when there is sufficient information on time-varying treatments and confounders. This package can be used for discrete or continuous time-varying treatments and for failure time outcomes or continuous/binary end of follow-up outcomes. The package can handle a random measurement/visit process and a priori knowledge of the data structure, as well as censoring (e.g., by loss to follow-up) and two options for handling competing events for failure time outcomes. Interventions can be flexibly specified, both as interventions on a single treatment or as joint interventions on multiple treatments. See McGrath et al. (2020) <doi:10.1016/j.patter.2020.100008> for a guide on how to use the package. |
Depends: | R (≥ 3.5.0) |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | data.table, ggplot2, ggpubr, grDevices, nnet, parallel, progress, stats, stringr, survival, truncnorm, truncreg, utils |
Suggests: | Hmisc, knitr, randomForest, rmarkdown, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
URL: | https://github.com/CausalInference/gfoRmula, https://doi.org/10.1016/j.patter.2020.100008 |
BugReports: | https://github.com/CausalInference/gfoRmula/issues |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-03-24 01:37:49 UTC; Sean |
Author: | Victoria Lin [aut] (V. Lin and S. McGrath made equal contributions),
Sean McGrath |
Maintainer: | Sean McGrath <sean.mcgrath514@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-03-24 08:50:02 UTC |
Example Dataset for a Survival Outcome with Censoring
Description
A dataset consisting of 11,332 observations on 2,500 individuals over 7 time points. Each row in the dataset corresponds to the record of one individual at one time point. Individuals who are censored at time k+1
only have a total of k+1
records, which correspond to time indices 0,..., k
.
Usage
basicdata
Format
A data table with 11,332 rows and 8 variables:
- t0
Time index.
- id
Unique identifier for each individual.
- L1
Binary time-varying covariate.
- L2
Continuous time-varying covariate.
- L3
Continuous baseline covariate. For each individual, the baseline values are repeated at each time point.
- A
Binary treatment variable.
- D
Competing event; time-varying indicator of failure.
- Y
Outcome of interest; time-varying indicator of failure.
Example Dataset for a Survival Outcome without Censoring
Description
A dataset consisting of 13,170 observations on 2,500 individuals over 7 time points. Each row in the dataset corresponds to the record of one individual at one time point.
Usage
basicdata_nocomp
Format
A data table with 13,170 rows and 7 variables:
- t0
Time index.
- id
Unique identifier for each individual.
- L1
Binary covariate.
- L2
Continuous covariate.
- L3
Continuous baseline covariate. For each individual, the baseline values are repeated at each time point.
- A
Binary treatment variable.
- Y
Outcome of interest; time-varying indicator of failure.
Example Dataset for a Binary Outcome at End of Follow-Up
Description
A dataset consisting of 17,500 observations on 2,500 individuals over 7 time points. Each row in the dataset corresponds to the record of one individual at one time point.
Usage
binary_eofdata
Format
A data table with 17,500 rows and 7 variables:
- time
Time index.
- id_num
Unique identifier for each individual.
- cov1
Binary time-varying covariate.
- cov2
Continuous time-varying covariate.
- cov3
Continuous baseline covariate. For each individual, the baseline values are repeated at each time point.
- treat
Binary treatment variable.
- outcome
Binary outcome of interest. Because this outcome is only defined at the end of follow-up, values of
NA
are given in all other time points.
Bootstrap Observed Data and Simulate Under All Interventions
Description
This internal function bootstraps the observed data (i.e., resamples the observed data set with replacement to construct bootstrap confidence intervals and standard errors). Then, the function simulates data using the resampled dataset to estimate the survival outcome, binary end-of-follow-up outcome, or continuous end-of-follow-up outcome.
Usage
bootstrap_helper(
r,
time_points,
obs_data,
bootseeds,
outcome_type,
intvars,
interventions,
int_times,
ref_int,
covparams,
covnames,
covtypes,
covfits_custom,
covpredict_custom,
basecovs,
histvars,
histvals,
histories,
ymodel,
ymodel_fit_custom,
ymodel_predict_custom,
yrestrictions,
compevent_restrictions,
restrictions,
comprisk,
compevent_model,
time_name,
outcome_name,
compevent_name,
ranges,
parallel,
ncores,
max_visits,
hazardratio,
intcomp,
boot_diag,
nsimul,
baselags,
below_zero_indicator,
min_time,
show_progress,
pb,
int_visit_type,
sim_trunc,
...
)
Arguments
r |
Integer specifying the index of the current iteration of the bootstrap. |
time_points |
Number of time points to simulate. |
obs_data |
Data table containing the observed data. |
bootseeds |
Vector of integers specifying the seeds. One seed is used to initialize each bootstrap iteration. |
outcome_type |
Character string specifying the "type" of the outcome. The possible "types" are: |
intvars |
List, whose elements are vectors of character strings. The kth vector in |
interventions |
List, whose elements are lists of vectors. Each list in |
int_times |
List, whose elements are lists of vectors. The kth list in |
ref_int |
Integer denoting the intervention to be used as the
reference for calculating the risk ratio and risk difference. 0 denotes the
natural course, while subsequent integers denote user-specified
interventions in the order that they are
named in |
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covfits_custom |
Vector containing custom fit functions for time-varying covariates that
do not fall within the pre-defined covariate types. It should be in
the same order |
covpredict_custom |
Vector containing custom prediction functions for time-varying
covariates that do not fall within the pre-defined covariate types.
It should be in the same order as |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histvals |
List of length two. The first element is a numeric vector specifying the lags used in the model statements (e.g., if |
histories |
Vector of history functions to apply to the variables specified in |
ymodel |
Model statement for the outcome variable. |
ymodel_fit_custom |
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. |
ymodel_predict_custom |
Function obtaining predictions from the custom outcome model specified in |
yrestrictions |
List of vectors. Each vector containins as its first entry
a condition and its second entry an integer. When the
condition is |
compevent_restrictions |
List of vectors. Each vector containins as its first entry
a condition and its second entry an integer. When the
condition is |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
comprisk |
Logical scalar indicating the presence of a competing event. |
compevent_model |
Model statement for the competing event variable. |
time_name |
Character string specifying the name of the time variable in |
outcome_name |
Character string specifying the name of the outcome variable in |
compevent_name |
Character string specifying the name of the competing event variable in |
ranges |
List of vectors. Each vector contains the minimum and
maximum values of one of the covariates in |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
ncores |
Integer specifying the number of cores to use in parallel simulation. |
max_visits |
A vector of one or more values denoting the maximum number of times
a binary covariate representing a visit process may be missed before
the individual is censored from the data (in the observed data) or
a visit is forced (in the simulated data). Multiple values exist in the
vector when the modeling of more than covariate is attached to a visit
process. A value of |
hazardratio |
Logical scalar indicating whether the hazard ratio should be computed between two interventions. |
intcomp |
List of two numbers indicating a pair of interventions to be compared by a hazard ratio.
The default is |
boot_diag |
Logical scalar indicating whether to return the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is |
nsimul |
Number of subjects for whom to simulate data. By default, this argument is set
equal to the number of subjects in |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
below_zero_indicator |
Logical scalar indicating whether the observed data set contains rows for time |
min_time |
Numeric scalar specifying lowest value of time |
show_progress |
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when |
pb |
Progress bar R6 object. See |
int_visit_type |
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type for the kth intervention in |
sim_trunc |
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type |
... |
Other arguments |
Value
A list with the following components:
Result |
Matrix containing risks over time under the natural course and under each user-specific intervention. |
ResultRatio |
Matrix containing risk ratios over time under the natural course and under each user-specific intervention. |
ResultDiff |
Matrix containing risk differences over time under the natural course and under each user-specific intervention. |
bootcoeffs |
List of the coefficients of the fitted models. If the argument |
bootstderrs |
List of the standard errors of the coefficients of the fitted models. If the argument |
Carry Forward
Description
This function assists the implemention of a restriction on a covariate in the date
table newdf
. A particular covariate is simulated only when some condition
(usually a covariate representing whether a doctor's visit occurred or not) is TRUE
.
If the condition is FALSE
, the covariate value is not simulated for that time point
and the value is instead carried over from the previous time point.
Usage
carry_forward(newdf, pool, restriction, time_name, t, int_visit_type, intvar)
Arguments
newdf |
Data table containing the simulated data at time |
pool |
Data table containing the simulated data at times before |
restriction |
List of vectors. Each vector contains as its first entry
the covariate affected by the restriction; its second entry
the condition that must be |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
int_visit_type |
Logical scalar specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type |
intvar |
A vector specifying the name(s) of the variable(s) to be intervened on. |
Value
No value is returned. The data table newdf
is modified in place.
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
# At t0 == 5, assign L1 its value at the previous time point
restrictions <- list(c('L2', 't0 != 5', carry_forward))
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
restrictions = restrictions,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
Example Dataset for a Survival Outcome with an Indicator of Censoring Variable
Description
A dataset consisting of 118,725 observations on 20,000 individuals over 10 time points. Each row in the dataset corresponds to the record of one individual at one time point.
Usage
censor_data
Format
A data table with 22,500 rows and 7 variables:
- t0
Time index.
- id
Unique identifier for each individual.
- L
Binary time-varying covariate.
- A
Continuous treatment variable.
- C
Censoring indicator.
- Y
Outcome of interest; time-varying indicator of failure.
Coefficient method for objects of class "gformula"
Description
This function extracts the coefficients of the fitted models for the time-varying covariates, outcome, and compevent event (if applicable).
Usage
## S3 method for class 'gformula'
coef(object, ...)
Arguments
object |
Object of class "gformula". |
... |
Other arguments. |
Value
If bootdiag
was set to FALSE
in gformula
,
this function returns a list of the coefficients of the fitted models to the
observed data set. If bootstrapping was used and bootdiag
was set to TRUE
in
gformula
, this function returns a list described as follows.
The first element (named 'Original sample') is a list of the coefficients of
the fitted models to the observed data set. The kth element (named 'Bootstrap
sample k-1') is a list of the coefficients of the fitted models corresponding
to the k-1th bootstrap sample.
See Also
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
coef(gform_basic)
Example Dataset for a Continuous Outcome at End of Follow-Up
Description
A dataset consisting of 17,500 observations on 2,500 individuals over 7 time points. Each row in the dataset corresponds to the record of one individual at one time point.
Usage
continuous_eofdata
Format
A data table with 17,500 rows and 7 variables:
- t0
Time index.
- id
Unique identifier for each individual.
- L1
Categorical time-varying covariate.
- L2
Continuous time-varying covariate.
- L3
Continuous baseline covariate. For each individual, the baseline values are repeated at each time point.
- A
Binary treatment variable.
- Y
Continuous outcome of interest. Because this outcome is only defined at the end of follow-up, values of
NA
are given in all other time points.
Example Dataset for a Continuous Outcome at End of Follow-Up with Pre-Baseline Times
Description
A dataset consisting of 22,500 observations on 2,500 individuals over 2 pre-baseline time points and follow-up 7 time points. Each row in the dataset corresponds to the record of one individual at one time point.
Usage
continuous_eofdata_pb
Format
A data table with 22,500 rows and 7 variables:
- t0
Time index.
- id
Unique identifier for each individual.
- L1
Categorical time-varying covariate.
- L2
Continuous time-varying covariate.
- L3
Continuous baseline covariate. For each individual, the baseline values are repeated at each time point.
- A
Binary treatment variable.
- Y
Continuous outcome of interest. Because this outcome is only defined at the end of follow-up, values of
NA
are given in all other time points.
General Error Catching
Description
This internal function catches various potential errors in the user input in gformula_survival
, gformula_continuous_eof
, and gformula_binary_eof
.
Usage
error_catch(
id,
nsimul,
intvars,
interventions,
int_times,
int_descript,
covnames,
covtypes,
basecovs,
histvars,
histories,
compevent_model,
hazardratio,
intcomp,
time_points,
outcome_type,
time_name,
obs_data,
parallel,
ncores,
nsamples,
sim_data_b,
outcome_name,
compevent_name,
comprisk,
censor,
censor_name,
covmodels,
histvals,
ipw_cutoff_quantile,
ipw_cutoff_value,
old_convention
)
Arguments
id |
Character string specifying the name of the ID variable in |
nsimul |
Number of subjects for whom to simulate data. |
intvars |
List, whose elements are vectors of character strings. The kth vector in |
interventions |
List, whose elements are lists of vectors. Each list in |
int_times |
List, whose elements are lists of vectors. The kth list in |
int_descript |
Vector of character strings, each describing an intervention. |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histories |
Vector of history functions to apply to the variables specified in |
compevent_model |
Model statement for the competing event variable. |
hazardratio |
Logical scalar indicating whether the hazard ratio should be calculated between two interventions. |
intcomp |
List of two numbers indicating a pair of interventions to be compared by a hazard ratio.
The default is |
time_points |
Number of time points to simulate. |
outcome_type |
Character string specifying the "type" of the outcome. The possible "types" are: |
time_name |
Character string specifying the name of the time variable in |
obs_data |
Data table containing the observed data. |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
ncores |
Integer specifying the number of cores to use in parallel simulation. |
nsamples |
Integer specifying the number of bootstrap samples to generate. |
sim_data_b |
Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., |
outcome_name |
Character string specifying the name of the outcome variable in |
compevent_name |
Character string specifying the name of the competing event variable in |
comprisk |
Logical scalar indicating the presence of a competing event. |
censor |
Logical scalar indicating the presence of a censoring variable in |
censor_name |
Character string specifying the name of the censoring variable in |
covmodels |
Vector of model statements for the time-varying covariates. |
histvals |
List of length 3. First element contains a vector of integers specifying the number of lags back for the lagged function. Second element contains a vector of integers indicating the number of lags back for the lagavg function. The last element is an indicator whether a cumavg term appears in any of the model statements. |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. |
old_convention |
Logical scalar indicating whether the "old" intervention convention was used (i.e., by specifying |
Value
No value is returned.
Fit Bounded Normal Model on Covariate
Description
This internal function models a covariate using a "bounded normal" distribution by first standardizing the covariate values to the range [0, 1], noninclusive, then fitting a generalized linear model (GLM) under the Gaussian family function.
Usage
fit_bounded_continuous(
covparams,
covlink = NA,
covname,
obs_data,
j,
model_fits
)
Arguments
covparams |
List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). |
covlink |
Vector of link functions. |
covname |
Name of the covariate at index |
obs_data |
Data on which the model is fit. |
j |
Integer specifying the index of the covariate. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
Fitted model for the covariate at index j
.
Fit GLM on Covariate
Description
This internal function fits a generalized linear model (GLM) for a single covariate using the observed data.
Usage
fit_glm(covparams, covlink = NA, covfam, obs_data, j, model_fits)
Arguments
covparams |
List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). |
covlink |
Vector of link functions. |
covfam |
Vector of character strings specifying the names of the family functions to be used for fitting the GLM. |
obs_data |
Data on which the model is fit. |
j |
Integer specifying the index of the covariate. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
Fitted model for the covariate at index j
.
Fit Multinomial Model on Covariate
Description
This internal function fits a multinomial regression model for a categorical covariate using the observed data.
Usage
fit_multinomial(covparams, obs_data, j, model_fits)
Arguments
covparams |
List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). |
obs_data |
Data on which the model is fit. |
j |
Integer specifying the index of the covariate. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
Fitted model for the covariate at index j
.
Fit Truncated Normal Model on Covariate
Description
This internal function models a covariate using a normal distribution truncated on one side at a user-specified cutoff.
Usage
fit_trunc_normal(covparams, obs_data, j, model_fits)
Arguments
covparams |
List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). |
obs_data |
Data on which the model is fit. |
j |
Integer specifying the index of the covariate. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
Fitted model for the covariate at index j
.
Fit Zero-Inflated Normal Model on Covariate
Description
This internal function models a zero-inflated normal distribution through the combined use of a generalized linear model (GLM) fit on a zero vs. non-zero indicator and a GLM fit on all non-zero values.
Usage
fit_zeroinfl_normal(covparams, covlink = NA, covname, obs_data, j, model_fits)
Arguments
covparams |
List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). |
covlink |
Vector of link functions. |
covname |
Name of the covariate at index |
obs_data |
Data on which the model is fit. |
j |
Integer specifying the index of the covariate. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
Fitted model for the covariate at index j
.
Get Covariate Plots
Description
This internal function obtains the covariate plots for gformula_survival
, gformula_continuous_eof
, and gformula_binary_eof
.
Usage
get_cvgrphs(x, covnames, covtypes, xlab, ylab_cov)
Arguments
x |
Object of class "gformula_survival", "gformula_continuous_eof", or "gformula_binary_eof". |
covnames |
Vector of character strings specifying the names of the time-varying covariates to be plotted. The ordering of covariates given here is used in the plot grid. |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
xlab |
Character string for the x axes of all plots. |
ylab_cov |
Vector of character strings for the y axes of the plots for the covariates. This argument must be the same length as |
Value
A list of covariate plots.
Get Risk and Survival Plots
Description
This internal function obtains the risk and survival plots for gformula_survival
.
Usage
get_outgrphs(x, risk, survival, xlab, ylab_risk, ylab_surv, ci_risk)
Arguments
x |
Object of class "gformula_survival". |
risk |
Logical scalar indicating whether to include a plot for the risk. |
survival |
Logical scalar indicating whether to include a plot for the survival. |
xlab |
Character string for the x axes of all plots. |
ylab_risk |
Character string for the y axis of the plot for the risk (if applicable). |
ylab_surv |
Character string for the y axis of the plot for the survival (if applicable). |
ci_risk |
Logical scalar specifying whether to include error bars for the 95% confidence intervals of the estimated risk under the natural course. This argument is only effective if the argument |
Value
A list of plots for the risk and survival.
Get Plotting Information
Description
This internal function obtains the data tables necessary for plotting. For continuous and binary covariates, the mean observed and simulated values are obtained for each time point. For categorical covariates, the observed and simulated means of the levels of the factors are obtained for each time point. When the outcome is of type "survival"
, the observed and simulated risk and survival are obtained for each time point.
Usage
get_plot_info(
outcome_name,
compevent_name,
compevent2_name,
censor_name,
time_name,
id,
time_points,
covnames,
covtypes,
nat_pool,
nat_result,
comprisk,
comprisk2,
censor,
fitD2,
fitC,
outcome_type,
obs_data,
ipw_cutoff_quantile,
ipw_cutoff_value
)
Arguments
outcome_name |
Character string specifying the name of the outcome variable in |
compevent_name |
Character string specifying the name of the competing event variable in |
compevent2_name |
Character string specifying the name of the competing event variable in |
censor_name |
Character string specifying the name of the censoring variable in |
time_name |
Character string specifying the name of the time variable in |
id |
Character string specifying the name of the ID variable in |
time_points |
Number of time points to simulate. |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
nat_pool |
Pooled-over-time data table containing simulated data under the natural course. |
nat_result |
Vector containing the mean outcome over all subjects at each time for natural course. |
comprisk |
Logical scalar indicating the presence of a competing event. |
comprisk2 |
Logical scalar indicating whether competing events are treated as censoring events. |
censor |
Logical scalar indicating the presence of a censoring variable in |
fitD2 |
Model fit for the competing event variable if competing events are treated as censoring events. |
fitC |
Model fit for the censoring variable. |
outcome_type |
Character string specifying the "type" of the outcome. The possible "types" are: |
obs_data |
Data table containing observed data. |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. |
Value
A list with the following components:
obs_results |
A list of the mean observed values at each time point for covariates and - if the outcome is of type |
dt_cov_plot |
A list of data tables The data tables contain the observed and simulated mean values of the covariates under each time point. |
dt_obs_plot |
For outcomes of type |
Estimation of Survival Outcome, Continuous End-of-Follow-Up Outcome, or Binary End-of-Follow-Up Outcome Under the Parametric G-Formula
Description
Based on an observed data set, this function estimates the risk over time (for survival outcomes), outcome mean at end-of-follow-up (for continuous end-of-follow-up outcomes), or outcome probability at end-of-follow-up (for binary end-of-follow-up outcomes) under multiple user-specified interventions using the parametric g-formula. See McGrath et al. (2020) for further details concerning the application and implementation of the parametric g-formula.
Usage
gformula(
obs_data,
id,
time_points = NULL,
time_name,
covnames,
covtypes,
covparams,
covfits_custom = NA,
covpredict_custom = NA,
histvars = NULL,
histories = NA,
basecovs = NA,
outcome_name,
outcome_type,
ymodel,
ymodel_fit_custom = NULL,
ymodel_predict_custom = NULL,
compevent_name = NULL,
compevent_model = NA,
compevent_cens = FALSE,
censor_name = NULL,
censor_model = NA,
intvars = NULL,
interventions = NULL,
int_times = NULL,
int_descript = NULL,
ref_int = 0,
intcomp = NA,
visitprocess = NA,
restrictions = NA,
yrestrictions = NA,
compevent_restrictions = NA,
baselags = FALSE,
nsimul = NA,
sim_data_b = FALSE,
seed,
nsamples = 0,
parallel = FALSE,
ncores = NA,
ci_method = "percentile",
threads,
model_fits = FALSE,
boot_diag = FALSE,
show_progress = TRUE,
ipw_cutoff_quantile = NULL,
ipw_cutoff_value = NULL,
int_visit_type = NULL,
sim_trunc = TRUE,
...
)
Arguments
obs_data |
Data table containing the observed data. |
id |
Character string specifying the name of the ID variable in |
time_points |
Number of time points to simulate. By default, this argument is set equal to the maximum
number of records that |
time_name |
Character string specifying the name of the time variable in |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covfits_custom |
Vector containing custom fit functions for time-varying covariates that
do not fall within the pre-defined covariate types. It should be in
the same order |
covpredict_custom |
Vector containing custom prediction functions for time-varying
covariates that do not fall within the pre-defined covariate types.
It should be in the same order as |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histories |
Vector of history functions to apply to the variables specified in |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
outcome_name |
Character string specifying the name of the outcome variable in |
outcome_type |
Character string specifying the "type" of outcome. The possible "types" are: |
ymodel |
Model statement for the outcome variable. |
ymodel_fit_custom |
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. The default is |
ymodel_predict_custom |
Function obtaining predictions from the custom outcome model specified in |
compevent_name |
Character string specifying the name of the competing event variable in |
compevent_model |
Model statement for the competing event variable. The default is |
compevent_cens |
Logical scalar indicating whether to treat competing events as censoring events.
This argument is only applicable for survival outcomes and when a competing even model is supplied (i.e., |
censor_name |
Character string specifying the name of the censoring variable in |
censor_model |
Model statement for the censoring variable. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data. See "Details". |
intvars |
(Deprecated. See the |
interventions |
(Deprecated. See the |
int_times |
(Deprecated. See the |
int_descript |
Vector of character strings, each describing an intervention. It must
be in same order as the specified interventions (see the |
ref_int |
Integer denoting the intervention to be used as the
reference for calculating the risk ratio and risk difference. 0 denotes the
natural course, while subsequent integers denote user-specified
interventions in the order that they are
named (see the |
intcomp |
List of two numbers indicating a pair of interventions to be compared by a hazard ratio.
The default is |
visitprocess |
List of vectors. Each vector contains as its first entry
the covariate name of a visit process; its second entry
the name of a covariate whose modeling depends on the
visit process; and its third entry the maximum number
of consecutive visits that can be missed before an
individual is censored. The default is |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
yrestrictions |
List of vectors. Each vector contains as its first entry
a condition and its second entry an integer. When the
condition is |
compevent_restrictions |
List of vectors. Each vector contains as its first entry
a condition and its second entry an integer. When the
condition is |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
nsimul |
Number of subjects for whom to simulate data. By default, this argument is set
equal to the number of subjects in |
sim_data_b |
Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., |
seed |
Starting seed for simulations and bootstrapping. |
nsamples |
Integer specifying the number of bootstrap samples to generate. The default is 0. |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
ncores |
Integer specifying the number of CPU cores to use in parallel
simulation. This argument is required when parallel is set to |
ci_method |
Character string specifying the method for calculating the bootstrap 95% confidence intervals, if applicable. The options are |
threads |
Integer specifying the number of threads to be used in |
model_fits |
Logical scalar indicating whether to return the fitted models. Note that if this argument is set to |
boot_diag |
Logical scalar indicating whether to return the parametric g-formula estimates as well as the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is |
show_progress |
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. The default is |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. The default is |
int_visit_type |
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type for the kth intervention in |
sim_trunc |
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type |
... |
Other arguments, including (a) those that specify the interventions and (b) those that are passed to the functions in
Each intervention argument takes as input a list with the following elements:
For example, an "always treat" intervention on |
Details
To assess model misspecification in the parametric g-formula, users can obtain inverse probability (IP) weighted estimates of the natural course risk and/or means of the time-varying covariates from the observed data. See Chiu et al. (2023) for details. In addition to the general requirements described in McGrath et al. (2020), the requirements for the input data set and the call to the gformula function for such analyses are described below.
Users need to include a column in obs_data
with a time-varying censoring variable.
Users need to indicate the name of the censoring variable and a model statement for the censoring variable with parameters censor_name
and censor_model
, respectively.
When competing events are present, users need to include a column in obs_data
with a time-varying indicator of the competing event variable and need to indicate the name of the competing event variable and the corresponding model statement with parameters compevent_name
and compevent_model
, respectively.
Users need to indicate whether to treat competing events as censoring events with the compevent_cens
parameter.
Finally, users can specify how to truncate IP weights with the ipw_cutoff_quantile
or ipw_cutoff_value
parameters.
In addition to the package output described in McGrath et al. (2020), the output will display estimates of the "cumulative percent intervened on" and the "average percent intervened on". When using a custom intervention function, users need to specify whether each individual at that time point is eligible to contribute person-time to the percent intervened on calculations. Specifically, this must be specified in the eligible_pt
column of newdf
. By default, eligible_pt
is set to TRUE
for each individual at each time point in custom interventions.
Value
An object of class "gformula_survival" (for survival outcomes), "gformula_continuous_eof" (for continuous end-of-follow-up outcomes), or "gformula_binary_eof" (for binary end-of-follow-up outcomes). The object is a list with the following components:
result |
Results table. For survival outcomes, this contains the estimated risk, risk difference, and risk ratio for all interventions (inculding the natural course) at each time point. For continuous end-of-follow-up outcomes, this contains estimated mean outcome, mean difference, and mean ratio for all interventions (inculding natural course) at the last time point. For binary end-of-follow-up outcomes, this contains the estimated outcome probability, probability difference, and probability ratio for all interventions (inculding natural course) at the last time point. For all outcome types, this also contains the "cumulative percent intervened on" and the "average percent intervened on". If bootstrapping was used, the results table includes the bootstrap risk / mean / probability difference, ratio, standard error, and 95% confidence interval. |
coeffs |
A list of the coefficients of the fitted models. |
stderrs |
A list of the standard errors of the coefficients of the fitted models. |
vcovs |
A list of the variance-covariance matrices of the parameters of the fitted models. |
rmses |
A list of root mean square error (RMSE) values of the fitted models. |
hazardratio_val |
Hazard ratio between two interventions (if applicable). |
fits |
A list of the fitted models for the time-varying covariates, outcome, and competing event (if applicable). If |
sim_data |
A list of data tables of the simulated data. Each element in the list corresponds to one of the interventions. If the argument |
IP_weights |
A numeric vector specifying the inverse probability weights. See "Details". |
bootests |
A data.table containing the bootstrap replicates of the parametric g-formula estimates. If |
bootcoeffs |
A list, where the kth element is a list containing the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootstderrs |
A list, where the kth element is a list containing the standard errors of the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootvcovs |
A list, where the kth element is a list containing the variance-covariance matrices of the parameters of the fitted models corresponding to the kth bootstrap sample. If |
... |
Some additional elements. |
The results for the g-formula simulation are printed with the print.gformula_survival
, print.gformula_continuous_eof
, and print.gformula_binary_eof
functions. To generate graphs comparing the mean estimated covariate values and risks over time and mean observed covariate values and risks over time, use the plot.gformula_survival
, plot.gformula_continuous_eof
, and plot.gformula_binary_eof
functions.
References
Chiu YH, Wen L, McGrath S, Logan R, Dahabreh IJ, Hernán MA. Evaluating model specification when using the parametric g-formula in the presence of censoring. American Journal of Epidemiology. 2023;192:1887–1895.
McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, and JG Young. gfoRmula: An R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020;1:100008.
Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917.-921. Addendum (1987) in Computers and Mathematics with Applications 14, 923-.945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.].
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
## Estimating the effect of treatment strategies on risk of a failure event
## when competing events exist
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
compevent_name <- 'D'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covlink = c('logit', 'identity', 'logit'),
covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + as.factor(t0),
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + as.factor(t0),
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + as.factor(t0)))
ymodel <- Y ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3 + as.factor(t0)
compevent_model <- D ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3 + as.factor(t0)
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type,
compevent_name = compevent_name,
covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
compevent_model = compevent_model,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
## Estimating the effect of treatment strategies on the mean of a continuous
## end of follow-up outcome
library('Hmisc')
id <- 'id'
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'continuous_eof'
covtypes <- c('categorical', 'normal', 'binary')
histories <- c(lagged)
histvars <- list(c('A', 'L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag1_L1 + L3 + t0 +
rcspline.eval(lag1_L2, knots = c(-1, 0, 1)),
L2 ~ lag1_A + L1 + lag1_L1 + lag1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag1_L1 + lag1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3
intervention1.A <- list(static, rep(0, 7))
intervention2.A <- list(static, rep(1, 7))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_cont_eof <- gformula(obs_data = continuous_eofdata,
id = id, time_name = time_name,
covnames = covnames, outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c("L3"), nsimul = nsimul, seed = 1234)
gform_cont_eof
## Estimating the effect of threshold interventions on the mean of a binary
## end of follow-up outcome
outcome_type <- 'binary_eof'
id <- 'id_num'
time_name <- 'time'
covnames <- c('cov1', 'cov2', 'treat')
outcome_name <- 'outcome'
histories <- c(lagged, cumavg)
histvars <- list(c('treat', 'cov1', 'cov2'), c('cov1', 'cov2'))
covtypes <- c('binary', 'zero-inflated normal', 'normal')
covparams <- list(covmodels = c(cov1 ~ lag1_treat + lag1_cov1 + lag1_cov2 +
cov3 + time,
cov2 ~ lag1_treat + cov1 + lag1_cov1 +
lag1_cov2 + cov3 + time,
treat ~ lag1_treat + cumavg_cov1 +
cumavg_cov2 + cov3 + time))
ymodel <- outcome ~ treat + cov1 + cov2 + lag1_cov1 + lag1_cov2 + cov3
intervention1.treat <- list(static, rep(0, 7))
intervention2.treat <- list(threshold, 1, Inf)
int_descript <- c('Never treat', 'Threshold - lower bound 1')
nsimul <- 10000
ncores <- 2
gform_bin_eof <- gformula(obs_data = binary_eofdata,
outcome_type = outcome_type, id = id,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.treat = intervention1.treat,
intervention2.treat = intervention2.treat,
int_descript = int_descript, histories = histories,
histvars = histvars, basecovs = c("cov3"),
seed = 1234, parallel = TRUE, nsamples = 5,
nsimul = nsimul, ncores = ncores)
gform_bin_eof
## Using IP weighting to estimate natural course risk
## Only the natural course intervention is included for simplicity
covnames <- c('L', 'A')
histories <- c(lagged)
histvars <- list(c('A', 'L'))
ymodel <- Y ~ L + A
covtypes <- c('binary', 'normal')
covparams <- list(covmodels = c(L ~ lag1_L + lag1_A,
A ~ lag1_L + L + lag1_A))
censor_name <- 'C'
censor_model <- C ~ L
res_censor <- gformula(obs_data = censor_data, id = 'id',
time_name = 't0', covnames = covnames,
outcome_name = 'Y', outcome_type = 'survival',
censor_name = censor_name, censor_model = censor_model,
covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
histories = histories, histvars = histvars,
seed = 1234)
plot(res_censor)
Estimation of Binary End-of-Follow-Up Outcome Under the Parametric G-Formula
Description
Based on an observed data set, this internal function estimates the outcome probability at end-of-follow-up under multiple user-specified interventions using the parametric g-formula. See McGrath et al. (2020) for further details concerning the application and implementation of the parametric g-formula.
Usage
gformula_binary_eof(
obs_data,
id,
time_name,
covnames,
covtypes,
covparams,
covfits_custom = NA,
covpredict_custom = NA,
histvars = NULL,
histories = NA,
basecovs = NA,
censor_name = NULL,
censor_model = NA,
outcome_name,
ymodel,
ymodel_fit_custom = NULL,
ymodel_predict_custom = NULL,
intvars = NULL,
interventions = NULL,
int_times = NULL,
int_descript = NULL,
ref_int = 0,
visitprocess = NA,
restrictions = NA,
yrestrictions = NA,
baselags = FALSE,
nsimul = NA,
sim_data_b = FALSE,
seed,
nsamples = 0,
parallel = FALSE,
ncores = NA,
ci_method = "percentile",
threads,
model_fits = FALSE,
boot_diag = FALSE,
show_progress = TRUE,
ipw_cutoff_quantile = NULL,
ipw_cutoff_value = NULL,
int_visit_type = NULL,
sim_trunc = TRUE,
...
)
Arguments
obs_data |
Data table containing the observed data. |
id |
Character string specifying the name of the ID variable in |
time_name |
Character string specifying the name of the time variable in |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covfits_custom |
Vector containing custom fit functions for time-varying covariates that
do not fall within the pre-defined covariate types. It should be in
the same order |
covpredict_custom |
Vector containing custom prediction functions for time-varying
covariates that do not fall within the pre-defined covariate types.
It should be in the same order as |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histories |
Vector of history functions to apply to the variables specified in |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
censor_name |
Character string specifying the name of the censoring variable in |
censor_model |
Model statement for the censoring variable. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data. See "Details". |
outcome_name |
Character string specifying the name of the outcome variable in |
ymodel |
Model statement for the outcome variable. |
ymodel_fit_custom |
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. The default is |
ymodel_predict_custom |
Function obtaining predictions from the custom outcome model specified in |
intvars |
(Deprecated. See the |
interventions |
(Deprecated. See the |
int_times |
(Deprecated. See the |
int_descript |
Vector of character strings, each describing an intervention. It must
be in same order as the specified interventions (see the |
ref_int |
Integer denoting the intervention to be used as the
reference for calculating the end-of-follow-up mean ratio and mean difference. 0 denotes the
natural course, while subsequent integers denote user-specified
interventions in the order that they are
named in |
visitprocess |
List of vectors. Each vector contains as its first entry
the covariate name of a visit process; its second entry
the name of a covariate whose modeling depends on the
visit process; and its third entry the maximum number
of consecutive visits that can be missed before an
individual is censored. The default is |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
yrestrictions |
List of vectors. Each vector contains as its first entry
a condition and its second entry an integer. When the
condition is |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
nsimul |
Number of subjects for whom to simulate data. By default, this argument is set
equal to the number of subjects in |
sim_data_b |
Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., |
seed |
Starting seed for simulations and bootstrapping. |
nsamples |
Integer specifying the number of bootstrap samples to generate. The default is 0. |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
ncores |
Integer specifying the number of CPU cores to use in parallel
simulation. This argument is required when parallel is set to |
ci_method |
Character string specifying the method for calculating the bootstrap 95% confidence intervals, if applicable. The options are |
threads |
Integer specifying the number of threads to be used in |
model_fits |
Logical scalar indicating whether to return the fitted models. Note that if this argument is set to |
boot_diag |
Logical scalar indicating whether to return the parametric g-formula estimates as well as the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is |
show_progress |
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. The default is |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. The default is |
int_visit_type |
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type for the kth intervention in |
sim_trunc |
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type |
... |
Other arguments, including (a) those that specify the interventions and (b) those that are passed to the functions in
Each intervention argument takes as input a list with the following elements:
For example, an "always treat" intervention on |
Details
To assess model misspecification in the parametric g-formula, users can obtain inverse probability (IP) weighted estimates of the natural course means of the time-varying covariates from the observed data. See Chiu et al. (2023) for details. In addition to the general requirements described in McGrath et al. (2020), the requirements for the input data set and the call to the gformula function for such analyses are described below.
Users need to include a column in obs_data
with a time-varying censoring variable.
Users need to indicate the name of the censoring variable and a model statement for the censoring variable with parameters censor_name
and censor_model
, respectively.
Finally, users can specify how to truncate IP weights with the ipw_cutoff_quantile
or ipw_cutoff_value
parameters.
In addition to the package output described in McGrath et al. (2020), the output will display estimates of the "cumulative percent intervened on" and the "average percent intervened on". When using a custom intervention function, users need to specify whether each individual at that time point is eligible to contribute person-time to the percent intervened on calculations. Specifically, this must be specified in the eligible_pt
column of newdf
. By default, eligible_pt
is set to TRUE
for each individual at each time point in custom interventions.
Value
An object of class "gformula_binary_eof". The object is a list with the following components:
result |
Results table containing the estimated outcome probability for all interventions (inculding natural course) at the last time point as well as the "cumulative percent intervened on" and the "average percent intervened on". If bootstrapping was used, the results table includes the bootstrap end-of-follow-up mean ratio, standard error, and 95% confidence interval. |
coeffs |
A list of the coefficients of the fitted models. |
stderrs |
A list of the standard errors of the coefficients of the fitted models. |
vcovs |
A list of the variance-covariance matrices of the parameters of the fitted models. |
rmses |
A list of root mean square error (RMSE) values of the fitted models. |
fits |
A list of the fitted models for the time-varying covariates and outcome. If |
sim_data |
A list of data tables of the simulated data. Each element in the list corresponds to one of the interventions. If the argument |
IP_weights |
A numeric vector specifying the inverse probability weights. See "Details". |
bootests |
A data.table containing the bootstrap replicates of the parametric g-formula estimates. If |
bootcoeffs |
A list, where the kth element is a list containing the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootstderrs |
A list, where the kth element is a list containing the standard errors of the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootvcovs |
A list, where the kth element is a list containing the variance-covariance matrices of the parameters of the fitted models corresponding to the kth bootstrap sample. If |
... |
Some additional elements. |
The results for the g-formula simulation under various interventions for the last time point are printed with the print.gformula_binary_eof
function. To generate graphs comparing the mean estimated and observed covariate values over time, use the plot.gformula_binary_eof
function.
References
Chiu YH, Wen L, McGrath S, Logan R, Dahabreh IJ, Hernán MA. Evaluating model specification when using the parametric g-formula in the presence of censoring. American Journal of Epidemiology. 2023;192:1887–1895.
McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, and JG Young. gfoRmula: An R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020;1:100008.
Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917.-921. Addendum (1987) in Computers and Mathematics with Applications 14, 923-.945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.].
See Also
Examples
## Estimating the effect of threshold interventions on the mean of a binary
## end of follow-up outcome
id <- 'id_num'
time_name <- 'time'
covnames <- c('cov1', 'cov2', 'treat')
outcome_name <- 'outcome'
histories <- c(lagged, cumavg)
histvars <- list(c('treat', 'cov1', 'cov2'), c('cov1', 'cov2'))
covtypes <- c('binary', 'zero-inflated normal', 'normal')
covparams <- list(covmodels = c(cov1 ~ lag1_treat + lag1_cov1 + lag1_cov2 + cov3 +
time,
cov2 ~ lag1_treat + cov1 + lag1_cov1 + lag1_cov2 +
cov3 + time,
treat ~ lag1_treat + cumavg_cov1 +
cumavg_cov2 + cov3 + time))
ymodel <- outcome ~ treat + cov1 + cov2 + lag1_cov1 + lag1_cov2 + cov3
intervention1.treat <- list(static, rep(0, 7))
intervention2.treat <- list(threshold, 1, Inf)
int_descript <- c('Never treat', 'Threshold - lower bound 1')
nsimul <- 10000
ncores <- 2
gform_bin_eof <- gformula_binary_eof(obs_data = binary_eofdata, id = id,
time_name = time_name,
covnames = covnames,
outcome_name = outcome_name,
covtypes = covtypes,
covparams = covparams,
ymodel = ymodel,
intervention1.treat = intervention1.treat,
intervention2.treat = intervention2.treat,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c("cov3"), seed = 1234,
parallel = TRUE, nsamples = 5,
nsimul = nsimul, ncores = ncores)
gform_bin_eof
Estimation of Continuous End-of-Follow-Up Outcome Under the Parametric G-Formula
Description
Based on an observed data set, this internal function estimates the outcome mean at end-of-follow-up under multiple user-specified interventions using the parametric g-formula. See McGrath et al. (2020) for further details concerning the application and implementation of the parametric g-formula.
Usage
gformula_continuous_eof(
obs_data,
id,
time_name,
covnames,
covtypes,
covparams,
covfits_custom = NA,
covpredict_custom = NA,
histvars = NULL,
histories = NA,
basecovs = NA,
outcome_name,
ymodel,
ymodel_fit_custom = NULL,
ymodel_predict_custom = NULL,
censor_name = NULL,
censor_model = NA,
intvars = NULL,
interventions = NULL,
int_times = NULL,
int_descript = NULL,
ref_int = 0,
visitprocess = NA,
restrictions = NA,
yrestrictions = NA,
baselags = FALSE,
nsimul = NA,
sim_data_b = FALSE,
seed,
nsamples = 0,
parallel = FALSE,
ncores = NA,
ci_method = "percentile",
threads,
model_fits = FALSE,
boot_diag = FALSE,
show_progress = TRUE,
ipw_cutoff_quantile = NULL,
ipw_cutoff_value = NULL,
int_visit_type = NULL,
sim_trunc = TRUE,
...
)
Arguments
obs_data |
Data table containing the observed data. |
id |
Character string specifying the name of the ID variable in |
time_name |
Character string specifying the name of the time variable in |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covfits_custom |
Vector containing custom fit functions for time-varying covariates that
do not fall within the pre-defined covariate types. It should be in
the same order |
covpredict_custom |
Vector containing custom prediction functions for time-varying
covariates that do not fall within the pre-defined covariate types.
It should be in the same order as |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histories |
Vector of history functions to apply to the variables specified in |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
outcome_name |
Character string specifying the name of the outcome variable in |
ymodel |
Model statement for the outcome variable. |
ymodel_fit_custom |
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. The default is |
ymodel_predict_custom |
Function obtaining predictions from the custom outcome model specified in |
censor_name |
Character string specifying the name of the censoring variable in |
censor_model |
Model statement for the censoring variable. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data. See "Details". |
intvars |
(Deprecated. See the |
interventions |
(Deprecated. See the |
int_times |
(Deprecated. See the |
int_descript |
Vector of character strings, each describing an intervention. It must
be in same order as the specified interventions (see the |
ref_int |
Integer denoting the intervention to be used as the
reference for calculating the end-of-follow-up mean ratio and mean difference. 0 denotes the
natural course, while subsequent integers denote user-specified
interventions in the order that they are
named in |
visitprocess |
List of vectors. Each vector contains as its first entry
the covariate name of a visit process; its second entry
the name of a covariate whose modeling depends on the
visit process; and its third entry the maximum number
of consecutive visits that can be missed before an
individual is censored. The default is |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
yrestrictions |
List of vectors. Each vector contains as its first entry
a condition and its second entry an integer. When the
condition is |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
nsimul |
Number of subjects for whom to simulate data. By default, this argument is set
equal to the number of subjects in |
sim_data_b |
Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., |
seed |
Starting seed for simulations and bootstrapping. |
nsamples |
Integer specifying the number of bootstrap samples to generate. The default is 0. |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
ncores |
Integer specifying the number of CPU cores to use in parallel
simulation. This argument is required when parallel is set to |
ci_method |
Character string specifying the method for calculating the bootstrap 95% confidence intervals, if applicable. The options are |
threads |
Integer specifying the number of threads to be used in |
model_fits |
Logical scalar indicating whether to return the fitted models. Note that if this argument is set to |
boot_diag |
Logical scalar indicating whether to return the parametric g-formula estimates as well as the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is |
show_progress |
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. The default is |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. The default is |
int_visit_type |
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type for the kth intervention in |
sim_trunc |
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type |
... |
Other arguments, including (a) those that specify the interventions and (b) those that are passed to the functions in
Each intervention argument takes as input a list with the following elements:
For example, an "always treat" intervention on |
Details
To assess model misspecification in the parametric g-formula, users can obtain inverse probability (IP) weighted estimates of the natural course means of the time-varying covariates from the observed data. See Chiu et al. (2023) for details. In addition to the general requirements described in McGrath et al. (2020), the requirements for the input data set and the call to the gformula function for such analyses are described below.
Users need to include a column in obs_data
with a time-varying censoring variable.
Users need to indicate the name of the censoring variable and a model statement for the censoring variable with parameters censor_name
and censor_model
, respectively.
Finally, users can specify how to truncate IP weights with the ipw_cutoff_quantile
or ipw_cutoff_value
parameters.
In addition to the package output described in McGrath et al. (2020), the output will display estimates of the "cumulative percent intervened on" and the "average percent intervened on". When using a custom intervention function, users need to specify whether each individual at that time point is eligible to contribute person-time to the percent intervened on calculations. Specifically, this must be specified in the eligible_pt
column of newdf
. By default, eligible_pt
is set to TRUE
for each individual at each time point in custom interventions.
Value
An object of class "gformula_continuous_eof". The object is a list with the following components:
result |
Results table containing the estimated mean outcome for all interventions (inculding natural course) at the last time point as well as the "cumulative percent intervened on" and the "average percent intervened on". If bootstrapping was used, the results table includes the bootstrap end-of-follow-up mean ratio, standard error, and 95% confidence interval. |
coeffs |
A list of the coefficients of the fitted models. |
stderrs |
A list of the standard errors of the coefficients of the fitted models. |
vcovs |
A list of the variance-covariance matrices of the parameters of the fitted models. |
rmses |
A list of root mean square error (RMSE) values of the fitted models. |
fits |
A list of the fitted models for the time-varying covariates and outcome. If |
sim_data |
A list of data tables of the simulated data. Each element in the list corresponds to one of the interventions. If the argument |
IP_weights |
A numeric vector specifying the inverse probability weights. See "Details". |
bootests |
A data.table containing the bootstrap replicates of the parametric g-formula estimates. If |
bootcoeffs |
A list, where the kth element is a list containing the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootstderrs |
A list, where the kth element is a list containing the standard errors of the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootvcovs |
A list, where the kth element is a list containing the variance-covariance matrices of the parameters of the fitted models corresponding to the kth bootstrap sample. If |
... |
Some additional elements. |
The results for the g-formula simulation under various interventions for the last time point are printed with the print.gformula_continuous_eof
function. To generate graphs comparing the mean estimated and observed covariate values over time, use the print.gformula_continuous_eof
function.
References
Chiu YH, Wen L, McGrath S, Logan R, Dahabreh IJ, Hernán MA. Evaluating model specification when using the parametric g-formula in the presence of censoring. American Journal of Epidemiology. 2023;192:1887–1895.
McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, and JG Young. gfoRmula: An R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020;1:100008.
Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917.-921. Addendum (1987) in Computers and Mathematics with Applications 14, 923-.945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.].
See Also
Examples
## Estimating the effect of treatment strategies on the mean of a continuous
## end of follow-up outcome
library('Hmisc')
id <- 'id'
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
covtypes <- c('categorical', 'normal', 'binary')
histories <- c(lagged)
histvars <- list(c('A', 'L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag1_L1 + L3 + t0 +
rcspline.eval(lag1_L2, knots = c(-1, 0, 1)),
L2 ~ lag1_A + L1 + lag1_L1 + lag1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag1_L1 + lag1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3
intervention1.A <- list(static, rep(0, 7))
intervention2.A <- list(static, rep(1, 7))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_cont_eof <- gformula_continuous_eof(obs_data = continuous_eofdata,
id = id,
time_name = time_name,
covnames = covnames,
outcome_name = outcome_name,
covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c("L3"),
nsimul = nsimul, seed = 1234)
gform_cont_eof
Estimation of Survival Outcome Under the Parametric G-Formula
Description
Based on an observed data set, this internal function estimates the risk over time under multiple user-specified interventions using the parametric g-formula. See McGrath et al. (2020) for further details concerning the application and implementation of the parametric g-formula.
Usage
gformula_survival(
obs_data,
id,
time_points = NULL,
time_name,
covnames,
covtypes,
covparams,
covfits_custom = NA,
covpredict_custom = NA,
histvars = NULL,
histories = NA,
basecovs = NA,
outcome_name,
ymodel,
ymodel_fit_custom = NULL,
ymodel_predict_custom = NULL,
compevent_name = NULL,
compevent_model = NA,
compevent_cens = FALSE,
censor_name = NULL,
censor_model = NA,
intvars = NULL,
interventions = NULL,
int_times = NULL,
int_descript = NULL,
ref_int = 0,
intcomp = NA,
visitprocess = NA,
restrictions = NA,
yrestrictions = NA,
compevent_restrictions = NA,
baselags = FALSE,
nsimul = NA,
sim_data_b = FALSE,
seed,
nsamples = 0,
parallel = FALSE,
ncores = NA,
ci_method = "percentile",
threads,
model_fits = FALSE,
boot_diag = FALSE,
show_progress = TRUE,
ipw_cutoff_quantile = NULL,
ipw_cutoff_value = NULL,
int_visit_type = NULL,
sim_trunc = TRUE,
...
)
Arguments
obs_data |
Data table containing the observed data. |
id |
Character string specifying the name of the ID variable in |
time_points |
Number of time points to simulate. By default, this argument is set equal to the maximum
number of records that |
time_name |
Character string specifying the name of the time variable in |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covfits_custom |
Vector containing custom fit functions for time-varying covariates that
do not fall within the pre-defined covariate types. It should be in
the same order |
covpredict_custom |
Vector containing custom prediction functions for time-varying
covariates that do not fall within the pre-defined covariate types.
It should be in the same order as |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histories |
Vector of history functions to apply to the variables specified in |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
outcome_name |
Character string specifying the name of the outcome variable in |
ymodel |
Model statement for the outcome variable. |
ymodel_fit_custom |
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. The default is |
ymodel_predict_custom |
Function obtaining predictions from the custom outcome model specified in |
compevent_name |
Character string specifying the name of the competing event variable in |
compevent_model |
Model statement for the competing event variable. The default is |
compevent_cens |
Logical scalar indicating whether to treat competing events as censoring events.
This argument is only applicable for survival outcomes and when a competing even model is supplied (i.e., |
censor_name |
Character string specifying the name of the censoring variable in |
censor_model |
Model statement for the censoring variable. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data. See "Details". |
intvars |
(Deprecated. See the |
interventions |
(Deprecated. See the |
int_times |
(Deprecated. See the |
int_descript |
Vector of character strings, each describing an intervention. It must
be in same order as the specified interventions (see the |
ref_int |
Integer denoting the intervention to be used as the
reference for calculating the risk ratio and risk difference. 0 denotes the
natural course, while subsequent integers denote user-specified
interventions in the order that they are
named in |
intcomp |
List of two numbers indicating a pair of interventions to be compared by a hazard ratio.
The default is |
visitprocess |
List of vectors. Each vector contains as its first entry
the covariate name of a visit process; its second entry
the name of a covariate whose modeling depends on the
visit process; and its third entry the maximum number
of consecutive visits that can be missed before an
individual is censored. The default is |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
yrestrictions |
List of vectors. Each vector contains as its first entry
a condition and its second entry an integer. When the
condition is |
compevent_restrictions |
List of vectors. Each vector contains as its first entry
a condition and its second entry an integer. When the
condition is |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
nsimul |
Number of subjects for whom to simulate data. By default, this argument is set
equal to the number of subjects in |
sim_data_b |
Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., |
seed |
Starting seed for simulations and bootstrapping. |
nsamples |
Integer specifying the number of bootstrap samples to generate. The default is 0. |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
ncores |
Integer specifying the number of CPU cores to use in parallel
simulation. This argument is required when parallel is set to |
ci_method |
Character string specifying the method for calculating the bootstrap 95% confidence intervals, if applicable. The options are |
threads |
Integer specifying the number of threads to be used in |
model_fits |
Logical scalar indicating whether to return the fitted models. Note that if this argument is set to |
boot_diag |
Logical scalar indicating whether to return the parametric g-formula estimates as well as the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is |
show_progress |
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. The default is |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. The default is |
int_visit_type |
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type for the kth intervention in |
sim_trunc |
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type |
... |
Other arguments, including (a) those that specify the interventions and (b) those that are passed to the functions in
Each intervention argument takes as input a list with the following elements:
For example, an "always treat" intervention on |
Details
To assess model misspecification in the parametric g-formula, users can obtain inverse probability (IP) weighted estimates of the natural course risk and/or means of the time-varying covariates from the observed data. See Chiu et al. (2023) for details. In addition to the general requirements described in McGrath et al. (2020), the requirements for the input data set and the call to the gformula function for such analyses are described below.
Users need to include a column in obs_data
with a time-varying censoring variable.
Users need to indicate the name of the censoring variable and a model statement for the censoring variable with parameters censor_name
and censor_model
, respectively.
When competing events are present, users need to include a column in obs_data
with a time-varying indicator of the competing event variable and need to indicate the name of the competing event variable and the corresponding model statement with parameters compevent_name
and compevent_model
, respectively.
Users need to indicate whether to treat competing events as censoring events with the compevent_cens
parameter.
Finally, users can specify how to truncate IP weights with the ipw_cutoff_quantile
or ipw_cutoff_value
parameters.
In addition to the package output described in McGrath et al. (2020), the output will display estimates of the "cumulative percent intervened on" and the "average percent intervened on". When using a custom intervention function, users need to specify whether each individual at that time point is eligible to contribute person-time to the percent intervened on calculations. Specifically, this must be specified in the eligible_pt
column of newdf
. By default, eligible_pt
is set to TRUE
for each individual at each time point in custom interventions.
Value
An object of class "gformula_survival". The object is a list with the following components:
result |
Results table containing the estimated risk and risk ratio for all interventions (inculding the natural course) at each time point as well as the "cumulative percent intervened on" and the "average percent intervened on". If bootstrapping was used, the results table includes the bootstrap mean risk ratio, standard error, and 95% confidence interval. |
coeffs |
A list of the coefficients of the fitted models. |
stderrs |
A list of the standard errors of the coefficients of the fitted models. |
vcovs |
A list of the variance-covariance matrices of the parameters of the fitted models. |
rmses |
A list of root mean square error (RMSE) values of the fitted models. |
hazardratio_val |
Hazard ratio between two interventions (if applicable). |
fits |
A list of the fitted models for the time-varying covariates, outcome, and competing event (if applicable). If |
sim_data |
A list of data tables of the simulated data. Each element in the list corresponds to one of the interventions. If the argument |
IP_weights |
A numeric vector specifying the inverse probability weights. See "Details". |
bootests |
A data.table containing the bootstrap replicates of the parametric g-formula estimates. If |
bootcoeffs |
A list, where the kth element is a list containing the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootstderrs |
A list, where the kth element is a list containing the standard errors of the coefficients of the fitted models corresponding to the kth bootstrap sample. If |
bootvcovs |
A list, where the kth element is a list containing the variance-covariance matrices of the parameters of the fitted models corresponding to the kth bootstrap sample. If |
... |
Some additional elements. |
The results for the g-formula simulation under various interventions only for the first and last time points are printed with the print.gformula_survival
function. To generate graphs comparing the mean estimated covariate values and risks over time and mean observed covariate values and risks over time, use the plot.gformula_survival
function.
References
Chiu YH, Wen L, McGrath S, Logan R, Dahabreh IJ, Hernán MA. Evaluating model specification when using the parametric g-formula in the presence of censoring. American Journal of Epidemiology. 2023;192:1887–1895.
McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, and JG Young. gfoRmula: An R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020;1:100008.
Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917.-921. Addendum (1987) in Computers and Mathematics with Applications 14, 923-.945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.].
See Also
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula_survival(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
## Estimating the effect of treatment strategies on risk of a failure event
## when competing events exist
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
compevent_name <- 'D'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covlink = c('logit', 'identity', 'logit'),
covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + as.factor(t0),
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + as.factor(t0),
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + as.factor(t0)))
ymodel <- Y ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3 + as.factor(t0)
compevent_model <- D ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3 + as.factor(t0)
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula_survival(obs_data = basicdata, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
compevent_name = compevent_name,
covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
compevent_model = compevent_model,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
## Using IP weighting to estimate natural course risk
## Only the natural course intervention is included for simplicity
covnames <- c('L', 'A')
histories <- c(lagged)
histvars <- list(c('A', 'L'))
ymodel <- Y ~ L + A
covtypes <- c('binary', 'normal')
covparams <- list(covmodels = c(L ~ lag1_L + lag1_A,
A ~ lag1_L + L + lag1_A))
censor_name <- 'C'
censor_model <- C ~ L
res_censor <- gformula(obs_data = censor_data, id = 'id',
time_name = 't0', covnames = covnames,
outcome_name = 'Y', outcome_type = 'survival',
censor_name = censor_name, censor_model = censor_model,
covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intvars = NULL, interventions = NULL, int_descript = NULL,
histories = histories, histvars = histvars,
seed = 1234)
plot(res_censor)
Format Simulated Dataset for Hazard Ratio Calculation
Description
This internal function modifies the pooled-over-time dataset generated by the simulate
function
to a format suitable for hazard ratio calculation.
Usage
hr_helper(i, intcomp, time_name, pools)
Arguments
i |
Integer specifying the index |
intcomp |
List of two numbers indicating a pair of interventions to be compared by a hazard ratio. |
time_name |
Character string specifying the name of the time variable in |
pools |
Data table containing the simulated values for the covariates, outcome probabilities, competing event probabilities, outcomes, and competing events. |
Value
Data table consisting of failure time information for each individual or the last time point information (for individuals who do not experience an event).
Execute Intervention
Description
This internal function executes the intervention of interest on the specified intervention variable
in the data table newdf
.
Usage
intfunc(newdf, pool, intervention, intvar, int_time, time_name, t)
Arguments
newdf |
Data table containing the simulated data at time |
pool |
Data table containing the simulated data at times before |
intervention |
List, whose elements are lists of vectors. Each vector contains a function implementing a particular intervention on a single variable, optionally followed by one or more "intervention values" (i.e., integers used to specify the treatment regime). |
intvar |
Character string specifying the name of the variable to be intervened on in each round of the simulation. |
int_time |
Vector specifying the time points in which the intervention is applied. |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
Value
No value is returned. The data table newdf
is modified in place.
History functions
Description
These functions create new columns in an input data table for covariate histories. Users must specify which covariates are to be used in the history functions.
Usage
lagged(
pool,
histvars,
histvals,
time_name,
t,
id_name,
baselags,
below_zero_indicator
)
cumavg(pool, histvars, time_name, t, id_name, below_zero_indicator)
lagavg(
pool,
histvars,
histvals,
time_name,
t,
id_name,
baselags,
below_zero_indicator
)
Arguments
pool |
Data table containing all information prior to time |
histvars |
Vector of character strings specifying the names of the variables for which history functions are to be applied. |
histvals |
For |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
id_name |
Character string specifying the name of the ID variable in |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
below_zero_indicator |
Logical scalar indicating whether the observed data set contains rows for time |
Details
lagged
creates new columns for lagged versions of existing
variables in the dataset. The user must specify which variables are to be lagged.
cumavg
creates new columns for the cumulative average up until
time t
of existing variables in the dataset.
lagavg
creates new columns for the "lagged cumulative average"
(cumulative average up until time t, then lagged by one time unit) up until time t
of existing
variables in the dataset.
Value
No value is returned. The data table pool
is modified in place.
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
Generates Functions of History of Existing Covariates
Description
This internal function applies the history functions to create new columns in an input data table containing new variables that are functions of the histories of existing variables in the dataset.
Usage
make_histories(
pool,
histvars,
histvals,
histories,
time_name,
t,
id,
max_visits,
baselags,
below_zero_indicator
)
Arguments
pool |
Data table containing all information prior to time |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histvals |
List of length two. The first element is a numeric vector specifying the lags used in the model statements (e.g., if |
histories |
Vector of history functions to apply to the variables specified in |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
id |
Character string specifying the name of the ID variable in |
max_visits |
A vector of one or more values denoting the maximum number of times
a binary covariate representing a visit process may be missed before
the individual is censored from the data (in the observed data) or
a visit is forced (in the simulated data). Multiple values exist in the
vector when the modeling of more than covariate is attached to a visit
process. A value of |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
below_zero_indicator |
Logical scalar indicating whether the observed data set contains rows for time |
Value
No value is returned. The data table pool
is modified in place.
Natural Course Intervention
Description
This function implements the natural course (i.e., model-based simulation) for the specified
intervention variable in the data table newdf. Because newdf
is initiated with the
natural course, this function does nothing.
Usage
natural(newdf, pool, intvar, intvals, time_name, t)
Arguments
newdf |
Data table containing the simulated data at time |
pool |
Data table containing the simulated data at times before |
intvar |
Character string specifying the name of the variable to be intervened on in each round of the simulation. |
intvals |
A list of user-specified values for the intervention. |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
Value
No value is returned.
Calculate Observed Covariate Means and Risk
Description
This internal function calculates the mean observed values of covariates at each time point, as well as mean observed risk.
Usage
obs_calculate(
outcome_name,
compevent_name,
compevent2_name,
censor_name,
time_name,
id,
covnames,
covtypes,
comprisk,
comprisk2,
censor,
fitD2,
fitC,
outcome_type,
obs_data,
ipw_cutoff_quantile,
ipw_cutoff_value
)
Arguments
outcome_name |
Character string specifying the name of the outcome variable in |
compevent_name |
Character string specifying the name of the competing event variable in |
compevent2_name |
Character string specifying the name of the competing event variable in |
censor_name |
Character string specifying the name of the censoring variable in |
time_name |
Character string specifying the name of the time variable in |
id |
Character string specifying the name of the ID variable in |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
comprisk |
Logical scalar indicating the presence of a competing event. |
comprisk2 |
Logical scalar indicating whether competing events are treated as censoring events. |
censor |
Logical scalar indicating the presence of a censoring variable in |
fitD2 |
Model fit for the competing event variable if competing events are treated as censoring events. |
fitC |
Model fit for the censoring variable. |
outcome_type |
Character string specifying the "type" of the outcome. The possible "types" are: |
obs_data |
Data table containing the observed data. |
ipw_cutoff_quantile |
Percentile by which to truncate inverse probability weights. |
ipw_cutoff_value |
Cutoff value by which to truncate inverse probability weights. |
Value
A list. Its first entry is a list of mean covariate values at each time point;
its second entry is a vector of the mean observed risk (for "survival"
outcome types) or the mean observed outcome (for "continuous_eof"
and
"binary_eof"
outcome types); for "survival"
outcome types, its
third entry is a vector of mean observed survival.
Plot method for objects of class "gformula_binary_eof"
Description
This function generates graphs of the mean simulated vs. observed values at each time point of the time-varying covariates under the natural course. For categorical covariates, the observed and simulated probability of each level are plotted at each time point.
Usage
## S3 method for class 'gformula_binary_eof'
plot(
x,
covnames = NULL,
ncol = NULL,
nrow = NULL,
common.legend = TRUE,
legend = "bottom",
xlab = NULL,
ylab_cov = NULL,
...
)
Arguments
x |
Object of class "gformula_binary_eof". |
covnames |
Vector of character strings specifying the names of the time-varying covariates to be plotted. The ordering of covariates given here is used in the plot grid. Time-varying covariates of type |
ncol |
Number of columns in the plot grid. By default, two columns are used when there is at least two plots. |
nrow |
Number of rows in the plot grid. By default, a maximum of six rows is used and additional plots are included in subsequent pages. |
common.legend |
Logical scalar indicating whether to include a legend. The default is |
legend |
Character string specifying the legend position. Valid values are |
xlab |
Character string for the x axes of all plots. By default, this argument is set to the |
ylab_cov |
Vector of character strings for the y axes of the plots for the covariates. This argument must be the same length as |
... |
Other arguments, which are passed to |
Value
An object of class "ggarrange". See documentation of ggarrange
.
See Also
Examples
## Estimating the effect of threshold interventions on the mean of a binary
## end of follow-up outcome
outcome_type <- 'binary_eof'
id <- 'id_num'
time_name <- 'time'
covnames <- c('cov1', 'cov2', 'treat')
outcome_name <- 'outcome'
histories <- c(lagged, cumavg)
histvars <- list(c('treat', 'cov1', 'cov2'), c('cov1', 'cov2'))
covtypes <- c('binary', 'zero-inflated normal', 'normal')
covparams <- list(covmodels = c(cov1 ~ lag1_treat + lag1_cov1 + lag1_cov2 +
cov3 + time,
cov2 ~ lag1_treat + cov1 + lag1_cov1 +
lag1_cov2 + cov3 + time,
treat ~ lag1_treat + cumavg_cov1 +
cumavg_cov2 + cov3 + time))
ymodel <- outcome ~ treat + cov1 + cov2 + lag1_cov1 + lag1_cov2 + cov3
intervention1.treat <- list(static, rep(0, 7))
intervention2.treat <- list(threshold, 1, Inf)
int_descript <- c('Never treat', 'Threshold - lower bound 1')
nsimul <- 10000
ncores <- 2
gform_bin_eof <- gformula(obs_data = binary_eofdata,
outcome_type = outcome_type, id = id,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.treat = intervention1.treat,
intervention2.treat = intervention2.treat,
int_descript = int_descript, histories = histories,
histvars = histvars, basecovs = c("cov3"),
seed = 1234, parallel = TRUE, nsamples = 5,
nsimul = nsimul, ncores = ncores)
plot(gform_bin_eof)
Plot method for objects of class "gformula_continuous_eof"
Description
This function generates graphs of the mean simulated vs. observed values at each time point of the time-varying covariates under the natural course. For categorical covariates, the observed and simulated probability of each level are plotted at each time point.
Usage
## S3 method for class 'gformula_continuous_eof'
plot(
x,
covnames = NULL,
ncol = NULL,
nrow = NULL,
common.legend = TRUE,
legend = "bottom",
xlab = NULL,
ylab_cov = NULL,
...
)
Arguments
x |
Object of class "gformula_continuous_eof". |
covnames |
Vector of character strings specifying the names of the time-varying covariates to be plotted. The ordering of covariates given here is used in the plot grid. Time-varying covariates of type |
ncol |
Number of columns in the plot grid. By default, two columns are used when there is at least two plots. |
nrow |
Number of rows in the plot grid. By default, a maximum of six rows is used and additional plots are included in subsequent pages. |
common.legend |
Logical scalar indicating whether to include a legend. The default is |
legend |
Character string specifying the legend position. Valid values are |
xlab |
Character string for the x axes of all plots. By default, this argument is set to the |
ylab_cov |
Vector of character strings for the y axes of the plots for the covariates. This argument must be the same length as |
... |
Other arguments, which are passed to |
Value
An object of class "ggarrange". See documentation of ggarrange
.
See Also
Examples
## Estimating the effect of treatment strategies on the mean of a continuous
## end of follow-up outcome
library('Hmisc')
id <- 'id'
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'continuous_eof'
covtypes <- c('categorical', 'normal', 'binary')
histories <- c(lagged)
histvars <- list(c('A', 'L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag1_L1 + L3 + t0 +
rcspline.eval(lag1_L2, knots = c(-1, 0, 1)),
L2 ~ lag1_A + L1 + lag1_L1 + lag1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag1_L1 + lag1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + lag1_A + lag1_L1 + lag1_L2 + L3
intervention1.A <- list(static, rep(0, 7))
intervention2.A <- list(static, rep(1, 7))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_cont_eof <- gformula(obs_data = continuous_eofdata,
id = id, time_name = time_name,
covnames = covnames, outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c("L3"), nsimul = nsimul, seed = 1234)
plot(gform_cont_eof)
Plot method for objects of class "gformula_survival"
Description
This function generates graphs of the mean simulated vs. observed values at each time point of the time-varying covariates, risk, and survival under the natural course. For categorical covariates, the observed and simulated probability of each level are plotted at each time point.
Usage
## S3 method for class 'gformula_survival'
plot(
x,
covnames = NULL,
risk = TRUE,
survival = FALSE,
ncol = NULL,
nrow = NULL,
common.legend = TRUE,
legend = "bottom",
xlab = NULL,
ylab_cov = NULL,
ylab_risk = "risk",
ylab_surv = "survival",
pos_risk = NULL,
pos_surv = NULL,
ci_risk = FALSE,
...
)
Arguments
x |
Object of class "gformula_survival". |
covnames |
Vector of character strings specifying the names of the time-varying covariates to be plotted. The ordering of covariates given here is used in the plot grid. Time-varying covariates of type |
risk |
Logical scalar indicating whether to include a plot for the risk. The default is |
survival |
Logical scalar indicating whether to include a plot for the survival. The default is |
ncol |
Number of columns in the plot grid. By default, two columns are used when there is at least two plots. |
nrow |
Number of rows in the plot grid. By default, a maximum of six rows is used and additional plots are included in subsequent pages. |
common.legend |
Logical scalar indicating whether to include a legend. The default is |
legend |
Character string specifying the legend position. Valid values are |
xlab |
Character string for the x axes of all plots. By default, this argument is set to the |
ylab_cov |
Vector of character strings for the y axes of the plots for the covariates. This argument must be the same length as |
ylab_risk |
Character string for the y axis of the plot for the risk (if applicable). The default is |
ylab_surv |
Character string for the y axis of the plot for the survival (if applicable). The default is |
pos_risk |
Integer specifying the position at which to order the risk plot (if applicable). By default, this argument is set to the number of plots in the grid minus one (i.e., orders the risk plot second last). |
pos_surv |
Integer specifying the position at which to order the survival plot (if applicable). By default, this argument is set to the number of plots in the grid (i.e., orders the survival plot last). |
ci_risk |
Logical scalar specifying whether to include error bars for the 95% confidence intervals of the estimated risk under the natural course. This argument is only effective if the argument |
... |
Other arguments, which are passed to |
Value
An object of class "ggarrange". See documentation of ggarrange
.
See Also
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
plot(gform_basic)
Fit Competing Event Model
Description
This internal function fits a generalized linear model (GLM) for the competing event variable using the observed data.
Usage
pred_fun_D(model, compevent_restrictions, obs_data, model_fits)
Arguments
model |
Model statement for competing event variable. |
compevent_restrictions |
List of vectors. Each vector containins as its first entry
a condition and its second entry an integer. When the
condition is |
obs_data |
Data on which the model is fit. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
Fitted model for the competing event variable.
Fit Outcome Model
Description
This internal function fits a generalized linear model (GLM) for the outcome variable using the observed data.
Usage
pred_fun_Y(
model,
yrestrictions,
outcome_type,
outcome_name,
time_name,
obs_data,
model_fits,
ymodel_fit_custom
)
Arguments
model |
Model statement for the outcome variable. |
yrestrictions |
List of vectors. Each vector containins as its first entry
a condition and its second entry an integer. When the
condition is |
outcome_type |
Character string specifying the "type" of the outcome. The possible "types" are: |
outcome_name |
Character string specifying the name of the outcome variable in |
time_name |
Character string specifying the name of the time variable in |
obs_data |
Data on which the model is fit. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
ymodel_fit_custom |
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. |
Value
Fitted model for the outcome variable.
Fit Covariate Models
Description
This internal function fits a model for each covariate using the observed data.
Usage
pred_fun_cov(
covparams,
covnames,
covtypes,
covfits_custom,
restrictions,
time_name,
obs_data,
model_fits
)
Arguments
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covfits_custom |
Vector containing custom fit functions for time-varying covariates that
do not fall within the pre-defined covariate types. It should be in
the same order |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
time_name |
Character string specifying the name of the time variable in |
obs_data |
Data on which the models are fit. |
model_fits |
Logical scalar indicating whether to return the fitted models. The default is |
Value
A list of fitted models, one for each covariate in covnames
.
Simulate Binary Values
Description
This internal function simulates covariate values from a binomial distribution.
Usage
predict_binomial(x, size, prob)
Arguments
x |
Integer specifying the number of observations to be simulated. |
size |
Integer specifying the number of trials. |
prob |
Numeric vector specifying the probabilities. |
Value
Numeric vector of simulated covariate values under the binomial distribution.
Simulate Normal Values
Description
This internal function simulates covariate values from a normal distribution.
Usage
predict_normal(x, mean, est_sd = NA)
Arguments
x |
Integer specifying the number of observations to be simulated. |
mean |
Numeric scalar specifying the mean of the distribution. |
est_sd |
Numeric scalar specifying the standard deviation of the distribution. |
Value
Numeric vector of simulated covariate values under the normal distribution.
Simulate Truncated Normal Values
Description
This internal function simulates covariate values from a normal distribution truncated on one side.
Usage
predict_trunc_normal(x, mean, est_sd, a, b)
Arguments
x |
Integer specifying the number of observations to be simulated. |
mean |
Numeric scalar specifying the mean of the distribution. |
est_sd |
Numeric scalar specifying the standard deviation of the distribution. |
a |
Numeric scalar specifying the lower bound of truncation. |
b |
Numeric scalar specifying the upper bound of truncation. |
Value
Numeric vector of simulated covariate values under the truncated normal distribution.
Print and summary methods for "gformula" objects
Description
Print and summary method for objects of class "gformula_survival", "gformula_continuous_eof", or "gformula_binary_eof".
Usage
## S3 method for class 'gformula_survival'
print(
x,
all_times = FALSE,
coefficients = FALSE,
stderrs = FALSE,
rmses = FALSE,
hazardratio = FALSE,
fits = FALSE,
...
)
## S3 method for class 'gformula_continuous_eof'
print(
x,
coefficients = FALSE,
stderrs = FALSE,
rmses = FALSE,
fits = FALSE,
...
)
## S3 method for class 'gformula_binary_eof'
print(
x,
coefficients = FALSE,
stderrs = FALSE,
rmses = FALSE,
fits = FALSE,
...
)
## S3 method for class 'gformula'
summary(object, ...)
## S3 method for class 'summary.gformula'
print(
x,
all_times = TRUE,
coefficients = FALSE,
stderrs = FALSE,
rmses = FALSE,
hazardratio = FALSE,
fits = TRUE,
...
)
Arguments
x |
Object of class "gformula_survival", "gformula_continuous_eof", "gformula_binary_eof", or "summary.gformula" (for |
all_times |
Logical scalar indicating whether to print the results for all time points. This argument is only applicable to objects of class "gformula_survival". If this argument is set to |
coefficients |
Logical scalar indicating whether to print the model coefficients. The default is |
stderrs |
Logical scalar indicating whether to print the standard error of the model coefficients. The default is |
rmses |
Logical scalar indicating whether to print the model root mean square errors (RMSEs). The default is |
hazardratio |
Logical scalar indicating whether to print the hazard ratio between two interventions (if computed). If bootstrapping was used, 95% confidence intervals will be given. This argument is only applicable to objects of class "gformula_survival". The default is |
fits |
Logical scalar indicating whether to print summaries of the fitted models for the time-varying covariates, outcome, and competing event (if applicable). This argument is only effective if the argument |
... |
Other arguments. |
object |
Object of class "gformula" (for |
Value
No value is returned for the print
functions. The summary
function returns the object passed to it and adds the class "summary.gformula" to it.
See Also
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
summary(gform_basic)
Calculate RMSE for Covariate, Outcome, and Competing Risk Models
Description
This internal function calculates root mean square error (RMSE) for the covariate (and the outcome and competing event, if applicable) models fit on the observed data.
Usage
rmse_calculate(i, fits, covnames, covtypes)
Arguments
i |
Integer specifying the index of |
fits |
List of fitted models. |
covnames |
Vector of character strings specifying the names of the time-varying covariates in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
Value
The RMSE for the model.
Simple Restriction
Description
This function assists the implementation of a restriction on a covariate in the data
table newdf
by setting lines where the covariate is restricted to a user-specified value.
Usage
simple_restriction(newdf, pool, restriction, time_name, t, ...)
Arguments
newdf |
Data table containing the simulated data at time |
pool |
Data table containing the simulated data at times before |
restriction |
List of vectors. Each vector contains as its first entry
the covariate affected by the restriction; its second entry
the condition that must be |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
... |
This argument is not used in this function. |
Value
No value is returned. The data table newdf
is modified in place.
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event with a simple restriction
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
# At t0 == 5, assume we have deterministic knowledge that L1 equals 0
restrictions <- list(c('L1', 't0 != 5', simple_restriction, 0))
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
restrictions = restrictions,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
Simulate Counterfactual Outcomes Under Intervention
Description
This internal function simulates a new dataset containing covariates, outcome probabilities, competing event probabilities (if any), outcomes, and competing events (if any) based on an observed dataset and a user-specified intervention.
Usage
simulate(
o,
fitcov,
fitY,
fitD,
ymodel_predict_custom,
yrestrictions,
compevent_restrictions,
restrictions,
outcome_name,
compevent_name,
time_name,
intvars,
interventions,
int_times,
histvars,
histvals,
histories,
comprisk,
ranges,
outcome_type,
subseed,
obs_data,
time_points,
parallel,
covnames,
covtypes,
covparams,
covpredict_custom,
basecovs,
max_visits,
baselags,
below_zero_indicator,
min_time,
show_progress,
pb,
int_visit_type,
sim_trunc,
...
)
Arguments
o |
Integer specifying the index of the current intervention. |
fitcov |
List of model fits for the time-varying covariates. |
fitY |
Model fit for the outcome variable. |
fitD |
Model fit for the competing event variable, if any. |
ymodel_predict_custom |
Function obtaining predictions from the custom outcome model specified in |
yrestrictions |
List of vectors. Each vector containins as its first entry
a condition and its second entry an integer. When the
condition is |
compevent_restrictions |
List of vectors. Each vector containins as its first entry
a condition and its second entry an integer. When the
condition is |
restrictions |
List of vectors. Each vector contains as its first entry a covariate for which
a priori knowledge of its distribution is available; its second entry a condition
under which no knowledge of its distribution is available and that must be |
outcome_name |
Character string specifying the name of the outcome variable in |
compevent_name |
Character string specifying the name of the competing event variable in |
time_name |
Character string specifying the name of the time variable in |
intvars |
List, whose elements are vectors of character strings. The kth vector in |
interventions |
List, whose elements are lists of vectors. Each list in |
int_times |
List, whose elements are lists of vectors. The kth list in |
histvars |
List of vectors. The kth vector specifies the names of the variables for which the kth history function
in |
histvals |
List of length two. The first element is a numeric vector specifying the lags used in the model statements (e.g., if |
histories |
Vector of history functions to apply to the variables specified in |
comprisk |
Logical scalar indicating the presence of a competing event. |
ranges |
List of vectors. Each vector contains the minimum and
maximum values of one of the covariates in |
outcome_type |
Character string specifying the "type" of the outcome. The possible "types" are: |
subseed |
Integer specifying the seed for this simulation. |
obs_data |
Data table containing the observed data. |
time_points |
Number of time points to simulate. |
parallel |
Logical scalar indicating whether to parallelize simulations of different interventions to multiple cores. |
covnames |
Character string specifying the name of the competing event variable in |
covtypes |
Vector of character strings specifying the "type" of each time-varying covariate included in |
covparams |
List of vectors, where each vector contains information for
one parameter used in the modeling of the time-varying covariates (e.g.,
model statement, family, link function, etc.). Each vector
must be the same length as |
covpredict_custom |
Vector containing custom prediction functions for time-varying
covariates that do not fall within the pre-defined covariate types.
It should be in the same order as |
basecovs |
Vector of character strings specifying the names of baseline covariates in |
max_visits |
A vector of one or more values denoting the maximum number of times a binary covariate representing a visit process may be missed before the individual is censored from the data (in the observed data) or a visit is forced (in the simulated data). Multiple values exist in the vector when the modeling of more than covariate is attached to a visit process. |
baselags |
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not
included in |
below_zero_indicator |
Logical scalar indicating whether the observed data set contains rows for time |
min_time |
Numeric scalar specifying lowest value of time |
show_progress |
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when |
pb |
Progress bar R6 object. See |
int_visit_type |
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variables(s) when performing a carry forward restriction type for the kth intervention in |
sim_trunc |
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type |
... |
Other arguments, which are passed to the functions in |
Value
A data table containing simulated data under the specified intervention.
Static Intervention
Description
This function implements a static intervention (i.e., either constant treatment or no treatment over
all time points) for the specified intervention variable in the data table newdf
.
Usage
static(newdf, pool, intvar, intvals, time_name, t)
Arguments
newdf |
Data table containing the simulated data at time |
pool |
Data table containing the simulated data at times before |
intvar |
Character string specifying the name of the variable to be intervened on in each round of the simulation. |
intvals |
A list of length 1. The entry is the value of static treatment to be assigned to |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
Value
No value is returned. The data table newdf
is modified in place.
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
gform_basic
Threshold Intervention
Description
This function implements a threshold intervention (i.e., once treatment bypasses a certain
threshold, it remains at that threshold until end of follow-up) for the specified
intervention variable in the data table newdf
.
Usage
threshold(newdf, pool, intvar, intvals, time_name, t)
Arguments
newdf |
Data table containing the simulated data at time |
pool |
Data table containing the simulated data at times before |
intvar |
Character string specifying the name of the variable to be intervened on in each round of the simulation. |
intvals |
A list of length 2. The first entry is lower bound of the threshold, and the second entry is the upper bound. |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
Value
No value is returned. The data table newdf
is modified in place.
Examples
## Estimating the effect of threshold interventions on the mean of a binary
## end of follow-up outcome
id <- 'id_num'
time_name <- 'time'
covnames <- c('cov1', 'cov2', 'treat')
outcome_name <- 'outcome'
histories <- c(lagged, cumavg)
histvars <- list(c('treat', 'cov1', 'cov2'), c('cov1', 'cov2'))
covtypes <- c('binary', 'zero-inflated normal', 'normal')
covparams <- list(covmodels = c(cov1 ~ lag1_treat + lag1_cov1 + lag1_cov2 + cov3 +
time,
cov2 ~ lag1_treat + cov1 + lag1_cov1 + lag1_cov2 +
cov3 + time,
treat ~ lag1_treat + cumavg_cov1 +
cumavg_cov2 + cov3 + time))
ymodel <- outcome ~ treat + cov1 + cov2 + lag1_cov1 + lag1_cov2 + cov3
intervention1.treat <- list(static, rep(0, 7))
intervention2.treat <- list(threshold, 1, Inf)
int_descript <- c('Never treat', 'Threshold - lower bound 1')
nsimul <- 10000
ncores <- 2
gform_bin_eof <- gformula_binary_eof(obs_data = binary_eofdata, id = id,
time_name = time_name,
covnames = covnames,
outcome_name = outcome_name,
covtypes = covtypes,
covparams = covparams,
ymodel = ymodel,
intervention1.treat = intervention1.treat,
intervention2.treat = intervention2.treat,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c("cov3"), seed = 1234,
parallel = TRUE, nsamples = 5,
nsimul = nsimul, ncores = ncores)
gform_bin_eof
Variance-covariance method for objects of class "gformula"
Description
This function extracts the variance-covariance matrices of the parameters of the fitted models for the time-varying covariates, outcome, and competing event (if applicable).
Usage
## S3 method for class 'gformula'
vcov(object, ...)
Arguments
object |
Object of class "gformula". |
... |
Other arguments. |
Value
If bootdiag
was set to FALSE
in gformula
,
this function returns a list of the variance-covariance matrices of the
parameters of the fitted models to the observed data set. If bootstrapping
was used and bootdiag
was set to TRUE
in
gformula
, this function returns a list described as follows.
The first element (named 'Original sample') is a list of the
variance-covariance matrices of the parameters of the fitted models to the
observed data set. The kth element (named 'Bootstrap sample k-1') is a list
of the variance-covariance matrices of the parameters of the fitted models
corresponding to the k-1th bootstrap sample.
See Also
Examples
## Estimating the effect of static treatment strategies on risk of a
## failure event
id <- 'id'
time_points <- 7
time_name <- 't0'
covnames <- c('L1', 'L2', 'A')
outcome_name <- 'Y'
outcome_type <- 'survival'
covtypes <- c('binary', 'bounded normal', 'binary')
histories <- c(lagged, lagavg)
histvars <- list(c('A', 'L1', 'L2'), c('L1', 'L2'))
covparams <- list(covmodels = c(L1 ~ lag1_A + lag_cumavg1_L1 + lag_cumavg1_L2 +
L3 + t0,
L2 ~ lag1_A + L1 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0,
A ~ lag1_A + L1 + L2 + lag_cumavg1_L1 +
lag_cumavg1_L2 + L3 + t0))
ymodel <- Y ~ A + L1 + L2 + L3 + lag1_A + lag1_L1 + lag1_L2 + t0
intervention1.A <- list(static, rep(0, time_points))
intervention2.A <- list(static, rep(1, time_points))
int_descript <- c('Never treat', 'Always treat')
nsimul <- 10000
gform_basic <- gformula(obs_data = basicdata_nocomp, id = id,
time_points = time_points,
time_name = time_name, covnames = covnames,
outcome_name = outcome_name,
outcome_type = outcome_type, covtypes = covtypes,
covparams = covparams, ymodel = ymodel,
intervention1.A = intervention1.A,
intervention2.A = intervention2.A,
int_descript = int_descript,
histories = histories, histvars = histvars,
basecovs = c('L3'), nsimul = nsimul,
seed = 1234)
vcov(gform_basic)
Create Visit Sum Covariate
Description
This internal function assists in the implementation of a visit process by creating a covariate,
ts_visit
, that counts the number of visits in the past max_visits
time points. If this
number is greater than 0, then the individual has not missed more than the maximum number
of visits.
Usage
visit_sum(pool, histvars, time_name, t, id_name, max_visits)
Arguments
pool |
Data table containing all information prior to time |
histvars |
Vector of character strings specifying the names of the variables for which lagged cumulative averages are to be created. |
time_name |
Character string specifying the name of the time variable in |
t |
Integer specifying the current time index. |
id_name |
Character string specifying the name of the ID variable in |
max_visits |
A vector of one or more values denoting the maximum number of times a binary covariate representing a visit process may be missed before the individual is censored from the data (in the observed data) or a visit is forced (in the simulated data). Multiple values exist in the vector when the modeling of more than covariate is attached to a visit process. |
Value
No value is returned. The data table pool
is modified in place.