invivoPKfit is an R package that automates fitting pharmacokinetic (PK)/toxicokinetic (TK) models to measured concentration vs. time data.

This version of the package takes an “object-oriented” approach to this task. It is heavily modeled after ggplot2, and aims to provide a “grammar of PK modeling” in the same way that ggplot2 provides a “grammar of graphics.”

Just as the basic unit of ggplot2 is a ggplot object, the basic unit of invivoPKfit is a pk object. Just as a ggplot object is essentially a data set with various instructions about how to visualize that data set, a pk object is essentially a data set with various instructions about how to fit PK models to that data set.
You provide a set of instructions for building a ggplot2 plot by adding geom, stat, scale, etc. commands to a ggplot object using the + operator (as in ggplot(my_data, aes(x = x, y = y)) + geom_point()). Similarly, you provide a set of instructions for fitting an invivoPKfit PK model by adding settings, stat, and scale commands to a pk object.

In ggplot2, you can add the instructions in any order, and you can overwrite old instructions by adding new ones if you change your mind. The package will internally re-order the instructions correctly to build the plot. invivoPKfit works exactly the same way.
And just as ggplot2 doesn’t actually build the plot until you issue a print command, invivoPKfit doesn’t actually fit the model until you issue a fit command.

Here’s a basic example to illustrate the steps of the invivoPKfit workflow.
The first requirement is that the concentration vs. time data to be fit must be available in R in data.frame format. You may import data from any source you like (external databases, Excel spreadsheets, CSV files, etc.), so long as it can be put into a data.frame where each row is a single concentration vs. time observation.

invivoPKfit contains a built-in data set from the Concentration vs. Time Database (CvTdb) that is already in a compatible data.frame format: it is in the variable cvt that is loaded with the package. We will use this built-in data for our example.

For the sake of keeping run time reasonable, this example will include data on only two of the chemicals available in the full cvt dataset.
Initializing a pk object

Now, initialize a pk object: the basic unit of an invivoPKfit analysis. This object contains the data to be fit, along with a set of instructions for how to do the model fitting. (Note that none of the instructions are actually carried out at this stage: no fitting is done until you explicitly request it.)
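my_pk <- pk(data = my_data)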
When you call pk(), a default set of instructions will be supplied: how to divide the data into groups; how to pre-process the data; what PK models to fit to the data; what data transformations to apply; what error model to assume during data fitting; etc. More details on the instructions, and how to change them, will be provided below. For now, let’s accept the default instructions. It will be easier to understand the instructions themselves if you first understand the overall steps of the model fitting workflow.
The model fitting workflow has four steps.

The first is data pre-processing. Here, variable names are harmonized; the data set is cleaned and filtered; and missing values are imputed where necessary.

Some common harmonized variable names that will be referenced in this document are Chemical, Species, Reference, Media, Route, Dose, Time, Value, and LOQ, along with Conc and Detect (which are created during pre-processing).

This step adds the pre-processed data to the pk object in a new element named data.
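In code, this step is performed by do_preprocess():

my_pk <- do_preprocess(my_pk)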
Next, some summary information is gathered about each group of the data: things like the number of detects/non-detects, broken out by the different routes and media present in the data. This step also includes non-compartmental analysis (NCA) of the data, to calculate things like the empirical Cmax (maximum concentration), empirical tmax (time of maximum concentration), and empirical AUC (area under the concentration-time curve). Instructions for this “data info” step are given in my_pk$settings_data_info.

This step adds the data summary info to the pk object in a new element named data_info.
Next, a “pre-fitting” step is performed. This step looks at each group of the data, along with the PK models to be fitted, and determines the following: which model parameters can be estimated from the available data; lower and upper bounds for each parameter; and a starting guess for each parameter. It also checks whether each fit should be attempted at all (i.e., whether there are enough observations).

This step adds the parameter lower/upper bounds and starting guesses to the pk object.
Finally, the model fitting can be performed. Each group of the data is fitted separately. By default, three different PK models are fitted to each group: a “flat” model, a 1-compartment model, and a 2-compartment model.

This step adds the fitted model parameter values to the pk object. If multiple models were fitted, and/or if multiple optimization algorithms (“methods”) were used to do the fitting, then one set of fitted model parameter values is added for each model and each method.
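In code, this step is performed by do_fit():

my_pk <- do_fit(my_pk)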
Note that model fitting can be sped up substantially by using parallel processing. This is done simply by supplying argument n_cores to do_fit().
Time without parallel processing:
Time with parallel processing:
Note that you can just skip straight to the last step. If you call do_fit() on a newly-initialized pk object, all of the preceding steps will be done automatically. The following code will give you exactly the same results as going through each step individually as above.
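my_pk <- pk(data = my_data)
my_pk <- do_fit(my_pk)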
Once fitting has been completed, you can use some familiar methods to extract information about the fitted model(s).
For example, you can extract the fitted model parameter values using coef(). The result will be a tibble whose columns include the model parameters, and whose rows uniquely identify individual data groups, each model fitted, and each optimization algorithm used for fitting.

The coefficients themselves are included as a list column, to accommodate the fact that the number and names of coefficients will vary depending on the PK model, as well as on the error model. Coefficients can be viewed for a particular data group, model, and method by using a tidyverse data-wrangling function like slice().
For example, this code views the coefficients for the first row of the full coefficients table:
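dplyr::slice(coef(my_pk), 1)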
And this views the coefficients for row 5 (the same chemical, fitted with the 2-compartment model instead of the 1-compartment model). Note that there are more coefficients, and they have different names, compared to the 1-compartment model.
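dplyr::slice(coef(my_pk), 5)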
You can extract the residuals (for the observations in the original data) using residuals(). The result is a tibble, with one row for each fitted model, optimization algorithm, and observed timepoint-value pair.
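residuals(my_pk)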
You can get the predicted values (for the time points, doses, and routes in the original data) using predict(). The result is a tibble, with one row for each fitted model; the variable Conc_est contains the predictions.
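predict(my_pk)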
If instead you want to get predicted values for new data (new timepoints, doses, or routes), you can do this by providing argument newdata; but you need to include the variables that define the data group!
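A sketch, assuming the default Chemical-and-Species data grouping and the harmonized variable names described below; the chemical identifier and other values here are placeholders, and the exact set of required columns may vary:

new_df <- data.frame(Chemical = "DTXSID0020232", #placeholder identifier
                     Species = "rat",
                     Time = seq(0, 24, by = 0.5),
                     Dose = 1,
                     Route = "oral",
                     Media = "plasma")
predict(my_pk, newdata = new_df)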
You can plot the fit using plot(), which returns a ggplot2 plot object for each data group:
By default, the concentration axis is scaled the same way as the data was scaled for fitting. You can specify whether to dose-normalize and/or log-transform the plot y axis using argument use_scale_conc:
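A sketch, assuming use_scale_conc accepts the same dose_norm/log10_trans flags as scale_conc(), supplied as a list:

plot(my_pk, use_scale_conc = list(dose_norm = TRUE, log10_trans = FALSE))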
You can also get model-evaluation metrics such as the log-likelihood, AIC, BIC, root mean squared error (RMSE), R-squared for predicted vs. observed, average fold error (AFE), and absolute average fold error (AAFE).
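In code: logLik(), AIC(), and BIC() are standard R generics with methods for pk objects; the accessor names rmse(), rsq(), AFE(), and AAFE() are assumptions here, so check the package index for the exact names.

logLik(my_pk) #log-likelihood
AIC(my_pk)    #Akaike information criterion
BIC(my_pk)    #Bayesian information criterion
rmse(my_pk)   #root mean squared error
rsq(my_pk)    #R-squared for predicted vs. observed
AFE(my_pk)    #average fold error
AAFE(my_pk)   #absolute average fold error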
You can get the pre-processed data frame itself:
And you can get summary information about the pre-processed data:
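get_data() is described later in this document; the name get_data_info() is an assumption here, following the get_[element]() naming pattern described below.

get_data(my_pk)      #pre-processed data
get_data_info(my_pk) #summary information about the pre-processed data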
You can compute derived toxicokinetic statistics, such as total clearance rate, half-life, steady-state plasma concentration, etc., using get_tkstats(). (The result is a data.frame of TK stats, one set for each model and optimization algorithm that was fitted.)
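get_tkstats(my_pk)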
You can extract the best-fit “winning” model for each data group and each optimization method – determined by which one had the lowest AIC.
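The accessor name here is an assumption, following the get_[element]() naming pattern described below:

get_winning_model(my_pk)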
If you would like to filter any of the other post-fitting outputs so that they include only the winning model for each data group and optimization method, you can do a left join of the data.frame of winning models with any of the other post-fitting outputs.

For example, here is R-squared for the winning model only:
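A sketch, assuming the rsq() and get_winning_model() names used above; left_join() joins on the shared data-group, model, and method columns:

dplyr::left_join(get_winning_model(my_pk), rsq(my_pk))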
If you want to plot only the winning model, you can do so by specifying argument best_fit = TRUE:
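plot(my_pk, best_fit = TRUE)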
Anatomy of a pk object

Let’s start at the very beginning (a very good place to start). When you initialize a pk object, what does that object actually contain?

Here, I’ll initialize a pk object with all the default options.
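my_pk <- pk(data = my_data)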
This newly-initialized pk object is, under the hood, a list object with the following elements: data_original, mapping, status, settings_preprocess, settings_data_info, optimx_settings, scales, stat_model, and stat_error_model. Each of these is described below.
The data_original element contains the data set you provided in argument data. This is the data set that will be fitted. In this case, it is a subset of CvTdb.

The mapping element specifies how the variable names in the original data should be translated into the harmonized/standardized variable names that are used internally by invivoPKfit. This is analogous to providing an aes mapping in ggplot2; in fact, it uses the same aes() function as ggplot2!
In ggplot2, you would specify which variables in your data map to the standardized “aesthetic” variables x, y, color, shape, etc., using a command like this:
ggplot(data = my_data,
aes(x = time_hr,
y = conc,
color = dose_level_normalized_corrected,
shape = as.factor(document_id)
)
)
In invivoPKfit, you specify which variables in your data map to the standardized variables Chemical, Species, Time, Dose, etc., using a command like this:
pk(data = my_data,
   mapping = ggplot2::aes(
Chemical = analyte_dtxsid,
Chemical_Name = analyte_name_original,
DTXSID = analyte_dtxsid,
CASRN = analyte_casrn,
Species = species,
Reference = fk_extraction_document_id,
Media = conc_medium_normalized,
Route = administration_route_normalized,
Dose = invivPK_dose_level,
Dose.Units = "mg/kg",
Subject_ID = fk_subject_id,
Series_ID = fk_series_id,
Study_ID = fk_study_id,
ConcTime_ID = conc_time_id,
N_Subjects = n_subjects_normalized,
Weight = weight_kg,
Weight.Units = "kg",
Time = time_hr,
Time.Units = "hours",
Value = invivPK_conc,
Value.Units = "mg/L",
Value_SD = invivPK_conc_sd,
LOQ = invivPK_loq
))
The above command shows the default mapping that will be applied if you don’t specify a mapping argument when you call pk(). It’s a series of “standardized = original” pairs, where the standardized invivoPKfit internal variable name is on the left-hand side, and the original data variable name is on the right-hand side. Here, the original data variable names are all from CvTdb: they are variables appearing in the invivoPKfit built-in data object cvt, which contains a subset of CvTdb data.
At minimum, you will need to specify mappings for Chemical, Species, Reference, Media, Route, Dose, Time, Value, and LOQ. Most of these names are self-explanatory. Media refers to the biological medium in which concentration is measured (e.g. blood, plasma, liver tissue). Route refers to the route of administration (e.g. intravenous or oral). Time refers to the time point at which the concentration observation was made. Value refers to the measured concentration. LOQ refers to the limit of quantification of the measured concentration.
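For example, a minimal sketch of such a mapping, using the cvt variable names from the default mapping shown above:

my_pk <- pk(data = my_data,
            mapping = ggplot2::aes(
              Chemical = analyte_dtxsid,
              Species = species,
              Reference = fk_extraction_document_id,
              Media = conc_medium_normalized,
              Route = administration_route_normalized,
              Dose = invivPK_dose_level,
              Time = time_hr,
              Value = invivPK_conc,
              LOQ = invivPK_loq
            ))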
Expressions in the aes() mapping specification

Just as in ggplot2, you can specify expressions inside aes(). This lets you specify mappings that are more complicated than a simple one-to-one variable name change.
For example, the mapping for Reference can use an expression:
Reference = as.character(
ifelse(
is.na(
documents_reference.id
),
documents_extraction.id,
documents_reference.id
)
)
This expression says: “For the standardized variable Reference: first look at the original variable documents_reference.id. If that variable is NA, then use the variable documents_extraction.id; otherwise, use documents_reference.id.”
This mapping for Reference exists because of the way CvTdb handles references. In CvTdb, concentration-time data are extracted from a particular document, for example the PDF of a peer-reviewed publication. The unique ID for this document is the “extraction ID.” However, the original reference for the data may be a different document; this is the “reference ID.” For example, this occurred when Wambaugh et al. 2018 (doi: 10.1093/toxsci/kfy020) collected and published a data set that had originally been published in Cruz et al. 2002 (PMID: 12434508). The “reference ID” refers to Cruz et al. (2002); the “extraction ID” refers to Wambaugh et al. (2018). If the data are instead extracted from their original reference, then “reference ID” is left blank (NA) and only “extraction ID” is used.
You can use expressions to specify constant values for standardized variables, as well. The default mapping contains items like this:
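Value.Units = "mg/L"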
This mapping says: “For the standardized variable Value.Units: don’t take the value from any variable in the original data. Just assign the constant value "mg/L" to this variable.”
my_pk$status is an integer code representing the current status of the pk object in the overall pk workflow. The workflow steps are numbered like this:

1. The pk object has been initialized with pk()
2. Data pre-processing has been done with do_preprocess()
3. Data summary information has been gathered with do_data_info()
4. Pre-fitting has been done with do_prefit()
5. Fitting has been done with do_fit()

You can get the status of a pk object at any time using get_status().
The data may include the results of multiple experiments for multiple chemicals, species, routes of administration, biological media, etc. You control how to split up the data into batches for fitting. Do you want to fit a PK model to all data for a given chemical and species, even if that includes more than one experiment? Do you want to fit separate PK models to each experiment? It’s your choice.
Data grouping instructions are given using a function called facet_data() (named by analogy with ggplot2). You specify data grouping using a call to dplyr::vars(), giving the names of the new, standardized variables that you created using the mapping argument to pk(). Whatever variables you specify, the data will be grouped by unique combinations of those variables.
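For example, to group the data by unique combinations of Chemical, Species, and Reference:

my_pk <- pk(my_data) +
  facet_data(dplyr::vars(Chemical, Species, Reference))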
The default data grouping is the result of calling the function facet_data() with its default arguments. In other words,
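my_pk <- pk(my_data)
#is the same as
my_pk <- pk(my_data) +
  facet_data(dplyr::vars(Chemical, Species))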
This specification is the default grouping: if you just call pk() without adding a + facet_data() specification, the data will be grouped by unique combinations of Chemical and Species.
my_pk$settings_preprocess contains the instructions for pre-processing the data. Its default value is the result of calling the function settings_preprocess() with its default arguments. In other words,
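my_pk <- pk(my_data)
#is the same as
my_pk <- pk(my_data) +
  settings_preprocess()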
Here are the default arguments to settings_preprocess(). Explanations of each argument follow, to show how you can control the data pre-processing settings.
routes_keep

This contains a list of the “allowed” routes of administration. Data will be filtered to keep only observations with routes on this list. The default is c('oral', 'iv'). Currently, invivoPKfit only has PK models implemented for those two routes, so it is not recommended to change the default unless you know what you are doing.
media_keep

This contains a list of the “allowed” tissue media in which concentrations can be measured/predicted. Data will be filtered to keep only observations with media on this list. The default is c('blood', 'plasma'). Currently, invivoPKfit only has PK models implemented for those two media, so it is not recommended to change the default unless you know what you are doing.
ratio_conc_dose

This argument gives the ratio of the mass units used for concentrations to the mass units used for doses. The default is 1, indicating that the same mass units are used in each (in CvTdb, by default, concentrations are mg/L and doses are mg/kg). If, for example, you had data where concentrations were ng/L and doses were mg/kg, you would need to use ratio_conc_dose = 1e-6 to indicate the ratio between nanograms and milligrams. During data pre-processing, concentrations are multiplied by ratio_conc_dose, so that concentration units harmonize with dose units.
impute_loq, loq_group, and calc_loq_factor

These arguments instruct invivoPKfit on whether and how to impute values for limits of quantification (LOQs), if LOQ values are missing for any observations. If impute_loq = TRUE, then missing LOQs will be imputed as follows. The data will be split into groups according to unique combinations of the variables specified in loq_group, which by default are Chemical, Species, Route, Media, and Reference. Then, the minimum detected value in each group will be multiplied by calc_loq_factor (0.9 by default), and the result will be imputed for any missing LOQs in that loq_group.
If, after LOQ imputation, any observations remain where both Value (observed concentration) and LOQ are NA, then they will be marked for exclusion from further analysis.
impute_sd and sd_group

These arguments instruct invivoPKfit on whether and how to impute missing values for sample standard deviations of observed concentrations. If impute_sd = TRUE, then missing SDs will be imputed as follows. The data will be split into groups according to unique combinations of the variables specified in sd_group. If there are any non-missing SDs in a group, then missing SDs will be imputed as the minimum non-missing SD in the group. If all SDs in a group are missing, then they will be imputed with 0.
my_pk$settings_data_info provides control settings for gathering summary data information, including non-compartmental analysis. Its default value is the result of calling the function settings_data_info() with its default arguments. In other words,
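my_pk <- pk(my_data)
#is the same as
my_pk <- pk(my_data) +
  settings_preprocess() +
  settings_data_info()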
Here are the default arguments to settings_data_info():
Element summary_group instructs invivoPKfit on how to group data for calculation of summary statistics, including non-compartmental analysis (NCA). (Later, summary_group is also used to instruct invivoPKfit how to group data to calculate summary TK statistics predicted by the models.)

Data will be split into groups according to unique combinations of the variables specified in summary_group. Then, non-compartmental analysis will be performed for each group using PK::nca().
For each summary group, a few basic summary statistics will also be computed (for example, the number of detected and non-detected observations).
my_pk$optimx_settings provides control settings for the optimizer used to fit the model(s) to the data, which is optimx::optimx(). It is the result of a call to settings_optimx() with default arguments. In other words, just calling pk() by itself is the same as
Here are the default arguments to settings_optimx():

These are simply the arguments for optimx::optimx(): method, itnmax, hessian, control, and ... .

In general, you’ll need to choose a value for method that names one of the optimization algorithms that can handle parameter bounds: either "bobyqa" or "L-BFGS-B".
my_pk$scales is itself a list with elements conc and time, providing instructions on how to transform the concentration and time variables before fitting any models.

my_pk$scales$conc is the result of a call to scale_conc() with the default arguments. In other words, pk() alone is the same as
my_pk <- pk(my_data) +
settings_preprocess() +
settings_data_info() +
settings_optimx() +
scale_conc()
Here are the default arguments to scale_conc():
dose_norm

This instructs invivoPKfit whether to apply dose normalization before fitting, i.e., divide each observed concentration (and its corresponding observed standard deviation and/or LOQ, if any) by its corresponding dose. Dose normalization may be useful to normalize the residual error if the residual error is heteroscedastic and scales with concentration (because higher dose usually means higher concentration).
log10_trans

This instructs invivoPKfit whether to apply a log10() transformation to observed concentrations (and LOQs) before fitting. This transformation might also be useful to normalize heteroscedastic residual errors that scale multiplicatively with concentration.
my_pk$scales$time

The time element is the result of a call to scale_time() with the default arguments (there is only one). Calling pk() by itself is the same as
my_pk <- pk(my_data) +
settings_preprocess() +
settings_data_info() +
settings_optimx() +
scale_conc() +
scale_time()
Here are the default arguments to scale_time():
new_units instructs invivoPKfit whether and how to convert observed times into different units. The default, new_units = "identity", tells invivoPKfit not to apply any transformation to the time units: retain the original units. Another useful option is new_units = "auto", which tells invivoPKfit to choose new units based on the midpoint of observed times: it will automatically choose time units that put the midpoint time on the order of 10, or as close to that as it can get. You can also specify any time units understood by lubridate::duration(), e.g. minutes, hours, days, weeks, months, years, to convert time into those units.
my_pk$stat_model is a named list, with one item named for each model to be fitted. It is the result of a call to stat_model() with its default arguments. In other words, calling pk() by itself is the same as calling
my_pk <- pk(my_data) +
settings_preprocess() +
settings_data_info() +
settings_optimx() +
scale_conc() +
scale_time() +
stat_model()
Here are the default arguments to stat_model():
The model argument is a character vector (or scalar) listing the names of one or more models to fit to the data. The default value lists the three models that are already implemented and built into invivoPKfit.
my_pk$stat_error_model lists the error model to apply during fitting. This is the result of a call to stat_error_model() with its default arguments. In other words, calling pk() by itself is the same as calling
my_pk <- pk(my_data) +
settings_preprocess() +
settings_data_info() +
settings_optimx() +
scale_conc() +
scale_time() +
stat_model() +
stat_error_model()
Here are the default arguments to stat_error_model():

There is only one default argument: error_group. It is specified, just like data grouping in facet_data(), using a call to dplyr::vars().
This specifies the error model as, essentially, a fixed-effects model. Within each data group, the data are further split into subgroups according to unique combinations of the variables specified in error_group. Residual errors within each error subgroup are assumed to be independent and identically distributed according to a zero-mean normal distribution with a standard deviation that is unique to that error subgroup.
The error standard deviations for each error subgroup are hyperparameters of the model. They will be estimated from the data simultaneously with the model parameters for each data group. The argument error_group defines the number of different error subgroups, and therefore the number of error SD hyperparameters that need to be estimated.

If you specify error_group to be the same as the data group, then that simply means there is only one error subgroup within each data group, and therefore only one error standard deviation for that data group.
Here are three examples:
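A sketch of the first specification (assuming the default Chemical-and-Species data grouping):

my_pk <- pk(my_data) +
  stat_error_model(error_group = dplyr::vars(Chemical, Species, Reference))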
This type of analysis fits each Chemical and Species data group, but sets error group to unique combinations of Chemical, Species, and Reference. The residual errors from each Reference will be assumed to obey distributions with different standard deviations during fitting, modeling the assumption that each individual experiment for a given chemical and species measures the same underlying kinetics, but may measure them with more or less uncertainty.
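A sketch of the second specification:

my_pk <- pk(my_data) +
  stat_error_model(error_group = dplyr::vars(Chemical, Species))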
This type of analysis fits each Chemical and Species data group, and sets error group to be identical to data group. All residual errors within each Chemical-Species data group will be assumed to obey a single distribution with just one standard deviation, modeling the assumption that each individual experiment for a given chemical and species measures the same underlying kinetics, and that all individual experiments measure the underlying kinetics with the same amount of uncertainty.
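A sketch of the third specification:

my_pk <- pk(my_data) +
  facet_data(dplyr::vars(Chemical, Species, Reference)) +
  stat_error_model(error_group = dplyr::vars(Chemical, Species, Reference))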
This type of analysis fits each unique combination of Chemical, Species, and Reference, and sets error group to be identical to data group. In other words, each individual experiment measures different underlying kinetics.
Specifying instructions for a pk object

If you just want to use the default fitting instructions, you can simply do
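my_pk <- pk(data = my_data)
my_pk <- do_fit(my_pk)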
But if you would like to specify different instructions, you do that by adding settings, scales, and stats using +, similarly to the way you add layers in ggplot2. It doesn’t matter what order you add the instructions in.

Here is an example. Notice that the instructions are added in a different order than shown in the “anatomy of a pk object” section above. That’s to emphasize that it doesn’t matter what order you add the instructions. Any instructions you don’t add explicitly will take their default values.

Note that it will throw messages about replacing existing instructions. These are just informational messages; they do not mean anything is wrong!
my_pk <- pk(my_data) +
#instructions to use an error model that puts all observations in the same group
stat_error_model(error_group = ggplot2::vars(Chemical, Species)) +
#instructions for concentration scaling/transformation
scale_conc(dose_norm = TRUE,
log10_trans = TRUE) +
#instructions for time rescaling
scale_time(new_units = "auto") +
#instructions to use only one method for optimx::optimx()
settings_optimx(method = "L-BFGS-B") +
#instructions to impute missing LOQs slightly differently
settings_preprocess(calc_loq_factor = 0.5)
Checking and modifying the instructions in a pk object

You can check the current instructions for a pk object using various functions named get_[element](). This is (hopefully) easier than remembering how to access all the list elements in the pk object.
You can then overwrite any instructions by adding new/different ones. Let’s say you made a mistake – you actually wanted to dose-normalize concentrations, but not log-transform them.
You can change the concentration-scaling instructions by simply adding new ones:
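my_pk <- my_pk + scale_conc(dose_norm = TRUE, log10_trans = FALSE)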
Now the new instructions have replaced the old ones.
And let’s say you only want to fit the one-compartment model – you don’t want the default behavior, which fits the flat and two-compartment models as well as the one-compartment model. You can do this:
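my_pk <- my_pk + stat_model(model = "model_1comp")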
Now, “model_flat” and “model_2comp” models have been removed from the list of models to fit.
How a pk object changes as you go through the steps of the model fitting workflow

Immediately after initialization, the only items in the pk object are those that contain the original data and the instructions.
After pre-processing, re-check the names of the pk object:
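names(my_pk)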
Notice that a new element data has been added at the end. This contains the pre-processed data. You can access it using the function get_data().
Notice that the variables Value and LOQ have been translated into Conc and Detect. If Value < LOQ, then Detect = FALSE; otherwise Detect = TRUE. If Detect = FALSE, then Conc = LOQ; otherwise, Conc = Value. This is just a different way of presenting the same information, but it’s a bit easier to handle in some of the internal calculations of invivoPKfit.
Also, notice that the requested time and concentration transformations have been applied: compare Time and Time.Units vs. Time_trans and Time_trans.Units; compare Conc and Conc.Units vs. Conc_trans and Conc_trans.Units.
Notice that missing LOQ values have been imputed; compare LOQ.orig to LOQ. Similarly, missing Value_SD values have been imputed; compare Value_SD.orig and Value_SD. Note that the standard deviations are put in a new variable Conc_SD to match the Conc variable. Also, note that the concentration transformations have been applied to the Conc_SD variable as well (see Conc_SD_trans), so that the transformed concentration standard deviations are in the same units as the transformed concentrations.
(A warning: if you do scale_conc(log10_trans = TRUE), then the log10 transformation will be applied to the concentration SD as well as the concentration. This means you can no longer take Conc_trans + Conc_SD or Conc_trans - Conc_SD as upper and lower bounds. This math is handled correctly internally to invivoPKfit, but if you extract the pre-processed data and use it for another purpose, be aware of that caveat!)
After the data summary step, a new element data_info has been added. This element is itself a named list containing elements data_summary, data_flags, dose_norm_check, and nca. You can extract these elements using the corresponding get_[element]() accessor functions.
After the pre-fitting step, a new list element prefit has been added. It can be extracted using the function get_prefit(). It is itself a named list, with one element for every PK model to be fitted to the data.

Element prefit$stat_error_model has one element, sigma_DF, which gives the starting values and lower/upper bounds for the error-group-specific standard deviations.

Element prefit$par_DF gives the starting values and lower/upper bounds for each parameter of each model to be fit.

Element fit_check gives information about whether each fit should be attempted (i.e., whether there are enough observations).
After fitting, the pk object contains a new element fit. This is a tibble containing information about the fit for each data group, each PK model, and each optimization method. This table includes most of the outputs of optimx::optimx(), including information about convergence. It lists each parameter for each model (as in my_pk$prefit$par_DF), including error-group-specific sigmas (as in my_pk$prefit$stat_error_model$sigma_DF), and gives the final maximum-likelihood estimate for that parameter. Additionally, it provides information about whether each parameter estimate was at a lower or upper bound.
Let’s say you go through all the steps of the workflow for a pk object, all the way to fitting. And then, you change one of the instructions. For example, you change the concentration scaling, which will change the results of everything from pre-processing the data (step 2) onwards. Now, the fit results stored in the fitted pk object don’t match the concentration scaling stored in the same object. You need to re-do all steps of the workflow.

However, if you had changed the error model instead, you wouldn’t need to re-do all steps. You’d only need to re-do the pre-fitting and fitting steps, because the error model doesn’t affect the data pre-processing or data summary steps.
How can you keep track of which steps you need to re-do after changing one or more of the instructions?

The answer is that invivoPKfit will automatically check the object status every time you issue a new instruction, and will reset the status back to the earliest workflow step that needs to be re-done.
Let’s say I’ve created a pk object, without applying any scalings/transformations to concentration or time. The status of the newly-created pk object is 1.
my_pk <- pk(data = my_data) + #initialize a `pk` object
stat_model(model = c("model_flat",
"model_1comp",
"model_2comp")) + #add PK models to fit
settings_optimx(method = "L-BFGS-B") #use only this optimx::optimx() algorithm
get_status(my_pk) #status is 1
Now I do all the steps up to step 5. (If you just call do_fit(), it will fast-forward through all the steps.) I can now extract the AIC for my fitted models.
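my_pk <- do_fit(my_pk) #fast-forwards through the remaining steps
AIC(my_pk)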
But now I realize that I should have dose-normalized the concentration. I do this by adding a scale_conc(). invivoPKfit throws a warning to tell me that the status will be reset back to 1. And in fact, the status is now reset to 1.
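my_pk <- my_pk + scale_conc(dose_norm = TRUE) #warns that status will be reset
get_status(my_pk) #status is now 1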
This means that I can’t do anything that requires a fully fitted model, not until I re-do the workflow steps. In effect, invivoPKfit throws out any fit results that were made before I changed the concentration scaling.

But after I re-fit the new model (setting the status back to 5), I can extract the new AIC values.
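my_pk <- do_fit(my_pk)
AIC(my_pk)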
If I wanted to keep my original fit results, and also create a new pk object that was the same except for the concentration scaling, then I should copy my pk object into a new variable and modify the new one.
#fit a pk object
my_pk <- pk(data = my_data) +
settings_preprocess(suppress.messages = TRUE)
my_pk <- do_fit(my_pk)
get_status(my_pk) #status is now 5
#copy it to a new variable
my_pk_new <- my_pk
#and modify scale_conc() on the new copy
my_pk_new <- my_pk + scale_conc(dose_norm = TRUE)
get_status(my_pk_new) #status has been reset to 1 for the new copy
get_status(my_pk) #but status of the original is still 5
Compare fits made using different data transformations.
#suppress messages
my_pk <- my_pk + settings_preprocess(suppress.messages = TRUE)
#dose normalization, no log transformation
pk1 <- my_pk + scale_conc(dose_norm = TRUE, log10_trans = FALSE)
#log transformation, no dose normalization
pk2 <- my_pk + scale_conc(dose_norm = FALSE, log10_trans = TRUE)
#both normalization and log transformation
pk3 <- my_pk + scale_conc(dose_norm = TRUE, log10_trans = TRUE)
#do fits
pk1 <- do_fit(pk1, n_cores = parallel::detectCores()-1)
pk2 <- do_fit(pk2, n_cores = parallel::detectCores()-1)
pk3 <- do_fit(pk3, n_cores = parallel::detectCores()-1)
Plot the results.
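plot(pk1)
plot(pk2)
plot(pk3)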
Note that by default, each one has been plotted using its own concentration transformation. We can plot them all in the same way to compare better: dose-normalize the concentration axis, but do not log-transform it.
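A sketch, again assuming use_scale_conc accepts a list of dose_norm/log10_trans flags:

plot(pk1, use_scale_conc = list(dose_norm = TRUE, log10_trans = FALSE))
plot(pk2, use_scale_conc = list(dose_norm = TRUE, log10_trans = FALSE))
plot(pk3, use_scale_conc = list(dose_norm = TRUE, log10_trans = FALSE))

You can also compare fits made using different error models, for example a per-Reference (“hierarchical”) error group vs. a pooled one: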
pk_hier <- my_pk +
stat_error_model(error_group = vars(Chemical, Species, Reference))
pk_pool <- my_pk +
stat_error_model(error_group = vars(Chemical, Species))
pk_hier <- do_fit(pk_hier, n_cores = parallel::detectCores()-1)
pk_pool <- do_fit(pk_pool, n_cores = parallel::detectCores()-1)
Many more detailed examples can be seen in the R Markdown document that gives the code to produce the figures and tables for the manuscript.
Each model is an object of class pk_model, which is simply a named list giving:

- name: The model name
- params: The model parameter names (a character vector)
- conc_fun: The name of a function that computes concentrations, given model parameters, time, dose, route, and medium
- auc_fun: The name of a function that computes AUC (area under the concentration-time curve), given model parameters, time, dose, route, and medium
- params_fun: The name of a function that returns bounds and starting points for each of the model parameters, given the data
- tkstats_fun: The name of a function that computes derived TK statistics (such as half-life, clearance rate, etc.), given model parameters
- conc_fun_args: Any additional arguments to the function in conc_fun
- auc_fun_args: Any additional arguments to the function in auc_fun
- params_fun_args: Any additional arguments to the function in params_fun
- tkstats_fun_args: Any additional arguments to the function in tkstats_fun
For example, here is the list for the built-in model model_1comp:

And here is the list for the built-in model model_2comp:
You can define a new model by first writing each of the required functions above for that model:

- a conc_fun that calculates concentration as a function of dose, time, and model parameters
- an auc_fun that calculates area under the concentration-time curve as a function of dose, time, and model parameters
- a params_fun that takes in the data and returns a data.frame whose rows give the names of the model parameters, whether to estimate them from the data, a starting guess for each parameter, and lower and upper bounds for each parameter
- a tkstats_fun that calculates summary TK statistics from the fitted model coefficients

Source the R scripts containing those functions, so that R knows those functions by name.
Then use the command pk_model(), whose named arguments correspond to the items in the named list above. Supply pk_model() with values for each argument, giving the names of the new functions you have written. Assign the result to an R variable, which will contain your new pk_model object. That new model object will persist for the duration of your R session (unless you remove it), and will need to be re-defined if you restart R. You can save your new model object by, for example, calling saveRDS() to save it as an .Rds file in a location of your choice, and then load it by calling readRDS().
Here are the requirements for each of those functions (they can also be found in help(pk_model)):
conc_fun requirements

conc_fun should be a function that takes the following named arguments, and returns a numeric vector of predicted tissue concentrations:

- params: A named list of parameter values
- time: A numeric vector of time values
- dose: A numeric vector of dose values. Currently, only a single bolus dose at time 0 is supported.
- route: A character vector of the route of administration. Currently, only 'oral' and 'iv' are supported.
- medium: The tissue in which concentration is to be predicted. Currently, only 'blood' and 'plasma' are supported.

See cp_1comp(), cp_2comp(), and cp_flat() for examples.
auc_fun requirements

auc_fun should be a function that takes the same arguments as conc_fun, and returns a numeric vector of predicted tissue AUCs (area under the concentration-time curve).

See auc_1comp(), auc_2comp(), and auc_flat() for examples.
params_fun requirements

params_fun should be a function whose first argument is a data.frame: the pre-processed data, using invivoPKfit harmonized variable names. It may take additional arguments, which can be provided in params_fun_args. The function should return a data.frame with the following variables:

- param_name: Character vector, listing parameter names for the model
- param_units: Character vector, listing units of each model parameter
- optimize_param: Logical (TRUE/FALSE), whether each parameter is to be estimated given the available data
- use_param: Logical (TRUE/FALSE), whether each parameter is to be used in the model even if it is not estimated (i.e., if a parameter value is to be held constant while the others are estimated, then optimize_param should be FALSE but use_param should be TRUE)
- lower_bound: Numeric. Lower bounds for each parameter. May be -Inf if there is no lower bound. If optimize_param or use_param is FALSE, then the corresponding lower_bound will be ignored (because the parameter is not being estimated from the data).
- upper_bound: Numeric. Upper bounds for each parameter. May be Inf if there is no upper bound. If optimize_param or use_param is FALSE, then the corresponding upper_bound will be ignored (because the parameter is not being estimated from the data).
- start: Numeric. Starting values for estimating each parameter. If optimize_param is FALSE and use_param is TRUE, then the parameter will be held constant at the corresponding value in start. If use_param is FALSE, then the corresponding start will be ignored.

See get_params_flat(), get_params_1comp(), and get_params_2comp() for examples.
tkstats_fun requirements

tkstats_fun should be a function that accepts a vector of model parameter values and calculates derived summary toxicokinetic statistics (e.g. total clearance, half-life, AUC, volume of distribution at steady state).

The function must take the following named arguments:

- pars: A named numeric vector of model parameter values
- route: A character scalar naming a route (e.g. "oral" or "iv")
- medium: A character scalar naming a tissue medium of analysis (e.g. "blood" or "plasma")
- dose: A numeric scalar giving a dose level for which to calculate TK statistics
- time_unit: A character scalar giving the units of time
- conc_unit: A character scalar giving the units of concentration
- vol_unit: A character scalar giving the units of volume
of derived
toxicokinetic statistics, which should include the following named
variables:
param_name
: A character vector giving the names of each
derived TK statisticparam_value
: A character vector giving the values of
each derived TK statisticparam_units
: A character vector giving the units of
each derived TK statisticIt is recommended (although not required) that the function return at
least the following statistics, using these names in the
param_name
variable:
CLtot
: Total clearance rate (units of volume/time)CLtot/Fgutabs
: Total clearance rate scaled by gut
absorption fraction (if gut absorption fraction is available) (units of
volume/time)Css
: The steady-state concentration after a long-term
daily dose of dose
(units of concentration)halflife
: The terminal half-life (units of time)tmax
: The time of peak concentration (units of
time)Cmax
: The peak concentration (units of time)AUC_infinity
: The area under the concentration-time
curve, calculated at infinite time (units of concentration * time)Vdist_ss
: The volume of distribution at steady-state
(units of volume)Vdist_ss/Fgutabs
: The volume of distribution at
steady-state scaled by gut absorption fraction (if gut absorption
fraction is available) (units of volume)The recommendation to return these statistics, using these names, is
intended to make it easier to compare TK statistics across models, and to compare TK statistics to the results of non-compartmental analysis. If these names are not used, then some outputs of eval_tkstats() will not be very useful. The automated comparison of TK stats from the winning model to the results of non-compartmental analysis relies on these names being present in the output of tkstats_fun, matching the names of the statistics output from NCA; nothing should crash without them, but the results won’t be very useful. And TK stats compiled across models will not be easy to compare if the models use different names for the statistics.
For examples, see tkstats_1comp() and tkstats_2comp().
Currently, some functions in invivoPKfit may assume particular units for the data, and therefore particular units for the fitted model parameters. For example, time is generally assumed to be in units of hours. We plan to relax these assumptions in future releases of invivoPKfit.