
fastml is an R package for training, evaluating, and
comparing machine learning models with a guarded resampling
workflow.
Rather than introducing new learning algorithms, fastml focuses on
reducing leakage risk by keeping preprocessing, model
fitting, and evaluation aligned within supported resampling paths.
In fastml, “fast” refers to the rapid construction of statistically valid workflows, not to computational shortcuts. By eliminating entire classes of user-induced errors—most notably preprocessing leakage—fastml allows practitioners to obtain reliable performance estimates with minimal configuration.
Guarded resampling workflow
When the guarded resampling path is used, preprocessing and model
fitting are re-estimated independently within each resampling split.
This reduces leakage risk, but does not prevent users from supplying
preprocessed inputs.
Leakage risk reduction, not guarantees
fastml can mitigate common leakage modes (e.g., global scaling or
imputation before resampling) when workflows are fit within resamples.
It does not universally prevent all leakage scenarios.
Single, unified interface
Multiple models can be trained and benchmarked through a single call,
while internally coordinating resampling, preprocessing, and evaluation
in supported paths.
Compatibility with established engines
fastml orchestrates existing modeling infrastructure (recipes, rsample,
parsnip, yardstick) without modifying their statistical
behavior.
Preprocessing isolation within resampling
Transformations (scaling, imputation, encoding, feature construction)
are learned from training folds and applied to assessment folds when the
guarded resampling path is used.
Support for multiple algorithms
Includes tree-based models, linear and penalized models, kernel methods,
neural networks, and boosting approaches via established
engines.
Hyperparameter tuning within guarded
resampling
Grid and Bayesian tuning are performed safely inside the resampling
loop.
Consistent performance evaluation
Metrics such as Accuracy, ROC AUC, Sensitivity, Specificity, Precision,
and F1 are computed without leakage.
Visualization and comparison tools
Built-in plots facilitate comparison across models while preserving
statistical validity.
You can install the latest stable version of fastml from CRAN using:
install.packages("fastml")You can install all dependencies (additional models) using:
# install all dependencies - recommended
install.packages("fastml", dependencies = TRUE)For the development version, install directly from GitHub using the devtools package:
# Install devtools if you haven't already
install.packages("devtools")
# Install fastml from GitHub
devtools::install_github("selcukorkmaz/fastml")Here’s a simple workflow to get you started with fastml:
library(fastml)
# Example dataset
data(iris)
iris_binary <- iris %>%
filter(Species != "setosa") %>%
mutate(Species = factor(Species))
# Train models
fit <- fastml(
data = iris_binary,
label = "Species",
algorithms = c("rand_forest", "logistic_reg")
)
# View model summary
summary(fit)
# Plot the performance metrics
plot(fit, type = "bar")
# Plot ROC curves
plot(fit, type = "roc")
# Plot model calibration
plot(fit, type = "calibration")Hyperparameter tuning is supported via:
grid — regular grid search
bayes — Bayesian optimization
fastml(
data = iris_binary,
label = "Species",
algorithms = c("rand_forest", "logistic_reg"),
tuning_strategy = "bayes",
tuning_iterations = 20
)tuning_iterations is used only for Bayesian
optimization.
Model explainability tools are provided through
fastexplain():
# Prepare data
library(survival)
data(pbc, package = "survival")
# The pbc dataset has two parts; we only want the baseline data (rows 1-312)
pbc_baseline <- pbc[1:312, -c(1:4)]
# Train a regression model
fit_reg <- fastml(
data = pbc_baseline,
label = "albumin",
algorithms = c("xgboost"),
metric = "rmse",
impute_method = "medianImpute"
)
# Feature importance and SHAP values based on DALEX
fastexplain(fit_reg, method = "dalex")
# Breakdown profile
fastexplain(fit_reg, method = "breakdown", observation = pbc_baseline[1, -9])
# Counterfactual explanation (Ceteris Paribus profile)
fastexplain(fit_reg, method = "counterfactual", observation = pbc_baseline[1, -9])Explainability is performed on trained models and does not interfere with resampling or preprocessing.
fastexplore() provides read-only exploratory diagnostics
prior to model training. It summarizes distributions, missingness,
correlations, and basic structure without invoking resampling,
preprocessing, or model fitting.
fastexplore(iris, label = "Species")This function is decoupled from fastml’s guarded resampling core and does not influence model evaluation unless its outputs are explicitly used in later modeling calls.
fastml is intended for users who require reliable performance estimation under cross-validation, particularly in:
multi-site or grouped data
high-dimensional biomedical applications
workflows prone to preprocessing leakage
It prioritizes correctness-oriented defaults and workflow clarity over maximum flexibility.
MIT License See LICENSE for details.