fastml: Guarded Resampling Workflows for Safe and Automated Machine Learning in R

fastml is an R package for training, evaluating, and comparing machine learning models with a guarded resampling workflow.
Rather than introducing new learning algorithms, fastml focuses on reducing leakage risk by keeping preprocessing, model fitting, and evaluation aligned within supported resampling paths.

In fastml, “fast” refers to the rapid construction of statistically valid workflows, not to computational shortcuts. By eliminating entire classes of user-induced errors—most notably preprocessing leakage—fastml allows practitioners to obtain reliable performance estimates with minimal configuration.

Core Principles

Features

Installation

From CRAN

You can install the latest stable version of fastml from CRAN using:

install.packages("fastml")

You can install all dependencies (additional models) using:

# install all dependencies - recommended
install.packages("fastml", dependencies = TRUE)

From GitHub

For the development version, install directly from GitHub using the devtools package:

# Install devtools if you haven't already
install.packages("devtools")

# Install fastml from GitHub
devtools::install_github("selcukorkmaz/fastml")

Quick Start

Here’s a simple workflow to get you started with fastml:

library(fastml)

# Example dataset
data(iris)

iris_binary <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species))

# Train models
fit <- fastml(
  data = iris_binary,
  label = "Species",
  algorithms = c("rand_forest", "logistic_reg")
)

# View model summary
summary(fit)

# Plot the performance metrics
plot(fit, type = "bar")

# Plot ROC curves
plot(fit, type = "roc")

# Plot model calibration
plot(fit, type = "calibration")

Tuning Strategies

Hyperparameter tuning is supported via:

fastml(
  data = iris_binary,
  label = "Species",
  algorithms = c("rand_forest", "logistic_reg"),
  tuning_strategy = "bayes",
  tuning_iterations = 20
)

tuning_iterations is used only for Bayesian optimization.

Explainability

Model explainability tools are provided through fastexplain():

# Prepare data
library(survival)
data(pbc, package = "survival")
  
# The pbc dataset has two parts; we only want the baseline data (rows 1-312)
pbc_baseline <- pbc[1:312, -c(1:4)]

# Train a regression model
fit_reg <- fastml(
  data = pbc_baseline,
  label = "albumin",
  algorithms = c("xgboost"),
  metric = "rmse",
  impute_method = "medianImpute"
)
  
# Feature importance and SHAP values based on DALEX
fastexplain(fit_reg, method = "dalex")

# Breakdown profile
fastexplain(fit_reg, method = "breakdown", observation = pbc_baseline[1, -9])

# Counterfactual explanation (Ceteris Paribus profile)
fastexplain(fit_reg, method = "counterfactual", observation = pbc_baseline[1, -9])

Explainability is performed on trained models and does not interfere with resampling or preprocessing.

Exploratory Diagnostics

fastexplore() provides read-only exploratory diagnostics prior to model training. It summarizes distributions, missingness, correlations, and basic structure without invoking resampling, preprocessing, or model fitting.

fastexplore(iris, label = "Species")

This function is decoupled from fastml’s guarded resampling core and does not influence model evaluation unless its outputs are explicitly used in later modeling calls.

Scope

fastml is intended for users who require reliable performance estimation under cross-validation, particularly in:

It prioritizes correctness-oriented defaults and workflow clarity over maximum flexibility.

License

MIT License See LICENSE for details.