---
title: "Getting Started with ensembleML"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with ensembleML}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
library(ensembleML)
```

## Overview

`ensembleML` provides a **single, consistent API** for ensemble machine learning in R. Regardless of which algorithm you choose, the core workflow is always:

```
em_fit() -> em_predict() -> em_evaluate()
```

Advanced usage adds:

```
em_cv()          # k-fold cross-validation (stability estimates)
em_tune()        # grid-search hyperparameter optimisation
em_compare()     # side-by-side algorithm comparison
em_importance()  # feature importance
em_partial()     # partial dependence plots
em_confusion()   # confusion matrix heatmap
em_calibration() # calibration / reliability diagram
em_residuals()   # regression diagnostics
```

---

## 1. Train a model

Hold out 30 of the 150 `iris` rows as a test set and fit a random forest on the remaining 120:

```{r fit}
data(iris)
set.seed(42)
idx   <- sample(nrow(iris), 120)  # 120 training rows, 30 held out
train <- iris[idx, ]
test  <- iris[-idx, ]

rf <- em_fit(Species ~ ., data = train, method = "random_forest", verbose = TRUE)
```

Switching algorithms only requires changing the `method` argument:

```{r xgb, eval = FALSE}
xgb <- em_fit(Species ~ ., data = train, method = "xgboost")
ada <- em_fit(Species ~ ., data = train, method = "adaboost")
bag <- em_fit(Species ~ ., data = train, method = "bagging")
```

---

## 2. Predict

```{r predict}
preds <- em_predict(rf, newdata = test)
head(preds)
```

Class probabilities:

```{r prob}
probs <- em_predict(rf, newdata = test, type = "prob")
head(probs, 3)
```

---

## 3. Evaluate

```{r evaluate}
em_evaluate(rf, newdata = test)
```

Select specific metrics:

```{r metrics}
em_evaluate(rf, newdata = test, metrics = c("accuracy", "f1", "kappa"))
```
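To see what the `accuracy`, `f1` and `kappa` metrics requested above measure, here is a minimal base-R sketch that computes them from a confusion table. It does not call `ensembleML`: `manual_metrics()` is a hypothetical helper, and it assumes F1 is macro-averaged (the unweighted mean of per-class F1 scores), which may not match how `em_evaluate()` aggregates it.

```{r manual-metrics}
# Hypothetical helper, NOT part of ensembleML: accuracy, macro-averaged F1
# and Cohen's kappa from predicted and true class labels.
manual_metrics <- function(pred, truth) {
  # Use a shared set of levels so the confusion table is square
  lv    <- union(levels(truth), levels(pred))
  pred  <- factor(pred, levels = lv)
  truth <- factor(truth, levels = lv)
  cm <- table(pred, truth)  # rows = predicted class, columns = true class
  n  <- sum(cm)

  accuracy <- sum(diag(cm)) / n

  # Per-class precision and recall; a class with no predictions (or no
  # true cases) gets 0 instead of NaN
  tp        <- diag(cm)
  precision <- ifelse(rowSums(cm) > 0, tp / rowSums(cm), 0)
  recall    <- ifelse(colSums(cm) > 0, tp / colSums(cm), 0)
  f1 <- ifelse(precision + recall > 0,
               2 * precision * recall / (precision + recall), 0)

  # Cohen's kappa: observed agreement corrected for the agreement expected
  # by chance from the row and column totals
  p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
  kappa <- (accuracy - p_expected) / (1 - p_expected)

  c(accuracy = accuracy, f1 = mean(f1), kappa = kappa)
}

# Toy labels: one "a" is misclassified as "b"
truth <- factor(c("a", "a", "a", "b", "b", "c"))
pred  <- factor(c("a", "a", "b", "b", "b", "c"))
manual_metrics(pred, truth)
```

Assuming `em_predict()` returns a factor of class labels, `manual_metrics(preds, test$Species)` should give figures comparable to `em_evaluate()` for the `rf` model. Kappa is never higher than accuracy, because it subtracts the agreement a classifier would reach by chance given the class frequencies.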
---

## 4. Cross-validation

Use `em_cv()` to check how stable a method's performance is before committing to it. With `cv_folds = 5` and `repeats = 3`, 5-fold cross-validation is run three times and each metric is summarised as mean ± SD across folds:

```{r cv, eval = FALSE}
cv_res <- em_cv(Species ~ ., data = iris, method = "random_forest",
                cv_folds = 5, repeats = 3)
cv_res$summary
em_plot_cv(cv_res, metric = "accuracy")
```

---

## 5. Tune hyperparameters

`em_tune()` runs a grid search over every combination in `param_grid` (here 3 × 3 = 9 candidate settings), scoring each with 5-fold cross-validation on the training set:

```{r tune, eval = FALSE}
grid <- list(ntree = c(100, 300, 500), mtry = c(1, 2, 3))

tuned <- em_tune(
  Species ~ ., data = train,
  method     = "random_forest",
  param_grid = grid,
  cv_folds   = 5
)
tuned$best_params
tuned$best_score
tuned$results
```

---

## 6. Compare algorithms

```{r compare, eval = FALSE}
cmp <- em_compare(Species ~ ., train = train, test = test)
cmp$table
```

---

## 7. Feature importance

```{r importance}
em_importance(rf, top_n = 4)
```

---

## 8. Partial dependence

```{r partial, eval = FALSE}
em_partial(rf, data = train, feature = "Petal.Length")
```

---

## 9. Confusion matrix

```{r confusion, eval = FALSE}
em_confusion(rf, newdata = test)
em_confusion(rf, newdata = test, normalise = TRUE)
```

---

## 10. Regression example

The same workflow applies unchanged to numeric responses:

```{r regression}
set.seed(7)
reg_data <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
# y depends linearly on x1 plus noise; x2 is an irrelevant (noise) predictor
reg_data$y <- 3 + 2 * reg_data$x1 + rnorm(200)

reg_train <- reg_data[1:160, ]
reg_test  <- reg_data[161:200, ]

reg_model <- em_fit(y ~ ., data = reg_train, method = "random_forest")
em_evaluate(reg_model, newdata = reg_test)
em_residuals(reg_model, reg_test)
```

---

## Citation

If you use `ensembleML` in published work, please cite it:

```{r citation, eval = FALSE}
citation("ensembleML")
```

The individual algorithms should also be cited; `citation("ensembleML")` includes the full list of references.

---

## Session info

```{r session}
sessionInfo()
```