---
title: "Getting Started with ensembleML"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with ensembleML}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 7,
  fig.height = 4
)
library(ensembleML)
```

## Overview

`ensembleML` provides a **single, consistent API** for ensemble machine
learning in R.  Regardless of which algorithm you choose, the core workflow
is always:

```
em_fit()  ->  em_predict()  ->  em_evaluate()
```

Advanced usage adds:

```
em_cv()          # k-fold cross-validation (stability estimates)
em_tune()        # grid-search hyperparameter optimisation
em_compare()     # side-by-side algorithm comparison
em_importance()  # feature importance
em_partial()     # partial dependence plots
em_confusion()   # confusion matrix heatmap
em_calibration() # calibration / reliability diagram
em_residuals()   # regression diagnostics
```

---

## 1. Train a model

```{r fit}
data(iris)
set.seed(42)
idx   <- sample(nrow(iris), 120)
train <- iris[idx,  ]
test  <- iris[-idx, ]

rf <- em_fit(Species ~ ., data = train, method = "random_forest",
             verbose = TRUE)
```

Switching algorithms requires changing only the `method` argument:

```{r xgb, eval = FALSE}
xgb <- em_fit(Species ~ ., data = train, method = "xgboost")
ada <- em_fit(Species ~ ., data = train, method = "adaboost")
bag <- em_fit(Species ~ ., data = train, method = "bagging")
```
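
The `fit` chunk above draws a simple random sample, so the class proportions in `train` and `test` can drift from those in `iris`. If that matters for your data, one option is a stratified split. The following is a minimal base-R sketch, not an `ensembleML` feature; the names `strat_idx`, `strat_train`, and `strat_test` are illustrative only.

```{r sketch-stratified}
set.seed(42)
# Sample 80% of the row indices within each class (illustrative helper code)
strat_idx <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(rows) sample(rows, size = round(0.8 * length(rows)))
))
strat_train <- iris[strat_idx, ]
strat_test  <- iris[-strat_idx, ]

# Every class contributes the same share to each partition
table(strat_train$Species)
table(strat_test$Species)
```

Either pair of data frames can be passed to `em_fit()` and `em_predict()` in the same way as `train` and `test`.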

---

## 2. Predict

```{r predict}
preds <- em_predict(rf, newdata = test)
head(preds)
```

To get class probabilities, pass `type = "prob"`:

```{r prob}
probs <- em_predict(rf, newdata = test, type = "prob")
head(probs, 3)
```
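
Probabilities and hard labels are related by an arg-max over classes. The sketch below assumes that the probability output has one column per class and that each row sums to 1; this is an assumption about the output shape, and `toy_probs` is made-up data rather than real model output.

```{r sketch-argmax}
# Made-up probabilities for three observations (rows) and three classes
toy_probs <- matrix(
  c(0.9, 0.1, 0.0,
    0.2, 0.5, 0.3,
    0.1, 0.3, 0.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

# Pick the most probable class in each row; ties go to the first column
toy_labels <- colnames(toy_probs)[max.col(toy_probs, ties.method = "first")]
toy_labels
```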

---

## 3. Evaluate

```{r evaluate}
em_evaluate(rf, newdata = test)
```

Select specific metrics:

```{r metrics}
em_evaluate(rf, newdata = test, metrics = c("accuracy", "f1", "kappa"))
```
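
To make the metric names concrete, this base-R sketch computes accuracy and Cohen's kappa from a confusion table using their standard definitions. The label vectors are made up, and whether `em_evaluate()` uses exactly these formulas internally is an assumption.

```{r sketch-metrics}
# Made-up true and predicted labels for ten observations
toy_truth <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"))
toy_pred  <- factor(c("a", "a", "b", "b", "b", "c", "c", "c", "c", "a"),
                    levels = levels(toy_truth))

toy_cm <- table(toy_truth, toy_pred)
n_obs  <- sum(toy_cm)

# Accuracy: share of observations on the diagonal
toy_accuracy <- sum(diag(toy_cm)) / n_obs

# Cohen's kappa: agreement beyond what the marginal totals predict by chance
chance_agreement <- sum(rowSums(toy_cm) * colSums(toy_cm)) / n_obs^2
toy_kappa <- (toy_accuracy - chance_agreement) / (1 - chance_agreement)

c(accuracy = toy_accuracy, kappa = toy_kappa)
```

Kappa is lower than accuracy here because part of the observed agreement would be expected from the class frequencies alone.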

---

## 4. Cross-validation

Use `em_cv()` to estimate the mean and standard deviation of each metric across folds before committing to a method:

```{r cv, eval = FALSE}
cv_res <- em_cv(Species ~ ., data = iris, method = "random_forest",
                cv_folds = 5, repeats = 3)
cv_res$summary
em_plot_cv(cv_res, metric = "accuracy")
```
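
For intuition about what a mean and standard deviation across folds represent, here is a minimal base-R sketch of a single 5-fold pass. It is not how `em_cv()` is implemented; the nearest-centroid classifier is a stand-in chosen only because it needs no extra packages, and all names are illustrative.

```{r sketch-folds}
set.seed(1)
n_folds <- 5
# Randomly assign each row of iris to one of n_folds equally sized folds
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(iris)))

fold_acc <- vapply(seq_len(n_folds), function(f) {
  tr <- iris[fold_id != f, ]
  te <- iris[fold_id == f, ]
  # Stand-in classifier (not an ensembleML method): nearest class centroid
  centroids <- aggregate(. ~ Species, data = tr, FUN = mean)
  cen <- as.matrix(centroids[, -1])
  x   <- as.matrix(te[, names(te) != "Species"])
  # Squared Euclidean distance from every held-out row to every centroid
  d2 <- sapply(seq_len(nrow(cen)),
               function(j) rowSums(sweep(x, 2, cen[j, ])^2))
  pred <- as.character(centroids$Species)[apply(d2, 1, which.min)]
  mean(pred == as.character(te$Species))
}, numeric(1))

# One accuracy per fold, summarised as mean and standard deviation
c(mean = mean(fold_acc), sd = sd(fold_acc))
```

With `repeats = 3`, the same procedure would presumably be run three times with different fold assignments, giving 15 scores to summarise.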

---

## 5. Tune hyperparameters

```{r tune, eval = FALSE}
grid <- list(ntree = c(100, 300, 500), mtry = c(1, 2, 3))

tuned <- em_tune(
  Species ~ ., data = train, method = "random_forest",
  param_grid = grid, cv_folds = 5
)

tuned$best_params
tuned$best_score
tuned$results
```
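
Grid search cost grows with the product of the grid dimensions. The sketch below uses base R's `expand.grid()` to enumerate the candidate settings in the grid above; the fit count assumes one model per combination per fold and ignores any final refit, which is an assumption about how `em_tune()` works.

```{r sketch-grid}
demo_grid <- list(ntree = c(100, 300, 500), mtry = c(1, 2, 3))

# Every combination of the two hyperparameters
combos <- expand.grid(demo_grid, KEEP.OUT.ATTRS = FALSE)
combos

# Candidate settings multiplied by the number of folds
n_fits <- nrow(combos) * 5
n_fits
```

Adding a third hyperparameter with three values would triple this count, so it pays to keep grids coarse at first.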

---

## 6. Compare algorithms

```{r compare, eval = FALSE}
cmp <- em_compare(Species ~ ., train = train, test = test)
cmp$table
```

---

## 7. Feature importance

```{r importance}
em_importance(rf, top_n = 4)
```
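
Importance scores can be defined in several ways. One model-agnostic definition is permutation importance: shuffle one feature and measure how much the error increases. The sketch below illustrates it with a base-R linear model on `mtcars`; it is not necessarily the measure `em_importance()` reports, and `rmse_of()` is a hypothetical helper.

```{r sketch-permutation}
set.seed(123)
imp_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: root mean squared error of a model on a data frame
rmse_of <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}
base_rmse <- rmse_of(imp_fit, mtcars)

# Shuffle one feature at a time and record the increase in RMSE
perm_importance <- vapply(c("wt", "hp"), function(feature) {
  shuffled <- mtcars
  shuffled[[feature]] <- sample(shuffled[[feature]])
  rmse_of(imp_fit, shuffled) - base_rmse
}, numeric(1))

sort(perm_importance, decreasing = TRUE)
```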

---

## 8. Partial dependence

```{r partial, eval = FALSE}
em_partial(rf, data = train, feature = "Petal.Length")
```
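
A partial dependence curve shows the average prediction as one feature is varied while all others keep their observed values. The base-R sketch below computes such a curve by hand for a linear model on `mtcars`; it illustrates the general technique under those assumptions and is not the implementation behind `em_partial()`.

```{r sketch-pdp}
pd_fit  <- lm(mpg ~ wt + hp, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)

# For each grid value: overwrite wt in every row, predict, then average
pd_curve <- vapply(wt_grid, function(v) {
  tmp <- mtcars
  tmp$wt <- v
  mean(predict(pd_fit, newdata = tmp))
}, numeric(1))

plot(wt_grid, pd_curve, type = "l",
     xlab = "wt", ylab = "Average predicted mpg")
```

For a linear model the curve is a straight line whose slope equals the coefficient of `wt`; tree ensembles typically produce step-like or curved shapes instead.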

---

## 9. Confusion matrix

```{r confusion, eval = FALSE}
em_confusion(rf, newdata = test)
em_confusion(rf, newdata = test, normalise = TRUE)
```
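
Raw counts are hard to compare when classes differ in size, which is the usual reason to normalise. The base-R sketch below normalises a confusion table by row, so each row shows how one actual class was distributed over the predicted classes. The labels are made up, and whether `normalise = TRUE` divides by row totals is an assumption.

```{r sketch-normalise}
# Made-up labels for ten observations
cm_truth <- factor(c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"))
cm_pred  <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "a", "c"),
                   levels = levels(cm_truth))

cm_counts <- table(Actual = cm_truth, Predicted = cm_pred)
cm_counts

# Divide each row by its total so every row sums to 1
cm_rows <- prop.table(cm_counts, margin = 1)
round(cm_rows, 2)
```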

---

## 10. Regression example

The same workflow applies to numeric responses:

```{r regression}
set.seed(7)
reg_data  <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
# y depends on x1 plus noise; x2 is an uninformative predictor
reg_data$y <- 3 + 2 * reg_data$x1 + rnorm(200)
reg_train <- reg_data[1:160, ]
reg_test  <- reg_data[161:200, ]

reg_model <- em_fit(y ~ ., data = reg_train, method = "random_forest")
em_evaluate(reg_model, reg_test)
em_residuals(reg_model, reg_test)
```
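
Common regression metrics are simple functions of the residuals. The base-R sketch below computes RMSE, MAE, and R-squared from their standard definitions on made-up numbers; which metrics `em_evaluate()` reports for regression models, and how it computes them, is not confirmed here.

```{r sketch-regmetrics}
# Made-up observed and predicted values
toy_obs  <- c(3.1, 2.4, 5.0, 4.2, 3.7)
toy_fit  <- c(2.9, 2.8, 4.6, 4.4, 3.5)
toy_res  <- toy_obs - toy_fit

toy_rmse <- sqrt(mean(toy_res^2))   # penalises large errors more heavily
toy_mae  <- mean(abs(toy_res))      # average absolute error
toy_r2   <- 1 - sum(toy_res^2) / sum((toy_obs - mean(toy_obs))^2)

c(rmse = toy_rmse, mae = toy_mae, r2 = toy_r2)
```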

---

## Citation

If you use `ensembleML` in published work, please cite it:

```{r citation, eval = FALSE}
citation("ensembleML")
```

The individual algorithms should also be cited; see `citation("ensembleML")`
for the full list of references.

---

## Session info

```{r session}
sessionInfo()
```