--- title: "Introduction to PIE -- A Partially Interpretable Model with Black-box Refinement" author: "Tong Wang, Jingyi Yang, Yunyi Li and Boxiang Wang" date: "2025-01-20" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to PIE -- A Partially Interpretable Model with Black-box Refinement} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction to `PIE` The `PIE` package implements Partially Interpretable Estimators (PIE), a framework that jointly train an interpretable model and a black-box model to achieve high predictive performance as well as partial model transparency. ## Installation To install the development version from GitHub, run the following: ```r # Install the R package from CRAN install.packages("PIE") ``` ## Getting Started This section demonstrates how to generate synthetic data for transfer learning and apply the ART framework using different models. ### Generate Data The function `data_process()` allows you to process dataset into the format that fits with PIE model, including cross-validation dataset (such as training, validation and testing) and group indicators for group lasso. ```r library(PIE) # Load the training data data("winequality") # Which columns are numerical? num_col <- 1:11 # Which columns are categorical? cat_col <- 12 # Which column is the response? y_col <- ncol(winequality) # Data Processing dat <- data_process(X = as.matrix(winequality[, -y_col]), y = winequality[, y_col], num_col = num_col, cat_col = cat_col, y_col = y_col) ``` ## Fitting PIE Once the data is prepared, you can use the `PIE_fit()` function to train PIE model. In this example, we fit only with 5 iterations using group lasso and XGBoost models. ```r # Fit a PIE model fold <- 1 fit <- PIE_fit( X = dat$spl_train_X[[fold]], y = dat$train_y[[fold]], lasso_group = dat$lasso_group, X_orig = dat$orig_train_X[[fold]], lambda1 = 0.01, lambda2 = 0.01, iter = 5, eta = 0.05, nrounds = 200 ) ``` ## Predicting PIE Once your PIE model is trained, you can use the `PIE_predict()` function to predict on test data. ```r # Prediction pred <- predict(fit, X = dat$spl_validation_X[[fold]], X_orig = dat$orig_validation_X[[fold]]) ``` ## Evaluate PIE You can evaluate your PIE model's performance with `RPE()`, which has formula $RPE=\frac{\sum_i(y_i-\hat{y_i})^2}{\sum_i(y_i-\bar{y})^2}$, where $\bar{y} = \frac{1}{n}\sum_i^n y_i$. ```r # Validation val_rrmse_test <- RPE(pred$total, dat$validation_y[[fold]]) ```