How to use the breakDown package for models created with caret

Przemyslaw Biecek

2024-03-11

This example demonstrates how to use the breakDown package for models created with the caret package.

First, we generate some data with caret's twoClassSim() function.

library(caret)

set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

head(training)
#>   TwoFactor1 TwoFactor2    Linear1    Linear2 Nonlinear1 Nonlinear2 Nonlinear3
#> 1 -0.6561702 -1.6480450  1.0744594  0.9758906  0.2342843  0.6805653  0.6920055
#> 2 -0.9849973  1.4598834  0.2605978 -0.1694232  0.1381283  0.7460168  0.5599569
#> 3  2.3722541  1.7069944 -0.3142720  0.7221918 -0.6920591  0.4642024  0.3426912
#> 4 -2.2067173 -0.6972704 -0.7496301 -0.8444186 -0.9303336  0.1374181  0.2344975
#> 5  0.5166671 -0.7228376 -0.8621983  1.2772937  0.9959069  0.8143796  0.4296028
#> 6  1.3331262 -0.9929323  2.0480403 -1.3431105  0.6711474  0.8321613  0.7367007
#>    Class
#> 1 Class1
#> 2 Class2
#> 3 Class1
#> 4 Class2
#> 5 Class1
#> 6 Class1

Now we are ready to train a model. Let's fit a logistic regression (method = "glm") with caret, using 3-fold cross-validation and centered, scaled predictors.

cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE, 
                       summaryFunction = twoClassSummary)

test_class_cv_model <- train(trainX, trainY, 
                             method = "glm", 
                             trControl = cctrl1,
                             metric = "ROC", 
                             preProc = c("center", "scale"))
test_class_cv_model
#> Generalized Linear Model 
#> 
#> 50 samples
#>  7 predictor
#>  2 classes: 'Class1', 'Class2' 
#> 
#> Pre-processing: centered (7), scaled (7) 
#> Resampling: Cross-Validated (3 fold) 
#> Summary of sample sizes: 33, 34, 33 
#> Resampling results:
#> 
#>   ROC        Sens       Spec     
#>   0.7771991  0.7175926  0.8009259

To use breakDown, we need a function that calculates a numeric score (prediction) for a single observation. By default, the predict() function for a caret model returns the predicted class rather than a score.

So we add the type = "prob" argument to get class probabilities. Since there are two probabilities for each observation (one per class), we need to extract one of them; here we take the first column, which is the probability of Class1.

predict.fun <- function(model, x) predict(model, x, type = "prob")[,1]
testing <- twoClassSim(10, linearVars = 2)
predict.fun(test_class_cv_model, testing[1,])
#> [1] 0.9807632
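
Selecting the probability column by position relies on Class1 being the first factor level. A slightly more defensive variant selects the column by class name. The snippet below is only a sketch: the stand-in model fit, the data frame toy, and the function name predict_class1 are illustrative and not part of the original example.

```r
library(caret)

set.seed(2)
toy <- twoClassSim(50, linearVars = 2)

# A small stand-in model, fitted only so that this snippet runs on its own
fit <- train(Class ~ ., data = toy, method = "glm",
             trControl = trainControl(method = "cv", number = 3))

# Select the probability column by class name instead of by position
predict_class1 <- function(model, x) {
  predict(model, x, type = "prob")[, "Class1"]
}

# Both ways of selecting the column should agree for this data
p_by_name <- predict_class1(fit, toy[1, ])
p_by_pos  <- predict(fit, toy[1, ], type = "prob")[, 1]
all.equal(p_by_name, p_by_pos)
```

Either form can be passed to broken() as predict.function; the named version simply keeps working if the factor levels are ever reordered.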

Now we are ready to call the broken() function.

library("breakDown")
explain_2 <- broken(test_class_cv_model, testing[1,], 
                    data = trainX, 
                    predict.function = predict.fun)
explain_2
#>                                   contribution
#> (Intercept)                              0.500
#> + TwoFactor2 = -2.15297519239414         0.330
#> + Linear2 = 1.21347759171666             0.103
#> + Nonlinear2 = 0.938861106755212         0.037
#> + Nonlinear3 = 0.198311409447342         0.016
#> + Linear1 = -1.59104698624311            0.006
#> + Nonlinear1 = -0.693807001691312       -0.001
#> + TwoFactor1 = -1.5957842151878         -0.009
#> final_prognosis                          0.981
#> baseline:  0
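
As a quick sanity check on the table above, the individual contributions should add up to final_prognosis, apart from rounding in the printed values. A minimal sketch in base R, with the numbers copied by hand from the printed output (the vector name contrib is only illustrative):

```r
# Contributions copied from the printed explain_2 output
contrib <- c(
  "(Intercept)" = 0.500,
  TwoFactor2    = 0.330,
  Linear2       = 0.103,
  Nonlinear2    = 0.037,
  Nonlinear3    = 0.016,
  Linear1       = 0.006,
  Nonlinear1    = -0.001,
  TwoFactor1    = -0.009
)

# Sum of the intercept and all variable contributions
total <- sum(contrib)

# Compare with the printed final_prognosis (0.981), allowing for rounding
abs(total - 0.981) < 0.005
```

The small discrepancy between the sum and the printed final_prognosis comes from each row being rounded to three decimal places.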

Finally, we plot the explanation.

library(ggplot2)
plot(explain_2) + ggtitle("breakDown plot for caret/glm model")