After installing `scorecard` by following the instructions in the README, load the package into your environment.

`library(scorecard)`

Let’s use the *germancredit* dataset for the purposes of this
demonstration.

```
data("germancredit")
str(germancredit)
```

The `var_filter` function drops variables (columns) whose missing rate
is too high (> 95% by default), whose information value (IV) is too low
(< 0.02 by default), or whose identical value rate is too high
(> 95% by default).

`dt_f <- var_filter(germancredit, y = "creditability")`
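
To make two of these filter criteria concrete, here is a minimal base R sketch that computes a missing rate and an identical value rate per column. The `toy` data frame is hypothetical, and treating `NA` as a value when counting the most frequent entry is an assumption, not necessarily how `var_filter` does it internally.

```r
# Hypothetical toy data, not the germancredit data
toy <- data.frame(
  mostly_na   = c(NA, NA, NA, NA, 1),
  constant    = c("a", "a", "a", "a", "a"),
  informative = c(1, 2, 3, 4, 5)
)

# Share of missing values in each column
missing_rate <- sapply(toy, function(x) mean(is.na(x)))

# Share of rows taken by the single most frequent value in each column
# (assumption: NA is counted as a value here)
identical_rate <- sapply(toy, function(x) {
  max(table(x, useNA = "ifany")) / length(x)
})

print(missing_rate)
print(identical_rate)
```

A column such as `constant` carries no information for separating good from bad cases, which is why a high identical value rate is a reasonable reason to drop it.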

When building scorecard models, a subset of the observations should
be held out from the data used to train the model (as in most other
traditional modeling approaches) and assigned to the *test* set
instead. The `split_df` function performs this sampling and creates the
*train* and *test* datasets.

```
dt_list <- split_df(dt_f, y = "creditability", ratios = c(0.6, 0.4), seed = 30)
label_list <- lapply(dt_list, function(x) x$creditability)
```
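
As a rough illustration of what a 60/40 hold-out split involves, the base R sketch below samples row indices within each class of the target so that both parts keep the same class balance. The `toy` data and the stratification step are assumptions made for illustration; the sketch does not reproduce the internals of `split_df`.

```r
set.seed(30)

# Hypothetical toy data with a 70/30 class balance
toy <- data.frame(x = rnorm(100), y = rep(c(0, 1), times = c(70, 30)))

# Within each class of y, send about 60% of the row indices to train
train_idx <- unlist(
  lapply(split(seq_len(nrow(toy)), toy$y), function(idx) {
    idx[sample.int(length(idx), size = round(0.6 * length(idx)))]
  }),
  use.names = FALSE
)

toy_list <- list(train = toy[train_idx, ], test = toy[-train_idx, ])

# Row counts and class balance in each part
print(sapply(toy_list, nrow))
print(sapply(toy_list, function(d) mean(d$y)))
```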

Weight-of-Evidence (WoE) binning is a technique for binning both continuous
and categorical independent variables in a way that provides the most
robust bifurcation of the data against the dependent variable. This
technique can be easily executed across all independent variables using
the `woebin` function.

```
bins <- woebin(dt_f, y = "creditability")
# woebin_plot(bins)
```
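
To show what the WoE and IV figures mean for a single variable, here is a minimal base R sketch that computes them from per-bin counts. The counts are hypothetical, and the sign convention (distribution of bads over distribution of goods) is an assumption; other references use the inverse ratio, which flips the sign of the WoE but leaves the IV unchanged.

```r
# Hypothetical counts of good (0) and bad (1) outcomes per bin
bin_counts <- data.frame(
  bin  = c("A", "B", "C"),
  good = c(50, 30, 20),
  bad  = c(10, 15, 25)
)

# Share of all goods and of all bads that fall into each bin
dist_good <- bin_counts$good / sum(bin_counts$good)
dist_bad  <- bin_counts$bad  / sum(bin_counts$bad)

# WoE per bin (assumed convention: bad share over good share)
bin_counts$woe <- log(dist_bad / dist_good)

# Each bin's IV contribution is non-negative; their sum is the variable's IV
bin_counts$bin_iv <- (dist_bad - dist_good) * bin_counts$woe
total_iv <- sum(bin_counts$bin_iv)

print(bin_counts)
print(total_iv)
```

A bin whose share of bads equals its share of goods (bin `B` here) has a WoE of zero and adds nothing to the IV.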

The user can also adjust bin breaks interactively by using the
`woebin_adj` function.

`# breaks_adj <- woebin_adj(dt_f, y = "creditability", bins = bins)`

Furthermore, the user can set the bin breaks manually via the
`breaks_list = list()` argument in the `woebin` function. Note the use
of *%,%* as a separator to create a single bin from two classes in a
categorical independent variable.

```
breaks_adj <- list(
  age.in.years = c(26, 35, 40),
  other.debtors.or.guarantors = c("none", "co-applicant%,%guarantor")
)
bins_adj <- woebin(dt_f, y = "creditability", breaks_list = breaks_adj)
```

Once your WoE bins are established for all desired independent variables, apply the binning logic to the training and test datasets.

`dt_woe_list <- lapply(dt_list, function(x) woebin_ply(x, bins_adj))`

A logistic regression fitted on the WoE-transformed training data supplies the coefficients from which the scorecard is built.

```
m1 <- glm(creditability ~ ., family = binomial(), data = dt_woe_list$train)
# vif(m1, merge_coef = TRUE) # summary(m1)
# Select a formula-based model by AIC (or by LASSO for large dataset)
m_step <- step(m1, direction = "both", trace = FALSE)
m2 <- eval(m_step$call)
# vif(m2, merge_coef = TRUE) # summary(m2)
```

If oversampling is a concern (that is, the bad rate in the sample differs from the bad rate in the population), the following code chunk can be uncommented and run to reweight the observations.

```
# Read documentation on handling oversampling (support.sas.com/kb/22/601.html)
# library(data.table)
# p1 <- 0.03 # bad probability in population
# r1 <- 0.3 # bad probability in sample dataset
# dt_woe <- copy(dt_woe_list$train)[, weight := ifelse(creditability == 1, p1/r1, (1-p1)/(1-r1) )][]
# fmla <- as.formula(paste("creditability ~", paste(names(coef(m2))[-1], collapse = "+")))
# m3 <- glm(fmla, family = binomial(), data = dt_woe, weights = weight)
```
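
The weights in the commented chunk can be checked with a little arithmetic. The sketch below reuses the example values `p1 = 0.03` and `r1 = 0.3` from that chunk and confirms that, after weighting, the bad rate of the sample matches the population bad rate. The variable names `w_bad`, `w_good`, and `weighted_bad_rate` are introduced here for illustration only.

```r
p1 <- 0.03  # bad probability in population (example value from the chunk above)
r1 <- 0.3   # bad probability in sample dataset (example value from the chunk above)

# Weight given to each bad and each good observation
w_bad  <- p1 / r1
w_good <- (1 - p1) / (1 - r1)

# Bad rate of the sample once the weights are applied
weighted_bad_rate <- (r1 * w_bad) / (r1 * w_bad + (1 - r1) * w_good)

print(c(w_bad = w_bad, w_good = w_good, weighted_bad_rate = weighted_bad_rate))
```

Bads are down-weighted and goods are up-weighted, so the fitted intercept reflects the population rather than the oversampled data.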

The `perf_eva` function provides model accuracy statistics
(such as mse, rmse, logloss, r2, ks, auc, gini) and plots (such as ks,
lift, gain, roc, lz, pr, f1, density).

```
# First, get probabilistic predictions
pred_list <- lapply(dt_woe_list, function(x) predict(m2, x, type = 'response'))
# Then evaluate model accuracy
perf <- perf_eva(pred = pred_list, label = label_list)
```
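
For readers who want to see what three of these statistics measure, here is a minimal base R sketch of AUC, KS, and Gini computed on simulated predictions. The simulated `label` and `pred` vectors are hypothetical, and the definitions used (rank-based AUC, KS as the largest gap between the two empirical CDFs, Gini as `2 * AUC - 1`) are the common textbook ones, assumed here rather than taken from the `perf_eva` source.

```r
set.seed(1)

# Hypothetical labels (1 = bad) and predicted probabilities
label <- rep(c(0, 1), times = c(70, 30))
pred  <- plogis(rnorm(100, mean = ifelse(label == 1, 1, -1)))

n_pos <- sum(label == 1)
n_neg <- sum(label == 0)

# AUC from the rank-sum (Mann-Whitney) statistic
auc <- (sum(rank(pred)[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# KS: largest vertical gap between the empirical CDFs of the two classes
grid <- sort(unique(pred))
ks <- max(abs(ecdf(pred[label == 1])(grid) - ecdf(pred[label == 0])(grid)))

# Gini coefficient derived from AUC
gini <- 2 * auc - 1

print(c(auc = auc, ks = ks, gini = gini))
```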

Once the model has been selected, scorecards can be created via the
`scorecard` function. Note that, by default, the target points are
600, the target odds are 1/19, and the points to double the odds are 50. See
`?scorecard` for more information on the function and its
arguments.
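
The three defaults above can be read as a linear scaling of the log-odds. The sketch below assumes the commonly used form `score = a - b * log(odds)`, with `b = pdo / log(2)` and `a = points0 + b * log(odds0)`; the helper `score_from_odds` is hypothetical, and whether `scorecard` uses exactly this parameterization internally is an assumption, so see `?scorecard` for the authoritative definition.

```r
points0 <- 600     # target points
odds0   <- 1 / 19  # target odds
pdo     <- 50      # points to double the odds

# Scaling factor and offset (assumed parameterization)
b <- pdo / log(2)
a <- points0 + b * log(odds0)

# Hypothetical helper: map odds to a score
score_from_odds <- function(odds) a - b * log(odds)

# Score at the target odds, and at twice the target odds
print(score_from_odds(odds0))
print(score_from_odds(2 * odds0))
```

Under this form, an applicant at the target odds receives the target points, and each doubling of the odds moves the score by `pdo` points.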

The scorecard can then be applied to the original data using the
`scorecard_ply` function. Lastly, a chart encompassing
Population Stability Index (PSI) statistics can be rendered via the
`perf_psi` function.

```
# Build the card
card <- scorecard(bins_adj, m2)
# Obtain Credit Scores
score_list <- lapply(dt_list, function(x) scorecard_ply(x, card))
# Analyze the PSI
perf_psi(score = score_list, label = label_list)
```
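
As a rough guide to what the PSI measures, the base R sketch below compares the share of observations per score band in two samples. The band shares are hypothetical, and the formula used is the standard PSI definition, assumed here to match what `perf_psi` reports.

```r
# Hypothetical share of observations per score band
expected <- c(0.10, 0.20, 0.40, 0.20, 0.10)  # e.g. train
actual   <- c(0.12, 0.18, 0.38, 0.22, 0.10)  # e.g. test

# PSI: sum over bands of (actual - expected) * log(actual / expected)
psi <- sum((actual - expected) * log(actual / expected))

print(psi)
```

Every term in the sum is non-negative, so the PSI is zero only when the two distributions are identical and grows as they drift apart.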