A simulated data set containing information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt.
Variables used below: `default` (the response) and `balance` (the predictor).
Fitting a logistic regression model with `default` as the response and `balance` as the predictor:
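The fitting code itself is not shown here, so the following is only a minimal sketch of how such a model could be fitted with tidymodels. It assumes the data come from `ISLR2::Default`; the seed, the 60/40 split, and the `default_split` and `default_train` names are hypothetical, while `logregfit` and `default_test` are named to match the objects used later.

```r
# Sketch only: the data source, seed, and split proportion are assumptions.
library(tidymodels)
library(ISLR2)  # assumed source of the Default data

set.seed(1)  # hypothetical seed
default_split <- initial_split(Default, prop = 0.6)  # assumed 60/40 split
default_train <- training(default_split)
default_test  <- testing(default_split)

# Logistic regression of default on balance, fitted with the glm engine
logregfit <- logistic_reg() |>
  set_engine("glm") |>
  fit(default ~ balance, data = default_train)

# Coefficient table for the fitted model
tidy(logregfit)
```

Because `default` is a factor with levels `No` and `Yes`, the glm engine models the probability of the second level, `Yes`.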
`default`: “Yes” is Positive

| | Actual Positive/Event | Actual Negative/Non-event |
|---|---|---|
| Predicted Positive/Event | True Positive (TP) | False Positive (FP) |
| Predicted Negative/Non-event | False Negative (FN) | True Negative (TN) |
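For reference, the metrics reported later in this section can all be written in terms of the four cells of the confusion matrix. These are the standard textbook definitions, given here as a reading aid rather than as taken from the original slides:

$$
\begin{aligned}
\text{accuracy} &= \frac{TP + TN}{TP + FP + FN + TN} \\
\text{recall (sensitivity)} &= \frac{TP}{TP + FN} \\
\text{precision} &= \frac{TP}{TP + FP} \\
\text{specificity} &= \frac{TN}{TN + FP} \\
\text{NPV} &= \frac{TN}{TN + FN} \\
F_1 &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{aligned}
$$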
# Add hard class predictions from both fitted models to the test set
default_test_wpreds <- default_test |>
  mutate(
    knn_preds = predict(knnfit, new_data = default_test, type = "class")$.pred_class,
    logistic_preds = predict(logregfit, new_data = default_test, type = "class")$.pred_class
  )
default_test_wpreds |> head() |> kable()
default | student | balance | income | knn_preds | logistic_preds |
---|---|---|---|---|---|
No | No | 729.5265 | 44361.63 | No | No |
No | Yes | 808.6675 | 17600.45 | No | No |
No | Yes | 1220.5838 | 13268.56 | No | No |
No | No | 237.0451 | 28251.70 | No | No |
No | No | 606.7423 | 44994.56 | No | No |
No | No | 286.2326 | 45042.41 | No | No |
yardstick
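`yardstick` is a package that ships with `tidymodels`, meant for model evaluation.
Its metric functions share a common call pattern: `metricname(data, truth, estimate, ...)`.
Pass the column of true values as `truth` and the column of predicted values as `estimate`.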
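The `binary_metrics` function used in the code that follows is not defined in this section. One plausible definition, assumed here, is a `yardstick::metric_set()` bundling the seven metrics that appear in the results tables. The sketch below applies it to a small made-up data frame (`toy`, `truth`, and `pred` are hypothetical names) to show the call pattern.

```r
library(yardstick)

# Hypothetical definition: the seven metrics that appear in the results tables
binary_metrics <- metric_set(accuracy, recall, precision, specificity, npv, mcc, f_meas)

# Toy data, made up for illustration; "Yes" is the second factor level
toy <- data.frame(
  truth = factor(c("No", "No", "Yes", "Yes", "No", "Yes"), levels = c("No", "Yes")),
  pred  = factor(c("No", "Yes", "Yes", "No", "No", "Yes"), levels = c("No", "Yes"))
)

# Single metric: metricname(data, truth, estimate, ...)
toy_recall <- recall(toy, truth = truth, estimate = pred, event_level = "second")

# Several metrics at once through the metric set
toy_all <- binary_metrics(toy, truth = truth, estimate = pred, event_level = "second")
toy_all
```

Setting `event_level = "second"` tells yardstick that the second factor level (`Yes`) is the event of interest; by default it treats the first level as the event.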
default_test_wpreds |>
  binary_metrics(truth = default, estimate = knn_preds, event_level = "second") |>
  kable()
.metric | .estimator | .estimate |
---|---|---|
accuracy | binary | 0.9757500 |
recall | binary | 0.3798450 |
precision | binary | 0.7424242 |
specificity | binary | 0.9956084 |
npv | binary | 0.9796645 |
mcc | binary | 0.5206828 |
f_meas | binary | 0.5025641 |
default_test_wpreds |>
  binary_metrics(truth = default, estimate = logistic_preds, event_level = "second") |>
  kable()
.metric | .estimator | .estimate |
---|---|---|
accuracy | binary | 0.9730000 |
recall | binary | 0.3023256 |
precision | binary | 0.6842105 |
specificity | binary | 0.9953500 |
npv | binary | 0.9771747 |
mcc | binary | 0.4437097 |
f_meas | binary | 0.4193548 |
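As a sanity check, the two tables above can be reproduced by hand from confusion-matrix counts. The counts are not given in this section: the ones below were back-calculated from the reported metrics, assuming a 4,000-row test set with 129 actual defaulters, so treat them as an inference. `metrics_from_counts` is a hypothetical helper written in base R.

```r
# Compute the seven reported metrics from the four confusion-matrix cells
metrics_from_counts <- function(tp, fp, fn, tn) {
  # Work in doubles so the MCC denominator cannot overflow integer range
  tp <- as.numeric(tp); fp <- as.numeric(fp)
  fn <- as.numeric(fn); tn <- as.numeric(tn)
  c(
    accuracy    = (tp + tn) / (tp + fp + fn + tn),
    recall      = tp / (tp + fn),
    precision   = tp / (tp + fp),
    specificity = tn / (tn + fp),
    npv         = tn / (tn + fn),
    mcc         = (tp * tn - fp * fn) /
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    f_meas      = 2 * tp / (2 * tp + fp + fn)
  )
}

# Counts inferred from the reported metrics (assumption, not from the source)
knn_metrics      <- metrics_from_counts(tp = 49, fp = 17, fn = 80, tn = 3854)
logistic_metrics <- metrics_from_counts(tp = 39, fp = 18, fn = 90, tn = 3853)

round(cbind(knn = knn_metrics, logistic = logistic_metrics), 7)
```

Under these counts, both models reach about 97% accuracy while identifying fewer than 40% of the actual defaulters, which is why recall and precision are worth reporting alongside accuracy when the event is rare.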