A simulated data set containing information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt.
default
balance
Fitting a logistic regression model with `default` as the response and `balance` as the predictor:
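The fitting code itself is not shown here, so the following is only a minimal sketch of what it might look like with tidymodels. It assumes the data are the `Default` data set from the ISLR2 package; the seed, the split proportion, and the names `default_split` and `default_train` are hypothetical. Only `logregfit` and `default_test` are names that appear later in this section.

```r
library(tidymodels)  # parsnip, rsample, broom, dplyr, ...
library(ISLR2)       # assumed source of the Default data

set.seed(427)        # hypothetical seed, only to make the sketch reproducible

# Hypothetical train/test split; the split actually used is not shown in the source
default_split <- initial_split(Default, prop = 0.6, strata = default)
default_train <- training(default_split)
default_test  <- testing(default_split)

# Logistic regression with default as the response and balance as the predictor
logregfit <- logistic_reg() |>
  set_engine("glm") |>
  fit(default ~ balance, data = default_train)

# Coefficient table: intercept and the slope on balance
tidy(logregfit)
```

A positive coefficient on `balance` would indicate that the estimated probability of default rises with the balance.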
default: “Yes” is Positive

# obtain probability predictions
default_test_wprobs <- default_test |>
  mutate(
    knn_probs = predict(knnfit, new_data = default_test, type = "prob") |> pull(.pred_Yes),
    logistic_probs = predict(logregfit, new_data = default_test, type = "prob") |> pull(.pred_Yes)
  )
default_test_wprobs |> head() |> kable()
default | student | balance | income | knn_probs | logistic_probs |
---|---|---|---|---|---|
No | No | 729.5265 | 44361.63 | 0 | 0.0012842 |
No | Yes | 808.6675 | 17600.45 | 0 | 0.0019883 |
No | Yes | 1220.5838 | 13268.56 | 0 | 0.0190870 |
No | No | 237.0451 | 28251.70 | 0 | 0.0000843 |
No | No | 606.7423 | 44994.56 | 0 | 0.0006514 |
No | No | 286.2326 | 45042.41 | 0 | 0.0001107 |
threshold <- 0.5 # set threshold
default_test_wprobs <- default_test_wprobs |>
  mutate(
    # fix the level order so the predictions share levels with the truth column
    knn_preds = factor(if_else(knn_probs > threshold, "Yes", "No"), levels = c("No", "Yes")),
    logistic_preds = factor(if_else(logistic_probs > threshold, "Yes", "No"), levels = c("No", "Yes"))
  )
default_test_wprobs |> head() |> kable()
default | student | balance | income | knn_probs | logistic_probs | knn_preds | logistic_preds |
---|---|---|---|---|---|---|---|
No | No | 729.5265 | 44361.63 | 0 | 0.0012842 | No | No |
No | Yes | 808.6675 | 17600.45 | 0 | 0.0019883 | No | No |
No | Yes | 1220.5838 | 13268.56 | 0 | 0.0190870 | No | No |
No | No | 237.0451 | 28251.70 | 0 | 0.0000843 | No | No |
No | No | 606.7423 | 44994.56 | 0 | 0.0006514 | No | No |
No | No | 286.2326 | 45042.41 | 0 | 0.0001107 | No | No |
threshold <- 0.1 # set threshold
default_test_wprobs <- default_test_wprobs |>
  mutate(
    # fix the level order so "Yes" is always the second level
    knn_preds = factor(if_else(knn_probs > threshold, "Yes", "No"), levels = c("No", "Yes"))
  )
roc_metrics(default_test_wprobs, truth = default, estimate = knn_preds, event_level = "second") |> kable()
.metric | .estimator | .estimate |
---|---|---|
accuracy | binary | 0.9060000 |
sensitivity | binary | 0.7364341 |
specificity | binary | 0.9116507 |
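The `roc_metrics()` helper used above is not defined in this section. Given the three rows it returns, it is plausibly a yardstick metric set; the definition below is an assumption, and the `toy` data frame is made up purely to show the call shape.

```r
library(yardstick)

# Assumed definition: a metric set bundling the three metrics shown in the table
roc_metrics <- metric_set(accuracy, sensitivity, specificity)

# Tiny made-up example (not the Default data); "Yes" is the second factor level
toy <- data.frame(
  truth = factor(c("Yes", "Yes", "No", "No", "No"),  levels = c("No", "Yes")),
  pred  = factor(c("Yes", "No",  "No", "No", "Yes"), levels = c("No", "Yes"))
)

# event_level = "second" treats "Yes" as the positive class
toy_metrics <- roc_metrics(toy, truth = truth, estimate = pred, event_level = "second")
toy_metrics
```

In the toy data, 3 of 5 predictions are correct, 1 of 2 true "Yes" cases is caught, and 2 of 3 true "No" cases are correctly rejected.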
threshold <- 0.9 # set threshold
default_test_wprobs <- default_test_wprobs |>
  mutate(
    # fix the level order so "Yes" is always the second level
    knn_preds = factor(if_else(knn_probs > threshold, "Yes", "No"), levels = c("No", "Yes"))
  )
roc_metrics(default_test_wprobs, truth = default, estimate = knn_preds, event_level = "second") |> kable()
.metric | .estimator | .estimate |
---|---|---|
accuracy | binary | 0.9685000 |
sensitivity | binary | 0.0310078 |
specificity | binary | 0.9997417 |
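The two tables above show the trade-off at two thresholds: raising the threshold from 0.1 to 0.9 lowers sensitivity and raises specificity. As a rough illustration of the same pattern, the base R sketch below sweeps several thresholds over a handful of made-up probabilities. The vectors `probs` and `truth` and the helper `metrics_at()` are hypothetical and unrelated to the Default data.

```r
# Made-up predicted probabilities and true labels (not the Default data)
probs <- c(0.05, 0.20, 0.35, 0.60, 0.80, 0.95)
truth <- c("No", "No", "Yes", "No", "Yes", "Yes")

# Sensitivity and specificity when predicting "Yes" for probs > threshold
metrics_at <- function(threshold) {
  pred <- ifelse(probs > threshold, "Yes", "No")
  c(threshold   = threshold,
    sensitivity = sum(pred == "Yes" & truth == "Yes") / sum(truth == "Yes"),
    specificity = sum(pred == "No"  & truth == "No")  / sum(truth == "No"))
}

# One row per threshold
sweep <- as.data.frame(t(sapply(c(0.1, 0.5, 0.9), metrics_at)))
sweep
```

Each row of `sweep` corresponds to one point on an ROC curve, once specificity is converted to 1 - specificity.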
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{100}{100} = 1\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{100}{100} = 1\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{100}{100} = 1\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{55}{100} = 0.55\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{100}{100} = 1\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{0}{100} = 0\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{56}{100} = 0.56\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{0}{100} = 0\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{0}{100} = 0\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{0}{100} = 0\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{100}{100} = 1\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{100}{100} = 1\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{100}{100} = 1\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{62}{100} = 0.62\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{99}{100} = 0.99\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{58}{100} = 0.58\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{75}{100} = 0.75\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{29}{100} = 0.29\]
\[\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{\text{Greens Above Line}}{\text{All Greens}} = \frac{37}{100} = 0.37\] \[1-\text{Specificity} = 1-\frac{TN}{TN + FP} = \frac{FP}{TN + FP} = \frac{\text{Reds Above Line}}{\text{All Reds}} = \frac{0}{100} = 0\]
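The arithmetic in the formulas above can be checked with a few lines of base R. The helper `roc_point()` is a hypothetical name introduced only for this sketch; the counts are taken from the state in which 75 of the 100 greens and 29 of the 100 reds lie above the line.

```r
# Convert confusion-matrix counts into one point on the ROC curve
roc_point <- function(tp, fn, fp, tn) {
  c(sensitivity           = tp / (tp + fn),  # greens above line / all greens
    one_minus_specificity = fp / (tn + fp))  # reds above line / all reds
}

# 75 greens above the line (TP), 25 below (FN); 29 reds above (FP), 71 below (TN)
pt <- roc_point(tp = 75, fn = 25, fp = 29, tn = 71)
pt
```

Here `pt` holds a sensitivity of 0.75 and a 1 - specificity of 0.29, matching the corresponding formula line.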
Using the app, try to answer the following questions: