MATH 427: Evaluating Classification Models

Eric Friedlander

Tips on Gradient Descent HW

  • Make step size small!
  • May take a while to converge
  • Try an adaptive step size (e.g., backtracking line search)
  • Clarification on stopping criteria (see the sketch below):
    • Set a tolerance
    • Stop when the distance from the gradient to \((0, 0)\) falls below the tolerance
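To make the stopping rule concrete, here is a minimal sketch in R; the objective \(f(x, y) = x^2 + 2y^2\) and its gradient are hypothetical stand-ins for the one in your homework:

grad <- function(theta) c(2 * theta[1], 4 * theta[2])   # gradient of the toy objective

theta <- c(5, 5)   # starting point
step  <- 0.01      # small, fixed step size
tol   <- 1e-6      # tolerance

repeat {
  g <- grad(theta)
  if (sqrt(sum(g^2)) < tol) break   # stop when the gradient is within tol of (0, 0)
  theta <- theta - step * g         # gradient descent update
}
theta   # approximate minimizer, near (0, 0)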

Computational Set-Up

library(tidyverse)
library(tidymodels)
library(gridExtra)
library(janitor) # for nice contingency tables
library(kableExtra)
library(ISLR2)

tidymodels_prefer()

Default Dataset

A simulated data set containing information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt.

head(Default) |> kable()  # print first six observations
default   student    balance     income
No        No          729.5265   44361.625
No        Yes         817.1804   12106.135
No        No         1073.5492   31767.139
No        No          529.2506   35704.494
No        No          785.6559   38463.496
No        Yes         919.5885    7491.559

Response Variable: default

Default |> 
  tabyl(default) |>  # class frequencies
  kable()           # Make it look nice
default      n   percent
No        9667    0.9667
Yes        333    0.0333

Split the data

set.seed(427)

default_split <- initial_split(Default, prop = 0.6, strata = default)
default_split
<Training/Testing/Total>
<6000/4000/10000>
default_train <- training(default_split)
default_test <- testing(default_split)
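As a quick sanity check (not part of the original code), stratifying on default should keep the training-set class proportions close to the full-data 3.33% default rate:

default_train |> 
  tabyl(default) |>   # compare to the 0.9667/0.0333 split above
  kable()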

K-Nearest Neighbors Classifier: Build Model

  • Response (\(Y\)): default
  • Predictor (\(X\)): balance
knnfit <- nearest_neighbor(neighbors = 10) |> 
  set_engine("kknn") |> 
  set_mode("classification") |>  
  fit(default ~ balance, data = Default)   # fit 10-NN model (note: fit on the full data, not default_train)

K-Nearest Neighbors Classifier: Predictions

predict(knnfit, new_data = Default, type = "class") |> head() |> kable()   # obtain predictions as classes
.pred_class
No
No
No
No
No
No
  • Predicts the class with the maximum probability (demonstrated below)
predict(knnfit, new_data = Default, type = "prob") |> head() |> kable() # obtain predictions as probabilities
.pred_No   .pred_Yes
1          0
1          0
1          0
1          0
1          0
1          0
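To see that the two prediction types agree, here is a small sketch (the manual_class column is a hypothetical helper) confirming that the class prediction is just the larger-probability class:

predict(knnfit, new_data = Default, type = "prob") |> 
  mutate(manual_class = if_else(.pred_Yes > .pred_No, "Yes", "No")) |>   # hypothetical check column
  head()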

Fitting a logistic regression

Fitting a logistic regression model with default as the response and balance as the predictor:

logregfit <- logistic_reg() |> 
  set_engine("glm") |> 
  fit(default ~ balance, data = default_train)   # fit logistic regression model

tidy(logregfit) |> kable()  # obtain results
term            estimate   std.error   statistic   p.value
(Intercept)  -10.6926385   0.4659035   -22.95033         0
balance        0.0055327   0.0002841    19.47329         0
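In equation form, the fitted model from the output above is

\[\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = -10.693 + 0.0055 \times \text{balance},\]

where \(\hat{p}\) is the estimated probability of default.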

Making predictions in R

predict(logregfit, new_data = tibble(balance = 700), type = "class") |> kable()   # obtain class predictions
.pred_class
No
predict(logregfit, new_data = tibble(balance = 700), type = "raw") |> kable()   # obtain log-odds predictions
x
-6.819727
predict(logregfit, new_data = tibble(balance = 700), type = "prob") |> kable()  # obtain probability predictions
.pred_No    .pred_Yes
0.9989092   0.0010908
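As a sanity check, the three prediction types are consistent with each other and with the fitted coefficients from tidy(logregfit):

log_odds <- -10.6926385 + 0.0055327 * 700   # reproduces the "raw" prediction, about -6.82
plogis(log_odds)                            # inverse logit: about 0.00109, matching .pred_Yes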

Assessing Performance of Classifiers

Binary Classifiers

  • Start with binary classification scenarios
  • With binary classification, designate one category as “Success/Positive” and the other as “Failure/Negative”
    • If relevant to your problem: “Positive” should be the outcome you care more about predicting
    • Note: “Positive” \(\neq\) “Good”
    • For default: “Yes” is Positive
  • Some metrics weight “Positives” more heavily, and vice versa

Confusion Matrix

                               Actual Positive/Event   Actual Negative/Non-event
Predicted Positive/Event       True Positive (TP)      False Positive (FP)
Predicted Negative/Non-event   False Negative (FN)     True Negative (TN)

Adding predictions to tibble

default_test_wpreds <- default_test |> 
  mutate(
    knn_preds = predict(knnfit, new_data = default_test, type = "class")$.pred_class,
    logistic_preds = predict(logregfit, new_data = default_test, type = "class")$.pred_class
  )

default_test_wpreds |> head() |> kable()
default   student    balance     income     knn_preds   logistic_preds
No        No          729.5265   44361.63   No          No
No        Yes         808.6675   17600.45   No          No
No        Yes        1220.5838   13268.56   No          No
No        No          237.0451   28251.70   No          No
No        No          606.7423   44994.56   No          No
No        No          286.2326   45042.41   No          No

KNN: Confusion Matrix

default_test_wpreds |>
  conf_mat(truth = default, estimate = knn_preds)
          Truth
Prediction   No  Yes
       No  3854   80
       Yes   17   49

KNN: Confusion Matrix (Sexy)

default_test_wpreds |>
  conf_mat(truth = default, estimate = knn_preds) |> 
  autoplot("heatmap")

Logistic Regression: Confusion Matrix

default_test_wpreds |>
  conf_mat(truth = default, estimate = logistic_preds) |> 
  autoplot(type = "heatmap")

Classification Metrics

  • Accuracy: proportion of all predictions that are correct \[(TP + TN)/Total\]
  • Recall/Sensitivity: proportion of actual positives that are predicted correctly (true positive rate) \[TP/(TP+FN)\]
  • Precision/Positive Predictive Value (PPV): proportion of predicted positives that are correct \[TP/(TP+FP)\]
  • Specificity: proportion of actual negatives that are predicted correctly (true negative rate) \[TN/(TN+FP)\]
  • Negative Predictive Value (NPV): proportion of predicted negatives that are correct \[TN/(TN+FN)\]
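As a concrete illustration, each definition can be computed directly from confusion-matrix counts; this sketch uses the KNN test-set counts from the next slide (TP = 49, FP = 17, FN = 80, TN = 3854):

TP <- 49; FP <- 17; FN <- 80; TN <- 3854   # KNN test-set counts

(TP + TN) / (TP + TN + FP + FN)   # accuracy
TP / (TP + FN)                    # recall / sensitivity
TP / (TP + FP)                    # precision / PPV
TN / (TN + FP)                    # specificity
TN / (TN + FN)                    # NPV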

KNN: Performance

default_test_wpreds |>
  conf_mat(truth = default, estimate = knn_preds) |> 
  autoplot("heatmap")

  • Accuracy: \((3854+49)/4000 = .976 = 97.6\%\)
  • Recall/Sensitivity: \(49/(49+80) = 0.380 = 38.0\%\)
  • Precision/Positive Predictive Value (PPV): \(49/(49+17) = .742 = 74.2\%\)
  • Specificity: \(3854/(3854+17) = 0.996 = 99.6\%\)
  • Negative Predictive Value (NPV): \(3854/(3854+80) = 0.980 = 98.0\%\)

Logistic Regression: Performance

default_test_wpreds |>
  conf_mat(truth = default, estimate = logistic_preds) |> 
  autoplot("heatmap")

Compute the following and write your answers on the board:

  • Accuracy
  • Recall/Sensitivity
  • Precision/Positive Predictive Value (PPV)
  • Specificity
  • Negative Predictive Value (NPV)

Performance Metrics with yardstick

  • yardstick is a package, included with tidymodels, for model evaluation
  • Typical syntax: metric_name(data, truth, estimate, ...)
    • Bind the predictions to the original data (as we did above)
    • Pass the true response as truth and the predicted classes as estimate
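For example, recall for the logistic model looks like this; because the levels of default are (No, Yes), we pass event_level = "second" so that "Yes" is treated as the positive class:

default_test_wpreds |> 
  recall(truth = default, estimate = logistic_preds, event_level = "second")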

Logistic Regression: Accuracy

default_test_wpreds |> 
  accuracy(truth = default, estimate = logistic_preds) |> 
  kable()
.metric    .estimator   .estimate
accuracy   binary           0.973

Two More Metrics

  • Matthews correlation coefficient (MCC): plays a role like a correlation coefficient, but for classification \[\frac{TP\times TN - FP \times FN}{\sqrt{(TP + FP)(TP+FN)(TN+FP)(TN+FN)}}\]
    • Good for imbalanced data
    • Considers both positives and negatives
  • F-Measure: harmonic mean of recall and precision \[\frac{2}{recall^{-1} + precision^{-1}} = \frac{2TP}{2TP+FP+FN}\]
    • Focuses more on positives
    • Can be misleading for imbalanced data
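Both formulas can be checked against the yardstick output on the following slides using the KNN confusion-matrix counts:

TP <- 49; FP <- 17; FN <- 80; TN <- 3854

(TP * TN - FP * FN) / 
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))   # MCC, about 0.521

2 * TP / (2 * TP + FP + FN)                             # F-measure, about 0.503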

Metric Sets

binary_metrics <- metric_set(accuracy, recall, precision, specificity,
                             npv, mcc, f_meas)
  • Apply this like a single metric function to compute all of these metrics at once

KNN: Performance

default_test_wpreds |> 
  binary_metrics(truth = default, estimate = knn_preds, event_level = "second") |> 
  kable()
.metric       .estimator   .estimate
accuracy      binary       0.9757500
recall        binary       0.3798450
precision     binary       0.7424242
specificity   binary       0.9956084
npv           binary       0.9796645
mcc           binary       0.5206828
f_meas        binary       0.5025641

Logistic Regression: Performance

default_test_wpreds |> 
  binary_metrics(truth = default, estimate = logistic_preds, event_level = "second") |> 
  kable()
.metric       .estimator   .estimate
accuracy      binary       0.9730000
recall        binary       0.3023256
precision     binary       0.6842105
specificity   binary       0.9953500
npv           binary       0.9771747
mcc           binary       0.4437097
f_meas        binary       0.4193548

Discussion

  • For each of the following metrics, brainstorm a situation in which that metric is probably the most important:
    • Recall
    • Precision
    • Accuracy