On March 5th at 10 a.m. in JAAC, Kyle Mayer will give a guest lecture about his role as a data analyst at Micron. Kyle has a Bachelor's in Engineering and a Master's in Physics.
Bio: Kyle Mayer is a Senior Data Analytics Engineer at Micron Technology with over 15 years of experience. He currently supports the Global Quality organization with engineering and data science projects. Kyle specializes in modeling complex physical processes, delivering insights that drive business value. Leveraging his expertise in semiconductor manufacturing, Kyle has developed tools and processes that have prevented over $100M in lost revenue. His passion for applying artificial intelligence to solve complex problems and develop automated systems highlights his commitment to innovation. In his spare time, Kyle enjoys spending time with his kids and going on hikes.
A simulated data set containing information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt.
default
balance
Fitting a logistic regression model with `default` as the response and `balance` as the predictor:
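The fitting code itself is not shown here. A minimal sketch of what it might look like follows, assuming the tidymodels framework and the `Default` data from the ISLR2 package. The split proportion, the seed, and the names `default_data`, `default_split`, and `default_train` are hypothetical; only `logregfit` and `default_test` appear in the code that follows.

```r
library(tidymodels)

# Assumption: the data are the Default data set shipped with the ISLR2 package.
default_data <- ISLR2::Default

# Hypothetical split; the proportion and seed used originally are not shown.
set.seed(1)
default_split <- initial_split(default_data, prop = 0.6, strata = default)
default_train <- training(default_split)
default_test  <- testing(default_split)

# Logistic regression with default as the response and balance as the only predictor.
logregfit <- logistic_reg() |>
  set_engine("glm") |>
  fit(default ~ balance, data = default_train)

tidy(logregfit)  # coefficient table for the intercept and balance
```

Because "Yes" is the second level of `default`, the fitted probabilities returned with `type = "prob"` come back in the columns `.pred_No` and `.pred_Yes`.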
default: “Yes” is Positive
# obtain probability predictions
default_test_wprobs <- default_test |>
  mutate(
    knn_probs = predict(knnfit, new_data = default_test, type = "prob") |> pull(.pred_Yes),
    logistic_probs = predict(logregfit, new_data = default_test, type = "prob") |> pull(.pred_Yes)
  )
default_test_wprobs |> head() |> kable()
default | student | balance | income | knn_probs | logistic_probs |
---|---|---|---|---|---|
No | No | 729.5265 | 44361.63 | 0 | 0.0012842 |
No | Yes | 808.6675 | 17600.45 | 0 | 0.0019883 |
No | Yes | 1220.5838 | 13268.56 | 0 | 0.0190870 |
No | No | 237.0451 | 28251.70 | 0 | 0.0000843 |
No | No | 606.7423 | 44994.56 | 0 | 0.0006514 |
No | No | 286.2326 | 45042.41 | 0 | 0.0001107 |
threshold <- 0.5 # set threshold
default_test_wprobs <- default_test_wprobs |>
  mutate(
    knn_preds = as_factor(if_else(knn_probs > threshold, "Yes", "No")),
    logistic_preds = as_factor(if_else(logistic_probs > threshold, "Yes", "No"))
  )
default_test_wprobs |> head() |> kable()
default | student | balance | income | knn_probs | logistic_probs | knn_preds | logistic_preds |
---|---|---|---|---|---|---|---|
No | No | 729.5265 | 44361.63 | 0 | 0.0012842 | No | No |
No | Yes | 808.6675 | 17600.45 | 0 | 0.0019883 | No | No |
No | Yes | 1220.5838 | 13268.56 | 0 | 0.0190870 | No | No |
No | No | 237.0451 | 28251.70 | 0 | 0.0000843 | No | No |
No | No | 606.7423 | 44994.56 | 0 | 0.0006514 | No | No |
No | No | 286.2326 | 45042.41 | 0 | 0.0001107 | No | No |
threshold <- 0.1 # set threshold
default_test_wprobs <- default_test_wprobs |>
  mutate(knn_preds = as_factor(if_else(knn_probs > threshold, "Yes", "No")))
roc_metrics(default_test_wprobs, truth = default, estimate = knn_preds, event_level = "second") |> kable()
.metric | .estimator | .estimate |
---|---|---|
accuracy | binary | 0.9060000 |
sensitivity | binary | 0.7364341 |
specificity | binary | 0.9116507 |
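`roc_metrics` is not defined in this section. One plausible definition, assuming it was built with yardstick's `metric_set()` from the three metrics shown in the table, is sketched below. The toy data frame `toy` and its column names are hypothetical and serve only to make the example self-contained.

```r
library(yardstick)

# Assumption: roc_metrics bundles the three class metrics reported in the tables.
roc_metrics <- metric_set(accuracy, sensitivity, specificity)

# Hypothetical toy data. Levels are set explicitly so truth and estimate share
# the same ordering, with "Yes" as the second level.
toy <- data.frame(
  truth = factor(c("No", "No", "No", "Yes", "Yes"), levels = c("No", "Yes")),
  pred  = factor(c("No", "No", "Yes", "Yes", "No"), levels = c("No", "Yes"))
)

# event_level = "second" treats "Yes" as the positive class.
toy_metrics <- roc_metrics(toy, truth = truth, estimate = pred, event_level = "second")
toy_metrics
```

Setting the levels with `factor(..., levels = c("No", "Yes"))` is a defensive choice: `as_factor()` orders levels by first appearance, so a prediction column that happens to contain only "No" would end up with a single level and no longer match the truth column.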
threshold <- 0.9 # set threshold
default_test_wprobs <- default_test_wprobs |>
  mutate(knn_preds = as_factor(if_else(knn_probs > threshold, "Yes", "No")))
roc_metrics(default_test_wprobs, truth = default, estimate = knn_preds, event_level = "second") |> kable()
.metric | .estimator | .estimate |
---|---|---|
accuracy | binary | 0.9685000 |
sensitivity | binary | 0.0310078 |
specificity | binary | 0.9997417 |
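For reference, the three metrics in these tables can be written in terms of confusion-matrix counts, with "Yes" as the positive class. The symbols TP, FP, TN, and FN (true/false positives and negatives) are standard notation introduced here, not taken from the original slides.

```latex
\begin{aligned}
\text{accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{sensitivity} &= \frac{TP}{TP + FN} \\
\text{specificity} &= \frac{TN}{TN + FP}
\end{aligned}
```

Raising the threshold from 0.1 to 0.9 means fewer customers are predicted "Yes", so FN grows and FP shrinks. This matches the tables above: sensitivity falls from about 0.74 to about 0.03, while specificity rises from about 0.91 to nearly 1. Accuracy also rises, from 0.906 to 0.9685, even though almost no defaulters are caught at the higher threshold.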