Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 56


You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

A. An optimization objective that minimizes Log loss
B. An optimization objective that maximizes the Precision at a Recall value of 0.50
C. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
D. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value

Correct Answer: C

In credit card fraud detection, the data is typically imbalanced with significantly fewer fraudulent transactions compared to legitimate ones. The goal is to detect as many fraudulent transactions as possible (high recall) while keeping the number of legitimate transactions incorrectly flagged as fraud (false positives) to a minimum (high precision). The optimization objective that maximizes the area under the precision-recall curve (AUC PR) is the most suitable for this scenario. AUC PR focuses on the balance between precision and recall, providing a measure that better handles the class imbalance and the differing costs of false positives and false negatives. Therefore, choosing an objective that maximizes AUC PR ensures a more effective fraud detection model in imbalanced data situations like this.
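To make this concrete, here is a minimal sketch (not part of the official explanation; it assumes scikit-learn and synthetic data) showing how ROC AUC can look flattering on a fraud-like 99:1 class split while average precision, an estimate of AUC PR, stays sensitive to false positives:

```python
# Sketch: ROC AUC vs PR AUC on heavily imbalanced synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Simulate a fraud-like dataset: roughly 1% positive (fraudulent) class.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# ROC AUC is propped up by the huge pool of easy true negatives;
# average precision (an AUC PR estimate) drops when false positives pile up.
print("ROC AUC:", roc_auc_score(y_test, scores))
print("PR AUC (average precision):", average_precision_score(y_test, scores))
```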

Discussion

17 comments
Paul_Dirac (Option: C)
Aug 1, 2021

This is a case of imbalanced data. Ans: C https://stats.stackexchange.com/questions/262616/roc-vs-precision-recall-curves-on-imbalanced-dataset https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc

GogoG
Oct 17, 2021

C is wrong - the correct answer is D. ROC basically compares True Positives against False Negatives, exactly what we are trying to optimise for.

ralf_cc (Option: D)
Jul 10, 2021

D - https://en.wikipedia.org/wiki/Receiver_operating_characteristic

omar_bh
Jul 16, 2021

True. The true positive rate is represented on the Y axis. The bigger the area under the curve, the higher the TP rate.

tavva_prudhvi
Jul 22, 2023

A larger area under the ROC curve does indicate better model performance in terms of correctly identifying true positives. However, it does not take into account the imbalance in the class distribution or the costs associated with false positives and false negatives. In contrast, the AUC PR curve focuses on the trade-off between precision (Y-axis) and recall (X-axis), making it more suitable for imbalanced datasets and applications with different costs for false positives and false negatives, like credit card fraud detection.

tavva_prudhvi
Jul 22, 2023

AUC ROC is more suitable when the class distribution is balanced and false positives and false negatives have similar costs. In the case of credit card fraud detection, the class distribution is typically imbalanced (fewer fraudulent transactions compared to non-fraudulent ones), and the cost of false positives (incorrectly identifying a transaction as fraudulent) and false negatives (failing to detect a fraudulent transaction) are not the same. By maximizing the AUC PR (area under the precision-recall curve), the model focuses on the trade-off between precision (proportion of true positives among predicted positives) and recall (proportion of true positives among actual positives), which is more relevant in imbalanced datasets and for applications where the costs of false positives and false negatives are not equal. This makes option C a better choice for credit card fraud detection.

giaZ (Option: C)
Mar 10, 2022

https://icaiit.org/proceedings/6th_ICAIIT/1_3Fayzrakhmanov.pdf
Fraudulent transaction detection is an imbalanced classification problem (most transactions are not fraudulent), so you want to maximize both precision and recall, i.e. the area under the PR curve. In fact, the question asks you to focus on detecting fraudulent transactions (maximize the true positive rate, a.k.a. recall) while minimizing false positives (a.k.a. maximizing precision).
Another way to see it: for imbalanced problems like this one you'll get a lot of true negatives even from a bad model (it's easy to guess "non-fraudulent" because most transactions are!), and with a huge true-negative count the ROC curve climbs fast, which is misleading. So you want to keep true negatives out of your evaluation, which is precisely what the PR curve allows you to do.
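giaZ's true-negative argument can be checked with simple arithmetic. The counts below are hypothetical, chosen only to illustrate a 1%-fraud scenario:

```python
# Hypothetical confusion-matrix counts for 100,000 transactions with 1% fraud,
# illustrating how true negatives flatter ROC-style metrics.
TP, FN = 900, 100          # 1,000 actual frauds, 90% caught
FP, TN = 1_800, 97_200     # 99,000 legitimate transactions

tpr = TP / (TP + FN)           # recall: 0.90
fpr = FP / (FP + TN)           # ROC x-axis: ~0.018 -> looks excellent
precision = TP / (TP + FP)     # PR y-axis: ~0.33 -> 2 of 3 alerts are false alarms
print(f"TPR={tpr:.3f}, FPR={fpr:.4f}, precision={precision:.3f}")
```

With 90% of frauds caught, the FPR plotted by the ROC curve stays under 2%, yet two out of every three fraud alerts are false alarms; the PR curve surfaces this while the ROC curve hides it.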

tavva_prudhvi (Option: C)
Jul 3, 2023

In fraud detection, it's crucial to minimize false positives (transactions flagged as fraudulent but are actually legitimate) while still detecting as many fraudulent transactions as possible. AUC PR is a suitable optimization objective for this scenario because it provides a balanced trade-off between precision and recall, which are both important metrics in fraud detection. A high AUC PR value indicates that the model has high precision and recall, which means it can detect a large number of fraudulent transactions while minimizing false positives. Log loss (A) and AUC ROC (D) are also commonly used optimization objectives in machine learning, but they may not be as effective in this particular scenario. Precision at a Recall value of 0.50 (B) is a specific metric and not an optimization objective.

ramen_lover (Option: C)
Dec 7, 2021

The official documentation ("About model optimization objectives") lists the optimization objectives for AutoML Tables: https://cloud.google.com/automl-tables/docs/train#opt-obj
AUC PR: "Optimize results for predictions for the less common class."

John_Pongthorn (Option: C)
Jan 24, 2023

Detection of fraudulent transactions is an imbalanced-data problem. From https://cloud.google.com/automl-tables/docs/train#opt-obj:
AUC ROC: distinguish between classes; the default value for binary classification.
AUC PR: optimize results for predictions for the less common class.
It is straightforward to answer; you just have to catch the key signal (roughly balanced vs. imbalanced classes).
From https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/, when to use ROC vs. precision-recall curves? Generally: ROC curves should be used when there are roughly equal numbers of observations for each class; precision-recall curves should be used when there is a moderate to large class imbalance.
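Per that documentation, the objective is chosen at training time. Below is a hedged sketch using the legacy AutoML Tables Python client (v1beta1 TablesClient); the project, region, and display names are placeholders, and the exact create_model parameters should be verified against your client version:

```python
# Hedged sketch: selecting the AUC PR objective with the legacy AutoML Tables
# Python client (google-cloud-automl, v1beta1). Names below are placeholders.
from google.cloud.automl_v1beta1 import TablesClient

client = TablesClient(project="my-project", region="us-central1")

# "MAXIMIZE_AU_PRC" = maximize area under the precision-recall curve (answer C);
# "MAXIMIZE_AU_ROC" is the binary-classification default per the docs above.
operation = client.create_model(
    model_display_name="fraud_detector",
    dataset_display_name="transactions",
    train_budget_milli_node_hours=1000,
    optimization_objective="MAXIMIZE_AU_PRC",
)
print("Started training:", operation.operation.name)
```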

itallix (Option: B)
Sep 7, 2022

"You need to prioritize detection of fraudulent transactions while minimizing false positives." Seems that answer B fits this well. If we want to focus exactly on minimizing false positives we can do that by maximising Precision at a specific Recall value. C is about balance between these two, and D doesn't care about false positive/negatives.

hiromi (Option: C)
Dec 15, 2022

C https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c

rtnk22 (Option: C)
Aug 6, 2022

Answer is C.

suresh_vn (Option: D)
Aug 23, 2022

D. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
C optimizes precision only.

suresh_vn
Aug 23, 2022

Sorry, C is my final decision https://cloud.google.com/automl-tables/docs/train#opt-obj

wish0035 (Option: C)
Dec 16, 2022

ans: C Paul_Dirac and giaZ are correct.

ares81 (Option: C)
Jan 5, 2023

Fraud Detection --> Imbalanced Dataset --> AUC PR --> C, for me

enghabeth (Option: D)
Feb 8, 2023

What is different, however, is that ROC AUC looks at the true positive rate (TPR) and false positive rate (FPR), while PR AUC looks at the positive predictive value (PPV) and true positive rate (TPR).
Detect fraudulent transactions = maximize TP; minimizing false positives = minimize FP.
https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc#:~:text=ROC%20AUC%20vs%20PR%20AUC&text=What%20is%20different%20however%20is,and%20true%20positive%20rate%20TPR

John_Pongthorn (Option: C)
Feb 16, 2023

Hi everyone. I found clues that this question likely refers to the last section of https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc, especially its closing point: "Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization." It also points to which of the choices is the answer: https://cloud.google.com/automl-tables/docs/train#opt-obj

M25 (Option: C)
May 9, 2023

Went with C


PhilipKoku (Option: C)
Jun 6, 2024

C) AUC PR (Precision-Recall)