Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 56


You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

A. An optimization objective that minimizes Log loss
B. An optimization objective that maximizes the Precision at a Recall value of 0.50
C. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
D. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value

Correct Answer: C

In credit card fraud detection, the data is typically imbalanced with significantly fewer fraudulent transactions compared to legitimate ones. The goal is to detect as many fraudulent transactions as possible (high recall) while keeping the number of legitimate transactions incorrectly flagged as fraud (false positives) to a minimum (high precision). The optimization objective that maximizes the area under the precision-recall curve (AUC PR) is the most suitable for this scenario. AUC PR focuses on the balance between precision and recall, providing a measure that better handles the class imbalance and the differing costs of false positives and false negatives. Therefore, choosing an objective that maximizes AUC PR ensures a more effective fraud detection model in imbalanced data situations like this.
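To make this concrete, here is a minimal sketch (not part of the official explanation; it assumes scikit-learn and synthetic data) showing how ROC AUC can look flattering on a fraud-like 99:1 class split while average precision, an estimate of AUC PR, stays sensitive to false positives:

```python
# Sketch: ROC AUC vs PR AUC on heavily imbalanced synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Simulate a fraud-like dataset: roughly 1% positive (fraudulent) class.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# ROC AUC is propped up by the huge pool of easy true negatives;
# average precision (an AUC PR estimate) drops when false positives pile up.
print("ROC AUC:", roc_auc_score(y_test, scores))
print("PR AUC (average precision):", average_precision_score(y_test, scores))
```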

Discussion

17 comments
Paul_Dirac (Option: C)
Aug 1, 2021

This is a case of imbalanced data. Ans: C https://stats.stackexchange.com/questions/262616/roc-vs-precision-recall-curves-on-imbalanced-dataset https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc

GogoG
Oct 17, 2021

C is wrong - the correct answer is D. ROC basically compares True Positives against False Negatives, exactly what we are trying to optimise for.

ralf_cc (Option: D)
Jul 10, 2021

D - https://en.wikipedia.org/wiki/Receiver_operating_characteristic

omar_bh
Jul 16, 2021

True. The true positive rate is represented on the Y axis. The bigger the area under the curve, the higher the TP rate.

tavva_prudhvi
Jul 22, 2023

A larger area under the ROC curve does indicate better model performance in terms of correctly identifying true positives. However, it does not take into account the imbalance in the class distribution or the costs associated with false positives and false negatives. In contrast, the AUC PR curve focuses on the trade-off between precision (Y-axis) and recall (X-axis), making it more suitable for imbalanced datasets and applications with different costs for false positives and false negatives, like credit card fraud detection.

tavva_prudhvi
Jul 22, 2023

AUC ROC is more suitable when the class distribution is balanced and false positives and false negatives have similar costs. In the case of credit card fraud detection, the class distribution is typically imbalanced (fewer fraudulent transactions compared to non-fraudulent ones), and the cost of false positives (incorrectly identifying a transaction as fraudulent) and false negatives (failing to detect a fraudulent transaction) are not the same. By maximizing the AUC PR (area under the precision-recall curve), the model focuses on the trade-off between precision (proportion of true positives among predicted positives) and recall (proportion of true positives among actual positives), which is more relevant in imbalanced datasets and for applications where the costs of false positives and false negatives are not equal. This makes option C a better choice for credit card fraud detection.

giaZ (Option: C)
Mar 10, 2022

https://icaiit.org/proceedings/6th_ICAIIT/1_3Fayzrakhmanov.pdf
Fraudulent transaction detection is an imbalanced classification problem (most transactions are not fraudulent), so you want to maximize both precision and recall, i.e. the area under the PR curve. In fact, the question asks you to focus on detecting fraudulent transactions (maximize the true positive rate, a.k.a. recall) while minimizing false positives (a.k.a. maximizing precision).
Another way to see it: for imbalanced problems like this one you'll get a lot of true negatives even from a bad model (it's easy to guess "non-fraudulent" because most transactions are!), and with a huge true-negative count the ROC curve climbs fast, which is misleading. So you want to keep true negatives out of your evaluation, which is precisely what the PR curve allows you to do.
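giaZ's true-negative argument can be checked with simple arithmetic. The counts below are hypothetical, chosen only to illustrate a 1%-fraud scenario:

```python
# Hypothetical confusion-matrix counts for 100,000 transactions with 1% fraud,
# illustrating how true negatives flatter ROC-style metrics.
TP, FN = 900, 100          # 1,000 actual frauds, 90% caught
FP, TN = 1_800, 97_200     # 99,000 legitimate transactions

tpr = TP / (TP + FN)           # recall: 0.90
fpr = FP / (FP + TN)           # ROC x-axis: ~0.018 -> looks excellent
precision = TP / (TP + FP)     # PR y-axis: ~0.33 -> 2 of 3 alerts are false alarms
print(f"TPR={tpr:.3f}, FPR={fpr:.4f}, precision={precision:.3f}")
```

With 90% of frauds caught, the FPR plotted by the ROC curve stays under 2%, yet two out of every three fraud alerts are false alarms; the PR curve surfaces this while the ROC curve hides it.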

tavva_prudhvi (Option: C)
Jul 3, 2023

In fraud detection, it's crucial to minimize false positives (transactions flagged as fraudulent but are actually legitimate) while still detecting as many fraudulent transactions as possible. AUC PR is a suitable optimization objective for this scenario because it provides a balanced trade-off between precision and recall, which are both important metrics in fraud detection. A high AUC PR value indicates that the model has high precision and recall, which means it can detect a large number of fraudulent transactions while minimizing false positives. Log loss (A) and AUC ROC (D) are also commonly used optimization objectives in machine learning, but they may not be as effective in this particular scenario. Precision at a Recall value of 0.50 (B) is a specific metric and not an optimization objective.

ramen_lover (Option: C)
Dec 7, 2021

The official documentation ("About model optimization objectives") lists the optimization objectives for AutoML Tables: https://cloud.google.com/automl-tables/docs/train#opt-obj
AUC PR: "Optimize results for predictions for the less common class."

John_Pongthorn (Option: C)
Jan 24, 2023

Detection of fraudulent transactions is an imbalanced-data problem. From https://cloud.google.com/automl-tables/docs/train#opt-obj:
AUC ROC: distinguish between classes; the default value for binary classification.
AUC PR: optimize results for predictions for the less common class.
It is straightforward to answer; you just have to catch the key signal (roughly balanced vs. imbalanced classes).
From https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/, when to use ROC vs. precision-recall curves? Generally: ROC curves should be used when there are roughly equal numbers of observations for each class; precision-recall curves should be used when there is a moderate to large class imbalance.
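Per that documentation, the objective is chosen at training time. Below is a hedged sketch using the legacy AutoML Tables Python client (v1beta1 TablesClient); the project, region, and display names are placeholders, and the exact create_model parameters should be verified against your client version:

```python
# Hedged sketch: selecting the AUC PR objective with the legacy AutoML Tables
# Python client (google-cloud-automl, v1beta1). Names below are placeholders.
from google.cloud.automl_v1beta1 import TablesClient

client = TablesClient(project="my-project", region="us-central1")

# "MAXIMIZE_AU_PRC" = maximize area under the precision-recall curve (answer C);
# "MAXIMIZE_AU_ROC" is the binary-classification default per the docs above.
operation = client.create_model(
    model_display_name="fraud_detector",
    dataset_display_name="transactions",
    train_budget_milli_node_hours=1000,
    optimization_objective="MAXIMIZE_AU_PRC",
)
print("Started training:", operation.operation.name)
```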

itallix (Option: B)
Sep 7, 2022

"You need to prioritize detection of fraudulent transactions while minimizing false positives." Seems that answer B fits this well. If we want to focus exactly on minimizing false positives we can do that by maximising Precision at a specific Recall value. C is about balance between these two, and D doesn't care about false positive/negatives.

hiromi (Option: C)
Dec 15, 2022

C https://towardsdatascience.com/on-roc-and-precision-recall-curves-c23e9b63820c

rtnk22 (Option: C)
Aug 6, 2022

Answer is C.

suresh_vn (Option: D)
Aug 23, 2022

D. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
C optimizes precision only.

suresh_vn
Aug 23, 2022

Sorry, C is my final decision https://cloud.google.com/automl-tables/docs/train#opt-obj

wish0035 (Option: C)
Dec 16, 2022

ans: C Paul_Dirac and giaZ are correct.

ares81 (Option: C)
Jan 5, 2023

Fraud Detection --> Imbalanced Dataset --> AUC PR --> C, for me

enghabeth (Option: D)
Feb 8, 2023

What is different, however, is that ROC AUC looks at the true positive rate (TPR) and false positive rate (FPR), while PR AUC looks at the positive predictive value (PPV) and true positive rate (TPR).
Detect fraudulent transactions = maximize TP; minimizing false positives = minimize FP.
https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc#:~:text=ROC%20AUC%20vs%20PR%20AUC&text=What%20is%20different%20however%20is,and%20true%20positive%20rate%20TPR

John_Pongthorn (Option: C)
Feb 16, 2023

Hi everyone. I found clues that this question likely refers to the last section of https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc, especially its closing point: "Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization." It also points to which of the choices is the answer: https://cloud.google.com/automl-tables/docs/train#opt-obj

M25 (Option: C)
May 9, 2023

Went with C


PhilipKoku (Option: C)
Jun 6, 2024

C) AUC PR (Precision-Recall)