Professional Machine Learning Engineer Exam - Question 60

Question

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

Examice · Accepted Answer

Since only 1% of the transactions are fraudulent, the dataset is highly imbalanced. Oversampling the minority class (here, the fraudulent transactions) increases its representation in the training dataset, which helps the classifier learn to identify fraud. Writing data in TFRecords, Z-normalizing features, or using one-hot encoding on categorical features would not directly address the class imbalance that hurts the model's ability to detect fraud.
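
To make the oversampling idea concrete, here is a minimal sketch in plain Python. The function name `oversample_minority`, the `factor=10` setting, and the toy data are all assumptions for illustration, not part of the exam question. In practice the resampling would be applied to the training split only, so that evaluation data keeps the original class ratio.

```python
import random

def oversample_minority(rows, labels, minority_label=1, factor=10, seed=0):
    """Randomly duplicate minority-class rows until there are `factor` times as many.

    Hypothetical helper for illustration; `factor` is an assumed setting.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    # Draw the extra copies with replacement, from minority rows only.
    extra = [rng.choice(minority) for _ in range(len(minority) * (factor - 1))]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)  # avoid leaving all duplicates at the end
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Toy data: 1,000 transactions, 1% fraudulent (label 1).
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(1000)]

X_res, y_res = oversample_minority(rows, labels)
fraud_share = sum(y_res) / len(y_res)  # 100 fraud rows out of 1,090
```

With these assumed numbers, the fraud share in the training data rises from 1% to roughly 9%, so the trees see far more fraudulent examples when choosing splits.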

ralf_cc · Answer

C - https://swarit.medium.com/detecting-fraudulent-consumer-transactions-through-machine-learning-25b1f2cabbb4

NamitSehgal · Answer

C is the answer

MultiCloudIronMan · Answer

Oversampling increases the number of fraudulent transactions in the training data, which enables the model to learn how to predict them.

M25 · Answer

Went with C

Mohamed_Mossad · Answer

The best option is C.

hiromi · Answer

C
https://medium.com/analytics-vidhya/credit-card-fraud-detection-how-to-handle-imbalanced-dataset-1f18b6f881

wish0035 · Answer

ans: C

A, B, D => wouldn't help with imbalance

fragkris · Answer

C - Even though most similar questions propose downsampling the majority class (non-fraudulent transactions) and adding weights to it.
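
For comparison, the alternative mentioned in this reply (downsample the majority class, then add weights to it) could look roughly like the sketch below. The helper name `downsample_and_upweight` and the `keep_every=10` factor are assumptions for illustration only.

```python
import random

def downsample_and_upweight(labels, majority_label=0, keep_every=10, seed=0):
    """Keep about 1/keep_every of the majority rows and weight each kept row by keep_every.

    Hypothetical helper; returns the selected row indices and per-row weights.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    others = [i for i, y in enumerate(labels) if y != majority_label]
    # Sample without replacement from the majority class.
    kept = rng.sample(majority, len(majority) // keep_every)
    idx = others + kept
    # Each kept majority row stands in for keep_every original rows.
    weights = [1.0] * len(others) + [float(keep_every)] * len(kept)
    return idx, weights

# Toy data: 1,000 transactions, 1% fraudulent (label 1).
labels = [1] * 10 + [0] * 990
idx, weights = downsample_and_upweight(labels)
```

The weights could then be supplied as per-sample weights when fitting, for example through the `sample_weight` argument that scikit-learn's `RandomForestClassifier.fit` accepts, so the smaller training set still reflects the original class proportions.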

PhilipKoku · Answer

C) Oversample

dija123 · Answer

Agree with C
