
Professional Data Engineer Exam - Question 18


Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)
A. Supervised learning to determine which transactions are most likely to be fraudulent.
B. Unsupervised learning to determine which transactions are most likely to be fraudulent.
C. Clustering to divide the transactions into N categories based on feature similarity.
D. Supervised learning to predict the location of a transaction.
E. Reinforcement learning to predict the location of a transaction.
F. Unsupervised learning to predict the location of a transaction.

Correct Answer: ACD

Supervised learning can be applied to determine which transactions are most likely to be fraudulent by training a model on historical data with known fraud labels. Clustering is an unsupervised learning technique that can divide transactions into categories based on feature similarity, helping to identify unusual patterns or groups. Supervised learning can also be used to predict the location of a transaction, because the location column already present in the data can serve as the label. Reinforcement learning is not suitable in this context, because it involves learning from interaction with an environment, which does not apply here. Unsupervised learning is not suitable for predicting the location either, because it does not predict specific values such as a transaction location without labeled data for training.
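To make the clustering option concrete, here is a minimal sketch of dividing transactions into N groups by feature similarity. It uses a plain k-means loop written from scratch; the toy rows, the numeric encoding of transaction type, and the choice to seed centroids with the first k points are all assumptions made for illustration, not anything given in the question.

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical toy rows: (transaction amount, transaction-type code)
transactions = [(12.0, 0), (15.5, 0), (9.9, 0), (980.0, 1), (1020.0, 1), (1005.0, 1)]
labels, centroids = kmeans(transactions, k=2)
print(labels)
```

No labels are needed here: the small and large transactions separate purely on feature distance. In practice the features would likely need scaling first, since the raw amount dominates the distance in this sketch.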

Discussion

17 comments
jvg637 | Options: BCD
Mar 15, 2020

BCD makes more sense to me. It's for sure not unsupervised, since locations are already in the data. Reinforcement learning also doesn't fit, as there is no AI and no interaction with the data from an observer.

sergio6
Aug 30, 2021

D makes sense, but I have a doubt: location is a discrete value (so not regression), so a multiclass classification model should be applied to predict locations?

hellofrnds
Oct 9, 2021

Yes, a multiclass classification model should be applied.
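The multiclass point in this thread can be sketched as follows: location is treated as a class label and predicted from the other columns. This is only an illustration using a small categorical naive Bayes classifier with Laplace smoothing; the rows, user IDs, and city names are made up, and the thread does not say which model family should be used.

```python
from collections import Counter, defaultdict
from math import log

def train(rows):
    """rows: list of ((user_id, transaction_type), location_label)."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for features, label in rows:
        for i, v in enumerate(features):
            feature_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict(model, features):
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())

    def score(label):
        # Log prior plus Laplace-smoothed log likelihood of each feature value.
        s = log(class_counts[label] / total)
        for i, v in enumerate(features):
            num = feature_counts[(i, label)][v] + 1
            den = class_counts[label] + len(values[i]) + 1
            s += log(num / den)
        return s

    return max(class_counts, key=score)

# Hypothetical labeled rows: the location column is the training target.
rows = [
    (("u1", "card"), "London"), (("u1", "card"), "London"), (("u1", "atm"), "London"),
    (("u2", "card"), "Paris"), (("u2", "atm"), "Paris"), (("u2", "atm"), "Paris"),
    (("u3", "wire"), "Berlin"), (("u3", "wire"), "Berlin"),
]
model = train(rows)
print(predict(model, ("u1", "card")), predict(model, ("u3", "wire")))
```

Because the location column already exists in the data, it can act as the label, which is what makes option D a supervised problem.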

[Removed] | Options: BCD
Mar 27, 2020

Answer: B, C, D. Description: fraud is not a feature, so unsupervised; location is given, so supervised; clustering can be done on the same features.

musumusu | Options: BCD
Feb 23, 2023

Answer: BCD. Things to understand: supervised learning can only predict a column that is labeled. In this case, there is no fraud/not-fraud column to train on, so option A is wrong. Option D: supervised learning on the transaction location column is possible, as the column exists to train on. Option C: clustering into N categories is possible; it is also unsupervised learning, grouping rows with similar patterns. Option B is the weaker point here: the user would need to know which clusters were fraudulent historically, and the question doesn't say whether the user knows of past fraud. Ignore this option if the question asks for only two right answers.

TVH_Data_Engineer | Options: ACD
Nov 23, 2023

Options B, E, and F are not as suitable for the given scenario: B. Unsupervised learning to determine which transactions are most likely to be fraudulent. Unsupervised learning, while useful for anomaly detection, might not be as effective for fraud detection without labeled data indicating which transactions are fraudulent. E. Reinforcement learning to predict the location of a transaction. Reinforcement learning is more suitable for scenarios where an agent learns to make decisions through trial and error, which doesn't seem to align with predicting transaction locations. F. Unsupervised learning to predict the location of a transaction. Unsupervised learning typically doesn't involve predicting specific values (like location) without labeled data for training. In summary, A, C, and D are the most appropriate machine learning applications for investigating the provided bank transactions dataset.
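For readers weighing option B, the anomaly-detection idea mentioned above can be sketched without any fraud labels. The example below scores amounts with a modified z-score based on the median and the median absolute deviation; the amounts, the 0.6745 scaling constant, and the 3.5 cutoff are conventional choices assumed here for illustration. It flags unusual transactions for review and cannot confirm that they are fraudulent, which is the limitation the comment points out.

```python
from statistics import median

def robust_scores(amounts):
    """Modified z-score per amount, using the median and the median absolute deviation."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # All amounts are (nearly) identical: nothing stands out.
        return [0.0] * len(amounts)
    return [0.6745 * abs(a - med) / mad for a in amounts]

# Hypothetical transaction amounts for one user; no fraud labels are available.
amounts = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 950.0]
scores = robust_scores(amounts)

THRESHOLD = 3.5  # assumed cutoff for "unusual"
flagged = [i for i, s in enumerate(scores) if s > THRESHOLD]
print(flagged)
```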

Jarek7 | Options: BCE
May 2, 2023

I'd go for BCE instead of BCD, assuming that location is a geographical location, or that the geographical location can be derived from it using some side input. With such limited features (not even the transaction date/time is given!) and such a large, varied label space as location, it is impossible to get any convergence with supervised learning (D). Reinforcement learning (E), with a reward inversely proportional to the squared distance between the predicted and the real location, could give some reasonable results.
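The reward described in this comment could look roughly like the sketch below. It assumes locations are already mapped to planar (x, y) coordinates and adds a small constant `eps` so an exact prediction does not divide by zero; both are assumptions for illustration, and the comment itself gives no formula beyond "inversely proportional to the squared distance".

```python
def reward(predicted, actual, eps=1.0):
    """Reward that shrinks with the squared distance between two (x, y) points."""
    dx = predicted[0] - actual[0]
    dy = predicted[1] - actual[1]
    # eps (assumed) keeps the reward finite when the prediction is exact.
    return 1.0 / (dx * dx + dy * dy + eps)

exact = reward((0.0, 0.0), (0.0, 0.0))
near = reward((1.0, 0.0), (0.0, 0.0))
far = reward((3.0, 4.0), (0.0, 0.0))
print(exact, near, far)
```

The closer the predicted location, the larger the reward, so an agent trained against it would be pushed toward nearby predictions.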

Waqasghaloo | Options: ABC
Sep 13, 2023

Location is already given as an attribute, so what value is there in predicting location?

betterForGo | Options: ABC
Mar 8, 2023

I would like to choose ABC.

bha11111 | Options: BCD
Mar 11, 2023

BCD is correct

juliobs | Options: BCD
Mar 17, 2023

BCD. E does not make sense.

budgier | Options: ABC
May 2, 2023

According to GPT: A, B, C.

FP77
Aug 25, 2023

Well, GPT is stupid then

hpvb | Options: BCD
Jun 24, 2023

Should be BCD. E doesn't make sense because reinforcement learning is used only when you want to reach an optimal solution to a problem, such as finding an optimized route from point A to point B. You don't need reinforcement learning to predict a location.

Mark_86 | Options: BCD
Jul 26, 2023

BCD makes sense and does not require anything that is not given in the question data.

Dip1994 | Options: BCD
Aug 4, 2023

makes more sense

xiaofeng_0226 | Options: BCD
Aug 7, 2023

Absolutely

youare87 | Options: ACD
Aug 11, 2023

A, B: the data features include no definition of fraudulent, so we cannot obtain the answer even with unsupervised learning. C: k-means solves this. D: logistic regression; just use the location as the target. E: give a positive reward when the model predicts the correct location. F: same as C; use all features except location, and use similarity to predict new data.

rocky48 | Options: BCD
Nov 5, 2023

Answer: BCD

Roulle | Options: ACD
Jul 9, 2024

C and D are good for sure, and E and F are wrong for sure. That leaves the choice between A and B. Both options imply that we know which transactions are fraudulent and which are not. Indeed, in order to use unsupervised classification to determine the characteristics of fraudulent transactions, we must already know which ones are fraudulent, either because all transactions in the dataset are fraudulent, or because a variable allows us to identify them. If all transactions were fraudulent, this would probably have been specified in the statement. It is therefore more likely that the "type of transaction" variable can be used to distinguish fraudulent transactions from others. In this case, we have a target variable to predict, enabling us to build interpretable supervised models to understand the typology of fraudulent transactions. I therefore opt for A, C and D.