Professional Data Engineer Exam - Question 18

Question

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

Examice · Accepted Answer

Supervised learning can determine which transactions are most likely to be fraudulent, provided a model is trained on historical data with known fraud labels. Clustering, an unsupervised technique, can divide transactions into categories based on feature similarity, which helps to identify unusual patterns or groups. Supervised learning can also predict the location of a transaction, because the location column already supplies the labels to train on. Reinforcement learning does not fit, since it learns from interaction with an environment and there is none here. Unsupervised learning does not fit location prediction either, since it does not predict specific values such as a transaction location without labeled training data.
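To make the clustering part concrete, here is a minimal sketch of k-means written in plain Python. It is only an illustration under stated assumptions: the amounts are made up, only the transaction amount is used as a feature, and a real pipeline would also encode transaction type and location and scale the features first.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical transaction amounts: four small purchases, three large transfers.
points = [(12.0,), (15.5,), (9.9,), (20.0,), (5000.0,), (5200.0,), (4800.0,)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No label column is needed here: the grouping comes only from feature similarity, which is why clustering works on the data as given.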

jvg637 · Answer

BCD makes more sense to me. It's for sure not unsupervised, since locations are in the data already. Reinforcement also doesn't fit, as there is no AI and no interaction with the data from the observer.

[Removed] · Answer

Answer: B, C, D
Description: Fraud is not a feature, so fraud detection must be unsupervised; location is given, so predicting it is supervised; clustering can be done by grouping transactions with similar features.

musumusu · Answer

Answer: BCD
Things to understand:
Supervised learning can only predict a column that is labeled. In this case, there is no fraud / not-fraud column for the model to train on. So option A is wrong.
Option D: Supervised learning for the transaction location column is possible, as the column exists to train on.
Option C: Clustering into N types is possible; it is also unsupervised learning, grouping transactions with similar patterns.
Option B: This is the weaker choice. The user would need to know from history which clusters are fraudulent, and the question does not say whether past fraud is known. Drop this option if the question asked for only two right answers.
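A small sketch may help show what option B can and cannot deliver without a fraud column. The function below is an assumption for illustration, not a standard fraud method: it scores each transaction by how far its amount sits from that user's median, measured in units of the median absolute deviation (MAD). The data and the 1.0 floor on the MAD are made up.

```python
from collections import defaultdict
from statistics import median


def anomaly_scores(transactions):
    """Score (user_id, amount) pairs by distance from the user's median amount.

    The distance is divided by the user's median absolute deviation (MAD),
    so a score of 10 means "ten typical deviations away" for that user.
    """
    by_user = defaultdict(list)
    for user_id, amount in transactions:
        by_user[user_id].append(amount)

    stats = {}
    for user_id, amounts in by_user.items():
        med = median(amounts)
        mad = median(abs(a - med) for a in amounts)
        stats[user_id] = (med, mad)

    scores = []
    for user_id, amount in transactions:
        med, mad = stats[user_id]
        # Floor the MAD at 1.0 (an arbitrary choice) to avoid dividing by zero.
        scores.append(abs(amount - med) / max(mad, 1.0))
    return scores


# Hypothetical rows: (user_id, amount).
transactions = [
    ("u1", 20), ("u1", 25), ("u1", 22), ("u1", 30), ("u1", 900),
    ("u2", 500), ("u2", 520), ("u2", 480),
]
scores = anomaly_scores(transactions)
most_unusual = scores.index(max(scores))
print(transactions[most_unusual], scores[most_unusual])  # ('u1', 900) 175.0
```

The output is a ranking of unusual transactions, not a fraud verdict. Someone still has to confirm which of the flagged rows are actually fraudulent, which is the weakness of option B raised above.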

TVH_Data_Engineer · Answer

Options B, E, and F are not as suitable for the given scenario:

B. Unsupervised learning to determine which transactions are most likely to be fraudulent.

Unsupervised learning, while useful for anomaly detection, might not be as effective for fraud detection without labeled data indicating which transactions are fraudulent.

E. Reinforcement learning to predict the location of a transaction.

Reinforcement learning is more suitable for scenarios where an agent learns to make decisions through trial and error, which doesn't seem to align with predicting transaction locations.

F. Unsupervised learning to predict the location of a transaction.

Unsupervised learning typically doesn't involve predicting specific values (like location) without labeled data for training.

In summary, A, C, and D are the most appropriate machine learning applications for investigating the provided bank transactions dataset.

Jarek7 · Answer

I'd go for BCE instead of BCD, assuming that location is a geographical location, or that the geographical location can be derived from it using some side input.
With such limited features (not even the transaction date/time is given!) and a label as large and varied as location, it is impossible to get any convergence in supervised learning (D).
Reinforcement learning (E), with a reward inversely proportional to the squared distance between the predicted and the real location, could get some reasonable results.
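The reward described here could be sketched roughly as follows. This is only the reward signal, not a full reinforcement learning agent. It assumes locations are (latitude, longitude) pairs in degrees and uses the haversine great-circle distance; the `epsilon` term is an added assumption, since a pure inverse-square reward is undefined when the prediction is exactly right.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def reward(predicted, actual, epsilon=1.0):
    """Reward that falls off with the squared distance to the true location.

    epsilon (hypothetical, in km^2) keeps the reward finite at zero distance.
    """
    d = haversine_km(predicted, actual)
    return 1.0 / (epsilon + d ** 2)


paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
print(reward(paris, paris))   # exact hit gives the maximum reward of 1.0
print(reward(london, paris))  # a miss of a few hundred km gives a much smaller reward
```

Note that the true location is still needed to compute the reward, so this setup uses the same labeled column that supervised learning (D) would train on.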

Waqasghaloo · Answer

Location is already given as an attribute, so what value is there in predicting it?

betterForGo · Answer

I would like to choose ABC.

bha11111 · Answer

BCD is correct

juliobs · Answer

BCD. E does not make sense.

budgier · Answer

According to GPT: A, B, C

hpvb · Answer

Should be BCD.
E doesn't make sense because reinforcement learning is used only when you want to reach an optimal solution to a problem, such as an optimized route from point A to point B. You don't need reinforcement learning to predict a location.

Mark_86 · Answer

BCD makes sense and does not require anything that is not given in the question data.

Dip1994 · Answer

makes more sense

xiaofeng_0226 · Answer

Absolutely

youare87 · Answer

A, B: The features do not include a definition of fraud, so we cannot obtain the answer, even using unsupervised learning.
C: K-means solves this.
D: Logistic regression. Just use the location as the target.
E: Give a positive reward when the model predicts the correct location.
F: Same as C. Use all features except location, and use similarity to predict new data.
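As a rough illustration of D (location as the supervised target), here is a minimal sketch. It swaps logistic regression for a much simpler frequency baseline so that it runs with the standard library only: it predicts the location seen most often for the same user and transaction type, and falls back to the most common location overall. The rows and place names are made up.

```python
from collections import Counter, defaultdict


def fit_location_baseline(rows):
    """Learn location counts from (user_id, transaction_type, location) rows.

    Returns a predict(user_id, transaction_type) function.
    """
    by_key = defaultdict(Counter)
    overall = Counter()
    for user_id, tx_type, location in rows:
        by_key[(user_id, tx_type)][location] += 1
        overall[location] += 1

    def predict(user_id, tx_type):
        # Use the counts for this user and type if seen, else the global counts.
        counts = by_key.get((user_id, tx_type))
        if counts is None:
            counts = overall
        return counts.most_common(1)[0][0]

    return predict


# Hypothetical training rows; the location column acts as the label.
rows = [
    ("u1", "atm", "Berlin"),
    ("u1", "atm", "Berlin"),
    ("u1", "atm", "Munich"),
    ("u1", "online", "Web"),
    ("u2", "atm", "Berlin"),
]
predict = fit_location_baseline(rows)
print(predict("u1", "atm"))     # Berlin
print(predict("u1", "online"))  # Web
print(predict("u3", "pos"))     # Berlin (unseen pair, global fallback)
```

The sketch only works because the location column is present in the training rows, which is the argument for treating location prediction as supervised.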

rocky48 · Answer

Answer: BCD

Roulle · Answer

C and D are good for sure, and E and F are wrong for sure.
 
Then, to choose between A and B. Both options indicate that we know which transactions are fraudulent and which are not. Indeed, in order to use unsupervised classification to determine the characteristics of fraudulent transactions, we must already know which ones are fraudulent, either because all transactions in the dataset are fraudulent, or because a variable allows us to identify them. If all transactions were fraudulent, this would probably have been specified in the statement. It is therefore more likely that the "type of transaction" variable can be used to distinguish fraudulent transactions from others.

In this case, we have a target variable to predict, enabling us to build interpretable supervised models to understand the typology of fraudulent transactions. I therefore opt for A, C and D.
