Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 39


You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?

Correct Answer: C

To automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE) as soon as new data is available, the workflow should use an event-driven architecture. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Then, use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster. This approach ensures that the training job is initiated immediately when new data is available, avoiding the inefficiencies of polling (option B) or scheduling regular checks (option D), and eliminates the need to re-engineer the data pipeline (option A).
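For concreteness, here is a minimal sketch of the Pub/Sub-triggered Cloud Function described above, assuming a Kubeflow Pipelines v1-style SDK client, a training pipeline already uploaded to the KFP instance on the GKE cluster, and hypothetical names (KFP_HOST, PIPELINE_ID, EXPERIMENT_ID, and a training_data pipeline parameter) that are not part of the question:

```python
import base64
import json
import os

import kfp  # Kubeflow Pipelines SDK (v1-style client)

# Hypothetical configuration; none of these names come from the question.
KFP_HOST = os.environ["KFP_HOST"]            # KFP endpoint exposed by the GKE cluster
PIPELINE_ID = os.environ["PIPELINE_ID"]      # ID of the already-uploaded training pipeline
EXPERIMENT_ID = os.environ["EXPERIMENT_ID"]  # experiment to file runs under


def trigger_training(event, context):
    """Entry point for a Pub/Sub-triggered Cloud Function (1st gen).

    The Pub/Sub message body is the Cloud Storage JSON notification for the
    newly finalized object; its bucket/name give the fresh data's location.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    data_uri = f"gs://{payload['bucket']}/{payload['name']}"

    client = kfp.Client(host=KFP_HOST)
    run = client.run_pipeline(
        experiment_id=EXPERIMENT_ID,
        job_name=f"retrain-{context.event_id}",
        pipeline_id=PIPELINE_ID,
        params={"training_data": data_uri},  # assumes the pipeline accepts this parameter
    )
    print(f"Started pipeline run {run.id} for {data_uri}")
```

The bucket-to-topic wiring itself is a Cloud Storage notification on the OBJECT_FINALIZE event, so the function fires once per newly written file rather than on a schedule.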

Discussion

Paul_Dirac
Dec 26, 2021

C https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow-pipelines

ori5225
Feb 12, 2022

On a schedule, using Cloud Scheduler. Responding to an event, using Pub/Sub and Cloud Functions. For example, the event can be the availability of new data files in a Cloud Storage bucket.

tavva_prudhvi
Jan 2, 2024

Option D requires the job to be scheduled at regular intervals, even if there are no new files. This can waste resources and lead to unnecessary delays in the training process.

hiromi (Option: C)
Jun 9, 2023

C. Pub/Sub is the keyword.

Mohamed_Mossad (Option: C)
Jan 9, 2023

An event-driven architecture is better than a polling-based architecture, so I will vote for C.

behzadsw (Option: A)
Jul 4, 2023

The question says: "As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job..." C is also an option, but it seems more cumbersome. One thing that could be against A is that the data engineering team is a separate team, so they might not have access to your CI/CD if any changes are needed from their side.

tavva_prudhvi
Jan 2, 2024

Option A requires the data engineering team to modify the pipeline, which can be time-consuming and error-prone.

Fatiy (Option: C)
Aug 28, 2023

The scenario involves automatically running a Kubeflow Pipelines training job on GKE as soon as new data becomes available. To achieve this, we can use Cloud Storage to store the cleaned dataset, and then configure a Cloud Storage trigger that sends a message to a Pub/Sub topic whenever a new file is added to the storage bucket. We can then create a Pub/Sub-triggered Cloud Function that starts the training job on a GKE cluster.
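The Cloud Storage-to-Pub/Sub wiring this comment describes can also be set up programmatically; a brief sketch using the google-cloud-storage client, where the bucket and topic names are placeholders and both resources are assumed to already exist:

```python
from google.cloud import storage
from google.cloud.storage.notification import (
    JSON_API_V1_PAYLOAD_FORMAT,
    OBJECT_FINALIZE_EVENT_TYPE,
)

# Placeholder names; substitute your own bucket and Pub/Sub topic.
BUCKET_NAME = "cleaned-training-data"
TOPIC_NAME = "new-training-data"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Publish to the topic only when a new object is finalized, i.e. when the
# data engineering pipeline finishes writing a cleaned file.
notification = bucket.notification(
    topic_name=TOPIC_NAME,
    event_types=[OBJECT_FINALIZE_EVENT_TYPE],
    payload_format=JSON_API_V1_PAYLOAD_FORMAT,
)
notification.create()
print(f"Created notification {notification.notification_id} on gs://{BUCKET_NAME}")
```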

M25 (Option: C)
Nov 9, 2023

Went with C

Sum_Sum (Option: C)
May 15, 2024

C, because you don't want to re-engineer the pipeline.

fragkris (Option: C)
Jun 5, 2024

C. This is the Google-recommended method.

PhilipKoku (Option: C)
Dec 6, 2024

C) Pub/Sub trigger from Cloud Storage and a Cloud Function.