Professional Machine Learning Engineer Exam - Question 219


Your company manages an ecommerce website. You developed an ML model that recommends additional products to users in near real time based on items currently in the user’s cart. The workflow will include the following processes:

1. The website will send a Pub/Sub message with the relevant data and then receive a message with the prediction from Pub/Sub.

2. Predictions will be stored in BigQuery.

3. The model will be stored in a Cloud Storage bucket and will be updated frequently.

You want to minimize prediction latency and the effort required to update the model. How should you reconfigure the architecture?

Correct Answer: D

To minimize prediction latency and reduce the effort required to update the model, use the RunInference API with WatchFilePattern in a Dataflow streaming job. RunInference serves the model directly inside the stream-processing pipeline, giving low-latency predictions, while WatchFilePattern watches the Cloud Storage bucket and automatically swaps in updated models, so no endpoint redeployment or manual update step is needed.
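For illustration, below is a minimal sketch of this architecture as an Apache Beam streaming pipeline, following the automatic-model-refresh pattern from the Dataflow documentation cited in the discussion. The project, bucket, subscription, topic, and table names are placeholders, and a pickled scikit-learn model is assumed; the official example uses a TensorFlow model handler, but the wiring is the same.

```python
# Sketch only: all resource names below are placeholders, and a pickled
# scikit-learn model is assumed rather than any particular real model.
import json

import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
from apache_beam.ml.inference.utils import WatchFilePattern
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # streaming Dataflow job

# Initial model; RunInference loads it once per worker and keeps it in memory.
model_handler = SklearnModelHandlerNumpy(
    model_uri='gs://example-bucket/models/model.pkl')

with beam.Pipeline(options=options) as p:
    # Emits ModelMetadata whenever a new file matches the pattern, so
    # RunInference hot-swaps the model without stopping the pipeline.
    model_updates = p | 'WatchModel' >> WatchFilePattern(
        file_pattern='gs://example-bucket/models/*.pkl', interval=60)

    predictions = (
        p
        | 'ReadCarts' >> beam.io.ReadFromPubSub(
            subscription='projects/example/subscriptions/cart-events')
        | 'ToFeatures' >> beam.Map(
            lambda msg: np.array(json.loads(msg)['features']))
        | 'Predict' >> RunInference(
            model_handler=model_handler,
            model_metadata_pcoll=model_updates))

    # Send the prediction back to the website via Pub/Sub...
    _ = (predictions
         | 'Encode' >> beam.Map(lambda result: json.dumps(
             {'prediction': np.asarray(result.inference).tolist()}
             ).encode('utf-8'))
         | 'Publish' >> beam.io.WriteToPubSub(
             topic='projects/example/topics/recommendations'))

    # ...and archive it in BigQuery.
    _ = (predictions
         | 'ToRow' >> beam.Map(
             lambda result: {'prediction': str(result.inference)})
         | 'Store' >> beam.io.WriteToBigQuery(
             'example:ecommerce.predictions', schema='prediction:STRING'))
```

The key design point is that the model lives in worker memory and is refreshed through the WatchFilePattern side input, so updating the model amounts to uploading a new file to the bucket; there is no endpoint to redeploy and no per-request model loading.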

Discussion

guilhermebutzke (Option: D)
Feb 14, 2024

My answer: D. This Google documentation explains: “Instead of deploying the model to an endpoint, you can use the RunInference API to serve machine learning models in your Apache Beam pipeline. This approach has several advantages, including flexibility and portability.” https://cloud.google.com/blog/products/ai-machine-learning/streaming-prediction-with-dataflow-and-vertex This documentation uses RunInference and WatchFilePattern “to automatically update the ML model without stopping the Apache Beam pipeline”: https://cloud.google.com/dataflow/docs/notebooks/automatic_model_refresh So, for “minimize prediction latency” RunInference is the suggested approach, while for “effort required to update the model” WatchFilePattern is the best approach. I think D is the best option.

ddogg (Option: D)
Feb 5, 2024

Automatic Model Updates: WatchFilePattern automatically detects model changes in Cloud Storage, leading to seamless updates without managing endpoint deployments.

pikachu007 (Option: A)
Jan 13, 2024

Low latency through serverless execution: Cloud Functions start up almost instantly, reducing prediction latency compared to alternatives that require longer setup or deployment times. In-memory model: loading the model into memory eliminates disk I/O overhead, further contributing to rapid predictions.

CHARLIE2108
Feb 6, 2024

Cloud Functions offer low latency, but they might not scale well.

Yan_X (Option: A)
Mar 10, 2024

A for me.

pinimichele01 (Option: D)
Apr 16, 2024

Agree with guilhermebutzke.

PhilipKoku (Option: C)
Jun 10, 2024

C) Expose the model as a Vertex AI endpoint.