Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 163


You are an ML engineer at a retail company. You have built a model that predicts a coupon to offer an ecommerce customer at checkout based on the items in their cart. When a customer goes to checkout, your serving pipeline, which is hosted on Google Cloud, joins the customer's existing cart with a row in a BigQuery table that contains the customers' historic purchase behavior and uses that as the model's input. The web team is reporting that your model is returning predictions too slowly to load the coupon offer with the rest of the web page. How should you speed up your model's predictions?

Correct Answer: D

The primary issue here is the latency caused by the on-the-fly joining of the cart data with the historical purchase behavior data stored in BigQuery. Creating a materialized view in BigQuery with the necessary data for predictions precomputes and stores the relevant historical purchase information so that it is ready to be combined with the live cart. This eliminates the expensive query work during prediction, significantly reducing latency by making the precomputed data readily available. It is also a more efficient and cost-effective solution than options such as attaching a GPU or deploying more instances behind a load balancer, which address computational bottlenecks rather than data-retrieval issues.
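
As a rough illustration of that approach, here is a minimal sketch using the BigQuery Python client to create such a view. All project, dataset, table, and column names are hypothetical; the view pre-aggregates each customer's purchase history so the serving pipeline only needs a single keyed lookup.

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials

    # Hypothetical names: the view pre-aggregates each customer's historic
    # purchase behavior so serving only performs a keyed single-row read.
    ddl = """
    CREATE MATERIALIZED VIEW IF NOT EXISTS `my_project.retail.customer_history_mv` AS
    SELECT
      customer_id,
      COUNT(*) AS lifetime_orders,
      SUM(order_total) AS lifetime_spend,
      MAX(order_date) AS last_purchase_date
    FROM `my_project.retail.purchase_history`
    GROUP BY customer_id
    """
    client.query(ddl).result()  # run the DDL and wait for completion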

Discussion

ddogg (Option: D)
Jan 29, 2024

D. Create a materialized view in BigQuery with the necessary data for predictions. Here's why:
Current bottleneck: Joining the cart data with the BigQuery table containing historic purchases likely creates the latency bottleneck. Fetching data from BigQuery on every prediction request can be slow.
Materialized view: A materialized view pre-computes and stores the join between the cart data and the relevant historic purchase information in BigQuery. This eliminates the need for real-time joins during prediction, significantly reducing latency.
Faster access: The pre-computed data in the materialized view is readily available within BigQuery, ensuring faster access for your serving pipeline when predicting the coupon offer.
Lower cost: Compared to additional instances or GPU resources, a materialized view can be a more cost-effective solution, especially if prediction requests are frequent.

guilhermebutzke (Option: B)
Feb 18, 2024

I changed my mind to B. I've read this page thoroughly: https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction If the web team is reporting that the model is returning predictions too slowly to load the coupon offer with the rest of the web page, it suggests that the bottleneck might indeed be in the inference process rather than in data retrieval or processing. Given that the model is deployed on Google Cloud, a low-latency database is suitable for scenarios where quick access to data is crucial, such as real-time predictions for web applications. Option D: While pre-aggregating data in BigQuery can improve query speed, it might not be as efficient as a low-latency database for frequently accessed data like customer purchase history.
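
For what option B could look like in practice, here is a minimal sketch, assuming the historic features have been precomputed offline and cached in a low-latency store such as Memorystore for Redis. The endpoint, key prefix, and feature names are hypothetical.

    import json
    import redis

    # Hypothetical Memorystore (Redis) endpoint; features were precomputed offline.
    cache = redis.Redis(host="10.0.0.3", port=6379)

    def get_customer_features(customer_id: str) -> dict:
        # Millisecond-level lookup of precomputed purchase-history features.
        raw = cache.get(f"cust_features:{customer_id}")
        return json.loads(raw) if raw else {}

    # At serving time: combine the live cart with the cached features in memory.
    features = get_customer_features("12345")
    model_input = {**features, "cart_items": ["sku_1", "sku_2"]}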

gscharly (Option: B)
Apr 16, 2024

https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction "Analytical data stores such as BigQuery are not engineered for low-latency singleton read operations, where the result is a single row with many columns."

sonicclasps (Option: D)
Jan 30, 2024

Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base tables. Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.

fitri001 (Option: D)
Apr 22, 2024

Reduced join cost: Joining the customer's cart with their purchase history in BigQuery during each prediction can be slow. A materialized view pre-computes and stores the join results, eliminating the need for repetitive joins and significantly reducing latency.
Targeted data access: Materialized views allow you to specify the exact columns needed for prediction, minimizing data transferred between BigQuery and your serving pipeline. A sketch of such a targeted read follows below.
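
A minimal sketch of that targeted, parameterized single-row read against such a view; all view and column names are hypothetical and follow the earlier materialized-view example.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Hypothetical view/column names; only the columns the model needs are read.
    query = """
    SELECT lifetime_orders, lifetime_spend, last_purchase_date
    FROM `my_project.retail.customer_history_mv`
    WHERE customer_id = @customer_id
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("customer_id", "STRING", "12345")
            ]
        ),
    )
    row = next(iter(job.result()), None)  # single row of historic features, or None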

pinimichele01
Apr 23, 2024

https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction I'm not sure that BigQuery is the best option here; what do you think?

kalle_balle (Option: B)
Jan 7, 2024

Option B seems most sensible.

guilhermebutzke (Option: D)
Feb 6, 2024

Firstly, I believe the correct choice should be B. This is supported by a comprehensive Google page discussing methods to minimize real-time prediction latency. In this resource, they don't mention using a BigQuery view but instead suggest precomputing and lookup approaches to minimize prediction time. https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#online_real-time_prediction However, I will stick with option D because it's not clear whether option B suggests changing the entire database or just utilizing it as a preliminary step for online prediction.
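
A minimal sketch of that precompute-and-lookup idea, assuming a periodic job copies the aggregated features from BigQuery into a low-latency cache keyed by customer ID; all endpoints, table names, and key names are hypothetical.

    import json
    import redis
    from google.cloud import bigquery

    bq = bigquery.Client()
    cache = redis.Redis(host="10.0.0.3", port=6379)  # hypothetical Memorystore endpoint

    # Periodic (e.g., hourly) refresh: pull aggregated features and cache them by key.
    rows = bq.query(
        "SELECT customer_id, lifetime_orders, lifetime_spend "
        "FROM `my_project.retail.customer_history_mv`"
    ).result()

    pipe = cache.pipeline()
    for row in rows:
        features = {
            "lifetime_orders": row["lifetime_orders"],
            "lifetime_spend": row["lifetime_spend"],
        }
        pipe.set(f"cust_features:{row['customer_id']}", json.dumps(features))
    pipe.execute()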

guilhermebutzke
Feb 18, 2024

I've changed to B.

Ria_1989 (Option: B)
May 15, 2024

The coupon is offered to the ecommerce customer at checkout based on the items in their cart, not on the customer's historic behaviour. That's what creates confusion when choosing B.

SausageMuffins (Option: D)
May 16, 2024

Both B and D in theory reduce latency, but B implies that we might need to migrate the data to another, low-latency database. That migration and setup might incur additional cost and effort. In contrast, creating a materialized view seems much more straightforward, since the question already mentions a preexisting BigQuery table.

andreabrunelli (Option: B)
Jun 18, 2024

In my opinion the materialized view could be the best way, but the question says that the cart data have to be joined with the historic behaviour, so it's impossible to have all the data needed for the prediction in the materialized view, because the cart data are not in the database.