Professional Machine Learning Engineer Exam - Question 185


You have developed a BigQuery ML model that predicts customer churn, and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?

Correct Answer: D

To automate the retraining of your BigQuery ML model efficiently, create a Vertex AI Model Monitoring job configured to monitor training/serving skew. This setup will detect discrepancies between the training data and the data your model processes in production, ensuring that your model is retrained only when necessary. Configuring alert monitoring to publish messages to a Pub/Sub queue when a skew alert is detected, and using a Cloud Function to monitor the queue and trigger retraining in BigQuery, helps minimize additional code and reduces training costs by focusing on relevancy and need-based retraining.
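For concreteness, the monitoring half of this setup can be sketched with the Vertex AI Python SDK. This is a minimal illustration, not code from the exam: the project, endpoint ID, feature names, and BigQuery training table are placeholders, and routing alerts to a Pub/Sub queue is assumed to go through a Cloud Logging sink on the anomaly logs that `enable_logging` produces.

```python
# Sketch: create a Vertex AI Model Monitoring job that watches a deployed
# endpoint for training/serving skew. PROJECT, ENDPOINT_ID, the feature
# names, and the BigQuery table are placeholders.
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="PROJECT", location="us-central1")

# Baseline: the BigQuery table the churn model was trained on.
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="bq://PROJECT.dataset.churn_training_data",
    target_field="churned",
    skew_thresholds={"tenure_months": 0.3, "monthly_charges": 0.3},
)

objective_config = model_monitoring.ObjectiveConfig(
    skew_detection_config=skew_config,
)

monitoring_job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="churn-skew-monitor",
    endpoint="projects/PROJECT/locations/us-central1/endpoints/ENDPOINT_ID",
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),  # hours
    alert_config=model_monitoring.EmailAlertConfig(
        user_emails=["ml-team@example.com"],
        enable_logging=True,  # also writes anomalies to Cloud Logging,
                              # from which a log sink can publish to Pub/Sub
    ),
    objective_configs=objective_config,
)
```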

Discussion

15 comments
guilhermebutzke (Option: D)
Feb 18, 2024

My answer: D. Given the emphasis on "model feature values change" in the question, the most suitable option is D. Although option C involves monitoring prediction drift, which may indirectly capture changes in feature values, option D directly addresses the need by monitoring training/serving skew: detecting discrepancies between the training and serving data distributions is exactly what "retrain when model feature values change" calls for.

b1a8fae (Option: D)
Jan 15, 2024

I would avoid using TensorFlow Data Validation to minimize the code written. That leaves us with options C and D. Now, since it is the values of the features that we want to flag and not the values of the predictions, this sounds more like a training/serving skew situation than prediction drift. Hence, I would go for D.

CHARLIE2108 (Option: D)
Feb 16, 2024

Changed my mind: it's D.

bobjr (Option: C)
Jun 6, 2024

Skew should be detected at the beginning of productionizing the model: a skew test compares the training data against the real input data, and a skew indicates you trained on a dataset that is not aligned with the data you receive in production. Drift applies when the model works well at the beginning, but the world changes and the input data changes with it; drift is the longer-term concern. Here it is a drift issue.

Prakzz
Jun 30, 2024

Agreed

vale_76_na_xxx (Option: C)
Jan 8, 2024

I go with C. 1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift -> if the model is already in production we have to consider prediction drift. 2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected -> set up Pub/Sub notification channels. 3. Use a Cloud Function to monitor the Pub/Sub queue and trigger retraining in BigQuery -> to retrain on the new data in BQ (see the sketch below).
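A hypothetical sketch of step 3, assuming a 2nd-gen, Pub/Sub-triggered Cloud Function; the dataset, model, and table names are illustrative. The retraining itself is just a CREATE OR REPLACE MODEL statement run against BigQuery:

```python
# Sketch: Cloud Function triggered by the monitoring-alert Pub/Sub topic.
# Re-runs the BigQuery ML training query when an alert message arrives.
import base64

import functions_framework
from google.cloud import bigquery

# Illustrative BigQuery ML retraining statement.
RETRAIN_QUERY = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned']) AS
SELECT * FROM `my_dataset.churn_training_data`
"""


@functions_framework.cloud_event
def retrain_on_alert(cloud_event):
    # The Pub/Sub payload (the monitoring alert) arrives base64-encoded.
    payload = base64.b64decode(cloud_event.data["message"]["data"]).decode("utf-8")
    print(f"Monitoring alert received: {payload}")

    client = bigquery.Client()
    job = client.query(RETRAIN_QUERY)  # kicks off retraining in BigQuery
    job.result()                       # wait for the CREATE MODEL job to finish
    print(f"Retraining job {job.job_id} finished.")
```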

36bdc1e (Option: C)
Jan 13, 2024

C. The best option for automating retraining with minimal additional code when feature values change, while minimizing how often the model is retrained, is to create a Vertex AI Model Monitoring job configured to monitor prediction drift, configure alert monitoring to publish a message to a Pub/Sub queue when an alert is detected, and use a Cloud Function to monitor the Pub/Sub queue and trigger retraining in BigQuery. This leverages the power and simplicity of Vertex AI, Pub/Sub, and Cloud Functions to monitor model performance and retrain only when needed. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud.

b1a8fae (Option: C)
Jan 15, 2024

After reconsidering, I think it is C:
- No need to use TensorFlow to enable model monitoring, as stated here: https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring (even if it uses it under the hood: https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#calculating-skew-and-drift).
- The problem speaks about alerting on model feature changes, which happen over time and are measured against a baseline of historical production data -> prediction drift. (If the problem specified changes compared to the training data, then it would be training/serving skew.) (https://cloud.google.com/vertex-ai/docs/model-monitoring/monitor-explainable-ai#feature_attribution_training-serving_skew_and_prediction_drift)

ddogg (Option: C)
Jan 31, 2024

Option C directly addresses the requirements:
- Vertex AI Model Monitoring: efficiently monitors prediction drift via distribution-distance metrics.
- Pub/Sub alerts: a notification is published only when significant drift is detected, minimizing unnecessary retraining.
- Cloud Function: reacts to Pub/Sub messages and triggers retraining in BigQuery using minimal additional code.

CHARLIE2108 (Option: C)
Feb 9, 2024

I go with C, but D is pretty similar. C -> prediction drift (when the distribution of the serving data changes significantly over time in production). D -> training/serving skew (when the distribution of specific features in the serving data differs significantly from the training data).

CHARLIE2108
Feb 16, 2024

It's D
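To make the C/D contrast concrete, here is how the two objectives differ in the Vertex AI SDK (a sketch; the thresholds, feature name, and baseline table are illustrative). Skew needs the training-data baseline; drift does not.

```python
from google.cloud.aiplatform import model_monitoring

# Option D: training/serving skew - compares serving features against the
# training-data baseline, so it needs the training data source.
skew = model_monitoring.ObjectiveConfig(
    skew_detection_config=model_monitoring.SkewDetectionConfig(
        data_source="bq://PROJECT.dataset.churn_training_data",
        skew_thresholds={"tenure_months": 0.3},
    )
)

# Option C: prediction drift - compares serving features against recent
# production traffic; no training baseline is required.
drift = model_monitoring.ObjectiveConfig(
    drift_detection_config=model_monitoring.DriftDetectionConfig(
        drift_thresholds={"tenure_months": 0.3},
    )
)
```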

pikachu007 (Option: C)
Jan 11, 2024

A and B: TensorFlow Data Validation jobs require more setup and maintenance, and they might not integrate as seamlessly with Vertex AI Endpoints for automated retraining. D: Monitoring training/serving skew focuses on differences between training and deployment environments, which might not directly address feature value changes.

BlehMaks (Option: D)
Jan 14, 2024

We might need to retrain if the feature data distributions in production and training are significantly different (training/serving skew). Prediction drift occurs when the feature data distribution in production changes significantly over time. Should we retrain our model every time we see prediction drift? I don't think so; it is better to analyze why the drift happens. https://cloud.google.com/vertex-ai/docs/model-monitoring/overview#considerations

BlehMaks (Option: D)
Jan 14, 2024

I've changed my mind: it's D. https://www.evidentlyai.com/blog/machine-learning-monitoring-data-and-concept-drift

pinimichele01 (Option: D)
Apr 7, 2024

It's D

pinimichele01
Apr 23, 2024

see guilhermebutzke

gscharly (Option: D)
Apr 17, 2024

I go with D

Shno (Option: D)
Apr 30, 2024

If the model training is done through BigQuery ML, we don't have access to the training data after export, so I don't understand how training/serving skew can be applied. Can someone voting in favour of D clarify?