Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 73


You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic, since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

Correct Answer: A

A streamlined and reliable approach to validating models before pushing to production in a dynamic environment like footwear retail requires evaluating on specific subsets of the data. The TFX ModelValidator tools (whose functionality has since been merged into the Evaluator component) let you specify detailed performance metrics and assess model readiness for production by evaluating performance on predefined data subsets. This supports a thorough validation process while avoiding the data-leakage issues associated with methods like k-fold cross-validation, which is problematic for time-series data. Evaluating on recent data alone does not give a comprehensive picture of the model's performance across scenarios, and relying solely on AUC ROC over the entire dataset can miss important subset-specific weaknesses.
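The gating logic described above — score each slice of the evaluation data separately and only approve the push when every slice clears its threshold — can be illustrated with a small, library-free sketch. This is not TFX code; the feature names, data, and threshold here are hypothetical.

```python
# Minimal illustration of slice-based validation gating (not actual TFX code).
# The "region" slicing feature, the toy rows, and the 0.75 threshold are
# all hypothetical.

def accuracy(rows):
    """Fraction of rows where the prediction matches the label."""
    return sum(r["pred"] == r["label"] for r in rows) / len(rows)

def validate_by_slice(eval_rows, slice_key, min_accuracy):
    """Group evaluation rows by a feature and check each slice's accuracy.

    Returns (blessed, per_slice_accuracy): the model is "blessed" for
    production only if every slice meets the threshold.
    """
    slices = {}
    for row in eval_rows:
        slices.setdefault(row[slice_key], []).append(row)
    results = {key: accuracy(group) for key, group in slices.items()}
    blessed = all(acc >= min_accuracy for acc in results.values())
    return blessed, results

# Toy evaluation set: out-of-stock predictions sliced by region.
rows = [
    {"region": "EU", "label": 1, "pred": 1},
    {"region": "EU", "label": 0, "pred": 0},
    {"region": "US", "label": 1, "pred": 0},
    {"region": "US", "label": 0, "pred": 0},
]

blessed, per_slice = validate_by_slice(rows, "region", min_accuracy=0.75)
# The EU slice scores 1.0 but the US slice only 0.5, so the model is not
# blessed for push even though overall accuracy (0.75) meets the bar.
```

A model that looks fine on aggregate metrics can still fail a slice, which is exactly the subset-specific nuance the explanation says an overall AUC ROC would miss.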

Discussion

17 comments
John_PongthornOption: A
Jan 25, 2023

https://www.tensorflow.org/tfx/guide/evaluator

hiromiOption: C
Dec 18, 2022

It seems like C to me. B is wrong because "Many machine learning techniques don't work well here due to the sequential nature and temporal correlation of time series. For example, k-fold cross validation can cause data leakage; models need to be retrained to generate new forecasts" - https://cloud.google.com/learn/what-is-time-series

edooOption: A
Mar 5, 2024

I prefer A to C because 1 week of data may be insufficient to generalize the model and could lead to overfitting on the validation subset.

M25Option: A
May 9, 2023

Went with A

atlas_lyonOption: A
Jul 19, 2023

I will go for A. I don't think the aim of the question is to test whether candidates know that a component is deprecated. Note that ModelValidator has been merged into Evaluator, so we can imagine the question has been updated in recent exams. Evaluator enables testing on specific subsets with the metrics we want, and then signals the Pusher component to push the new model to production if the "model is good enough". This keeps the pipeline quite streamlined (https://www.tensorflow.org/tfx/guide/evaluator). B is wrong: with historical data, one should watch out for data leakage. C is wrong: we want to track performance on specific subsets of data (not necessarily the last week), maybe for some targeting/segmentation, who knows. D is wrong because we want to track performance on specific subsets of data, not the entire dataset.
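For reference, an Evaluator slicing configuration along the lines this comment describes might look like the sketch below. This is a configuration sketch only, not a complete pipeline; the `out_of_stock` label key, the `country` slicing feature, and the 0.7 AUC bound are assumptions for illustration, not details from the question.

```python
import tensorflow_model_analysis as tfma

# Sketch of an EvalConfig: evaluate overall and on per-country slices,
# and require AUC above a lower bound for the model to be "blessed".
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="out_of_stock")],
    slicing_specs=[
        tfma.SlicingSpec(),                          # overall metrics
        tfma.SlicingSpec(feature_keys=["country"]),  # one slice per country
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(
                    class_name="AUC",
                    threshold=tfma.MetricThreshold(
                        value_threshold=tfma.GenericValueThreshold(
                            lower_bound={"value": 0.7}
                        )
                    ),
                )
            ]
        )
    ],
)
```

The Evaluator component takes this `eval_config` together with the trained model; if the thresholds pass, it emits a "blessing" that the Pusher component checks before pushing to production, which is the streamlined flow the comment refers to.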

tavva_prudhvi
Jul 23, 2023

Bro, that's not TFX ModelValidator, it's Evaluator. Are they the same?

MultipleWorkerMirroredStrategy
Oct 25, 2023

TFXModelValidator is deprecated, but its behaviour can be replicated using the Evaluator object - which is the point he tried to make. See the docs here: https://www.tensorflow.org/tfx/guide/modelval

pinimichele01Option: A
Apr 14, 2024

The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how your model performs on subsets of your data.

gscharlyOption: A
Apr 20, 2024

Evaluator TFX lets you evaluate the performance on different subsets of data https://www.tensorflow.org/tfx/guide/evaluator

aw_49Option: C
May 20, 2023

A is deprecated, so C.

jullietOption: A
Jun 2, 2023

Could someone explain why A is a better option than C? C is correct in terms of evaluation overall, no doubt. But do we choose TFX because it understands we are dealing with time series? Or is it the "specific subsets" in the question that suggests we have already chosen the data of the last period and just need to push it into TFX?

Voyager2Option: C
Jun 8, 2023

I think it should be C because of the following key point: "but track your performance on specific subsets of data before pushing to production". So the question is which subset of data you should use.

LitingOption: C
Jul 7, 2023

Went with C

[Removed]Option: A
Jul 23, 2023

The answer is A. Performance on specific subsets of data before pushing to production == TFX ModelValidator with custom performance metrics for production readiness. C is wrong because performance in the last relevant week of data != performance on specific subsets of data.

tavva_prudhvi
Aug 3, 2023

The ModelValidator TFX Pipeline Component (Deprecated)

joaquinmenendezOption: C
Sep 19, 2023

Option C, because it allows you to track your model's performance on the most *recent* data, which is the most relevant data for predicting stockout risk. Given that preferences are dynamic, the most important thing is that the model works correctly with the newest data.

AdiMLOption: C
Sep 22, 2023

The answer should be C: we are dealing with dynamic data, and the most recent data is the most relevant for estimating future performance.

Mickey321Option: A
Nov 16, 2023

Either A or C, but C covers only the last week, which is not the same as specific subsets of data.

pmle_nintendoOption: C
Feb 28, 2024

option C provides a streamlined and reliable approach that focuses on evaluating the model's performance on the most relevant and recent data, which is essential for predicting out-of-stock events in a dynamic retail setting.

PhilipKokuOption: A
Jun 7, 2024

A) TFX ModelValidator is designed to handle the exact needs described in the scenario: training on all data, validating on specific subsets, and ensuring production readiness with comprehensive performance metrics. This makes it the most streamlined and reliable method compared to other options, which either lack specificity in production readiness (B), are too narrow in scope (C), or risk overfitting and inadequate validation (D).