Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 244


You work for a textile manufacturing company. Your company has hundreds of machines, and each machine has many sensors. Your team used the sensor data to build hundreds of ML models that detect machine anomalies. Models are retrained daily, and you need to deploy these models in a cost-effective way. The models must operate 24/7 without downtime and make sub-millisecond predictions. What should you do?

Correct Answer: D

Deploying a Dataflow streaming pipeline with the RunInference API and using automatic model refresh is the most suitable approach for this scenario. This solution ensures continuous real-time processing of sensor data, which is essential for making sub-millisecond predictions and detecting machine anomalies promptly. The RunInference API allows the models to be invoked directly within the pipeline, minimizing latency and eliminating the need for separate prediction endpoints, which can be more cost-effective. Automatic model refresh ensures that the latest retrained models are always in use without downtime, maintaining the accuracy and effectiveness of anomaly detection.
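
For illustration, here is a minimal sketch of such a pipeline in Apache Beam (Python). The Pub/Sub topic, GCS model path, and JSON parsing are hypothetical placeholders, and it assumes Beam 2.46+ with the TensorFlow extras installed:

```python
import json

import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerNumpy
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming mode keeps the pipeline running 24/7 on Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    _ = (
        pipeline
        # Hypothetical Pub/Sub topic carrying raw sensor readings as JSON.
        | "ReadSensors" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/machine-sensors")
        # Turn each message into the feature vector the model expects.
        | "Parse" >> beam.Map(
            lambda msg: np.array(json.loads(msg)["readings"], dtype=np.float32))
        # Run the anomaly model inside the pipeline itself -- no separate
        # prediction endpoint to provision or pay for.
        | "DetectAnomalies" >> RunInference(
            TFModelHandlerNumpy(model_uri="gs://my-bucket/models/anomaly/v1"))
        # Stand-in for a real sink (BigQuery, a Pub/Sub alert topic, ...).
        | "Log" >> beam.Map(print)
    )
```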

Discussion

6 comments
fitri001 (Option: D)
Apr 17, 2024

Why D?

Real-time predictions: Dataflow streaming pipelines continuously process sensor data, enabling real-time anomaly detection with sub-millisecond predictions. This is crucial for immediate response to potential machine issues.

RunInference API: this API invokes the TensorFlow models directly within the Dataflow pipeline for on-the-fly inference, which eliminates the need for separate prediction endpoints and reduces latency.

Automatic model refresh: since the models are retrained daily, automatic refresh ensures the pipeline uses the latest version without downtime. This is essential for maintaining model accuracy and anomaly-detection effectiveness.

Why not C? While autoscaling can handle varying workloads, Vertex AI Prediction endpoints may incur higher costs for real-time, high-volume predictions than invoking the models directly within the pipeline using RunInference.

b1a8fae (Option: D)
Jan 20, 2024

Needs to be active 24/7 -> streaming. RunInference API seems like the way to go here, using automatic model refresh on a daily basis. https://beam.apache.org/documentation/ml/about-ml/

guilhermebutzke (Option: C)
Feb 19, 2024

My answer: C. The phrase "The models must operate 24/7 without downtime and make sub-millisecond predictions" describes a case of online prediction (option B or C). Given "Models are retrained daily, and you need to deploy these models in a cost-effective way", a Vertex AI Prediction endpoint with autoscaling looks better to me than the RunInference API with automatic model refresh, because it always serves the latest retrained models and it scales. https://cloud.google.com/blog/products/ai-machine-learning/streaming-prediction-with-dataflow-and-vertex
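
For contrast, a minimal sketch of option C using the google-cloud-aiplatform SDK; the project, model ID, and machine type are hypothetical, and it assumes the retrained model has already been uploaded to the Vertex AI Model Registry:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Hypothetical model resource from the Vertex AI Model Registry.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890")

# Deploy with autoscaling: min_replica_count=1 keeps the endpoint
# serving 24/7; max_replica_count lets it scale with traffic.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,
)

# Online prediction against the managed endpoint.
response = endpoint.predict(instances=[[0.12, 0.48, 0.33]])
print(response.predictions)
```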

sonicclasps (Option: C)
Jan 31, 2024

Low latency -> streaming. C and D could both work, but C is the GCP solution, so I chose C.

vaibavi
Feb 11, 2024

I think autoscaling will lead to downtime, at least while the replicas are updating.

pinimichele01
Apr 28, 2024

I agree, D is better.

asmgi
Jul 14, 2024

I don't think autoscaling is relevant to this task, since we have the same number of sensors at any time.

pinimichele01 (Option: D)
Apr 8, 2024

With the automatic model refresh feature, when the underlying model changes, your pipeline updates to use the new model. Because the RunInference transform automatically updates the model handler, you don't need to redeploy the pipeline. With this feature, you can update your model in real time, even while the Apache Beam pipeline is running.
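
A hedged sketch of how that wiring might look, with WatchFilePattern feeding RunInference as a side input (the bucket paths and polling interval are hypothetical placeholders):

```python
import json

import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerNumpy
from apache_beam.ml.inference.utils import WatchFilePattern
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    # Poll GCS for newly retrained models. Emits ModelMetadata that
    # RunInference consumes as a side input, so the model is swapped
    # in place while the pipeline keeps running -- no redeploy.
    model_updates = pipeline | "WatchModel" >> WatchFilePattern(
        file_pattern="gs://my-bucket/models/anomaly/*.h5",  # hypothetical
        interval=3600)  # seconds between polls; models retrain daily

    _ = (
        pipeline
        | "ReadSensors" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/machine-sensors")
        | "Parse" >> beam.Map(
            lambda msg: np.array(json.loads(msg)["readings"], dtype=np.float32))
        | "Infer" >> RunInference(
            TFModelHandlerNumpy(model_uri="gs://my-bucket/models/anomaly/v1.h5"),
            model_metadata_pcoll=model_updates)
    )
```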

pinimichele01
Apr 13, 2024

Also, a Vertex AI endpoint is not a good fit for sub-millisecond online inference.

gscharly (Option: D)
Apr 20, 2024

Agree with fitri001.