Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 238


You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, which leads to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?

Correct Answer: B

In the given scenario, the model fails to scale beyond one replica and CPU utilization remains low despite high load. This indicates that the bottleneck is in the model server's handling of requests rather than a lack of computational resources. Increasing the number of workers in the model server lets it handle more concurrent requests and drives CPU utilization up; because the autoscaler uses CPU utilization as its scaling signal, this both makes better use of each replica and allows scale-out to trigger when it is actually needed.
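For concreteness, here is a minimal sketch of what "increasing the number of workers" can look like for a custom container that serves the scikit-learn model with gunicorn. The file name, app module, and specific values are illustrative assumptions, not part of the question:

```python
# gunicorn.conf.py -- illustrative worker settings for a custom model server.
# Assumes the prediction app is a WSGI app named "app" in main.py (hypothetical).
import multiprocessing
import os

# Start with one worker per core, then raise this if CPU utilization stays
# low under high load, so the replica can actually saturate its CPUs.
workers = multiprocessing.cpu_count()
threads = 2  # extra threads per worker help with I/O-bound request handling

# Vertex AI passes the serving port via the AIP_HTTP_PORT variable (8080 by default).
bind = f"0.0.0.0:{os.environ.get('AIP_HTTP_PORT', '8080')}"
timeout = 120  # give slower predictions time to finish instead of killing workers
```

The container entrypoint would then run something like `gunicorn --config gunicorn.conf.py main:app`.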

Discussion

6 comments
sonicclasps Option: A
Jan 31, 2024

"We generally recommend starting with one worker or thread per core. If you notice that CPU utilization is low, especially under high load, or your model is not scaling up because CPU utilization is low, then increase the number of workers." https://cloud.google.com/vertex-ai/docs/general/deployment

sonicclasps
Jan 31, 2024

Sorry, I clicked the wrong option; the answer is B.

pikachu007 Option: B
Jan 13, 2024

Low CPU utilization: despite high load, low CPU utilization indicates underutilization of the available resources, suggesting a bottleneck within the model server itself, not in overall compute capacity.
Worker concurrency: increasing the number of workers within the model server allows it to handle more concurrent requests, effectively utilizing the available CPU and addressing the bottleneck.

BlehMaks
Jan 19, 2024

I don't get it. The autoscaling system should increase/decrease the number of workers itself. If we do it instead of the autoscaling system, why do we need it?

guilhermebutzke
Feb 16, 2024

Increasing the number of workers within the model server will distribute the load within the single replica, but it wouldn't address the problem of not scaling beyond one replica. Increasing workers would be a good option for reducing prediction delay.

asmgi
Jul 14, 2024

Not scaling beyond one replica is a symptom, not the source of the problem. The problem is the low CPU utilization.

Carlose2108 Option: B
Feb 26, 2024

I went with B.

guilhermebutzke Option: C
Feb 16, 2024

My answer: C. The problem is with scaling; the provided resources are OK. So:
A: Not correct, because CPU is sufficient.
B: Not correct, because increasing the number of workers will speed up processing within a single replica and make predictions faster, for example, but it won't address the scaling problem.
C: Correct. This option involves adjusting the scaling of resources to match the expected demand, ensuring that the system can handle increased loads effectively.
D: This might help ensure at least one replica is always available, but it won't address the issue of not scaling up during high load.

pinimichele01 Option: B
Apr 13, 2024

Agree with sonicclasps -> B

pinimichele01
Apr 21, 2024

NOT D: This might help ensure at least one replica is always available, but it won't address the issue of not scaling up during high load.

fitri001 Option: B
Apr 18, 2024

Agree with sonicclasps -> B