A. Use a machine type with more memory: While this might seem logical, autoscaling for Vertex AI endpoints is driven by CPU utilization, not memory usage. Even with more memory, the endpoint would not add replicas as long as CPU utilization stays below the target (see the deployment sketch after this list).
B. Decrease the number of workers per machine: A per-machine worker count is a tuning knob in some serving frameworks, but Vertex AI endpoints do not typically expose a worker setting, so this option is not applicable, and reducing workers would not address the memory bottleneck in any case.
C. Increase the CPU utilization target: This would instruct the endpoint to scale out only once CPU usage reaches an even higher threshold, so scaling would happen later, not sooner. Since the bottleneck is memory, raising the CPU target would never trigger scaling while memory is the limiting factor.
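For reference, the CPU utilization target that options A and C revolve around is set when the model is deployed to the endpoint. Below is a minimal sketch using the google-cloud-aiplatform Python SDK; the project, region, model ID, endpoint ID, machine type, and replica counts are all placeholder values.

```python
from google.cloud import aiplatform

# Placeholder project and region; substitute your own values.
aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(model_name="MODEL_ID")              # hypothetical model ID
endpoint = aiplatform.Endpoint(endpoint_name="ENDPOINT_ID")  # hypothetical endpoint ID

# Autoscaling on Vertex AI endpoints is keyed to CPU utilization (the default
# target is 60%). There is no memory-based autoscaling target, which is why
# options A and C cannot fix a memory-bound workload.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,
    # Replicas are added when average CPU utilization exceeds this target;
    # raising it (option C) only delays scale-out further.
    autoscaling_target_cpu_utilization=70,
)
```

The sketch uses the default CPU-based metric; Vertex AI also supports a GPU duty-cycle target for accelerator-backed deployments, but neither metric reflects memory usage.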