Professional Machine Learning Engineer Exam - Question 256

Question

You work for an online grocery store. You recently developed a custom ML model that recommends a recipe when a user arrives at the website. You chose the machine type on the Vertex AI endpoint to optimize costs by using the queries per second (QPS) that the model can serve, and you deployed it on a single machine with 8 vCPUs and no accelerators.

A holiday season is approaching and you anticipate four times more traffic during this time than the typical daily traffic. You need to ensure that the model can scale efficiently to the increased demand. What should you do?

Examice · Accepted Answer

To handle the increased traffic efficiently, keep the current machine type and configure the endpoint to autoscale based on vCPU usage. Autoscaling lets the endpoint add or remove compute resources as demand changes, so serving capacity follows the holiday peak instead of being fixed in advance. Keeping the current machine type avoids unnecessary upfront cost and the performance risk of switching to a different configuration. Monitoring and alerting on CPU usage then lets you identify and investigate problems promptly.
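As a rough illustration of what "autoscaling based on vCPU usage" could look like in code, here is a minimal sketch that builds the deployment settings. The keyword names follow the `Model.deploy` parameters of the Vertex AI SDK for Python as I understand them; the machine type `n1-standard-8`, the replica bounds, and the 60% CPU target are assumptions for illustration and are not given in the question.

```python
def build_deploy_kwargs(machine_type="n1-standard-8",
                        min_replicas=1,
                        max_replicas=6,
                        target_cpu_percent=60):
    """Return keyword arguments for an autoscaled deployment.

    All default values are illustrative assumptions, not values from the question.
    """
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    return {
        "machine_type": machine_type,            # keep the existing 8-vCPU shape
        "min_replica_count": min_replicas,       # baseline for typical traffic
        "max_replica_count": max_replicas,       # ceiling for the holiday peak
        "autoscaling_target_cpu_utilization": target_cpu_percent,
    }


def deploy_with_autoscaling(model, endpoint, **overrides):
    """Deploy `model` to `endpoint` with the settings above.

    `model` is expected to behave like a Vertex AI SDK Model object,
    i.e. to expose a deploy(endpoint=..., **kwargs) method.
    """
    return model.deploy(endpoint=endpoint, **build_deploy_kwargs(**overrides))
```

Setting the maximum replica count above the expected peak leaves headroom if the traffic forecast turns out to be low, while the minimum keeps the cost at the current level on normal days.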

fitri001 · Answer

Option A: Manually adding compute nodes after an alert might lead to delays and potential outages during peak traffic.
Option B: Upgrading to 32 vCPUs upfront might be overkill if the current machine type with 8 vCPUs can handle the typical daily traffic. Vertical scaling (more vCPUs) is suitable only if the model can benefit from additional CPU power.
Option D: A GPU is unlikely to benefit a recipe recommendation model, which likely doesn't need the kind of heavy parallel computation that GPUs accelerate. Additionally, monitoring GPU usage wouldn't be relevant.

emsherff · Answer

Option A requires manual intervention.
Option B overprovisions preemptively, which is overkill (autoscaling should be preferred).
Option D: Unless the recipe recommendation model uses GPU-accelerated computations (e.g., some deep learning models), adding a GPU won't be beneficial and will increase costs.
I would go with C: autoscaling based on vCPU usage aligns well with the workload.

kalle_balle · Answer

Voting for B as it's the only option to autoscale, even though the cost will go up. All other options include manual intervention.

pikachu007 · Answer

Cost Optimization: It starts with the current machine type, avoiding unnecessary upfront costs, and scales only when needed.
Autoscaling: It automatically adjusts compute resources based on vCPU usage, ensuring the endpoint can handle traffic spikes without manual intervention.
Monitoring and Alerting: It provides visibility into resource usage and triggers alerts for potential issues, enabling proactive actions.
Investigation: It encourages investigation of alerts to identify any underlying problems beyond expected traffic growth, ensuring overall system health.

b1a8fae · Answer

I would go for C, as it enables autoscaling when CPU usage exceeds a defined threshold.

daidai75 · Answer

Option B can support only exactly 4x the traffic, but the requirement is four times "more", so B is not the best choice, at least for me.
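To make the "4x" versus "four times more" distinction concrete, here is a small back-of-the-envelope sketch. The QPS figures are hypothetical (the question gives none), and it assumes capacity scales linearly with the number of 8-vCPU replicas.

```python
import math


def replicas_needed(baseline_qps, per_replica_qps, traffic_multiplier):
    """Estimate how many replicas are needed for a multiple of baseline traffic.

    Assumes each replica serves up to `per_replica_qps` and that capacity
    scales linearly with the replica count.
    """
    if per_replica_qps <= 0:
        raise ValueError("per_replica_qps must be positive")
    return max(1, math.ceil(baseline_qps * traffic_multiplier / per_replica_qps))


# Hypothetical numbers: one 8-vCPU machine handles today's 100 QPS.
BASELINE_QPS = 100
PER_REPLICA_QPS = 100

# Reading 1: peak traffic is 4x the baseline.
four_times = replicas_needed(BASELINE_QPS, PER_REPLICA_QPS, 4)

# Reading 2: peak traffic is the baseline plus four times more, i.e. 5x.
four_times_more = replicas_needed(BASELINE_QPS, PER_REPLICA_QPS, 5)

print(four_times, four_times_more)  # prints: 4 5
```

Under these assumptions a fixed machine sized for exactly 4x would fall short on the second reading, whereas an autoscaled endpoint with a high enough maximum replica count covers both.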

emsherff · Answer

Autoscaling based on vCPU usage aligns well with the workload.

omermahgoub · Answer

C: Use Autoscaling Based on vCPU Usage

AzureDP900 · Answer

C is right because:
1) Since you've already optimized your model's deployment on a single machine with 8 vCPUs, it makes sense to keep the same machine type to avoid any potential performance issues.
2) Enabling autoscaling based on vCPU usage allows your endpoint to automatically add more machines as needed to handle the increased traffic during the holiday season. This approach is more efficient and cost-effective than scaling up individual machines or adding new machines manually.
3) Monitoring CPU usage with a job and alerting when thresholds are exceeded allows you to detect potential issues before they impact performance.
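To show how CPU-based autoscaling adds machines, here is a toy simulation of a proportional scaling rule. This is an assumption for illustration: it is the rule used by target-tracking autoscalers such as the Kubernetes Horizontal Pod Autoscaler, and is not claimed to be the exact algorithm Vertex AI uses. The 60% target and the replica bounds are also made-up values.

```python
import math


def desired_replicas(current_replicas, observed_cpu_percent,
                     target_cpu_percent=60, min_replicas=1, max_replicas=6):
    """Proportional scaling rule (illustrative assumption).

    Scales the replica count by observed/target CPU utilization,
    then clamps the result to [min_replicas, max_replicas].
    """
    if current_replicas < 1 or target_cpu_percent <= 0:
        raise ValueError("need current_replicas >= 1 and a positive target")
    raw = math.ceil(current_replicas * observed_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


# Holiday ramp-up: 2 replicas running hot at 95% average CPU.
scale_up = desired_replicas(2, 95)      # ceil(2 * 95 / 60) = 4

# Demand beyond the configured ceiling is capped at max_replicas.
capped = desired_replicas(5, 100)       # ceil(5 * 100 / 60) = 9, clamped to 6

# After the peak: 4 replicas idling at 15% average CPU.
scale_down = desired_replicas(4, 15)    # ceil(4 * 15 / 60) = 1

print(scale_up, capped, scale_down)  # prints: 4 6 1
```

The capped case is where the CPU alert in point 3 earns its keep: sustained high utilization at the maximum replica count is a signal to investigate rather than something autoscaling can fix on its own.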
