Professional Machine Learning Engineer Exam - Question 256


You work for an online grocery store. You recently developed a custom ML model that recommends a recipe when a user arrives at the website. You chose the machine type on the Vertex AI endpoint to optimize costs by using the queries per second (QPS) that the model can serve, and you deployed it on a single machine with 8 vCPUs and no accelerators.

A holiday season is approaching and you anticipate four times more traffic during this time than the typical daily traffic. You need to ensure that the model can scale efficiently to the increased demand. What should you do?

Correct Answer: C

To handle the increased traffic efficiently, keep the current machine type and configure the endpoint to autoscale based on vCPU usage. Autoscaling adjusts compute resources to match demand, so capacity grows during peak periods such as the holiday season without manual intervention. Keeping the current machine type avoids unnecessary upfront cost and the performance risk of switching to a different configuration. Monitoring and alerting on CPU usage lets you identify and address any issues promptly.
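As a minimal sketch of what such a configuration could look like with the Vertex AI Python SDK (`google-cloud-aiplatform`): the helper below only builds the keyword arguments for `Model.deploy`, so it runs without the SDK or cloud credentials. The machine type `n1-standard-8` (8 vCPUs), the replica bounds, and the 60% CPU target are assumptions chosen for illustration, not values given in the question.

```python
def build_deploy_kwargs(machine_type="n1-standard-8",
                        min_replicas=1,
                        max_replicas=8,
                        target_cpu_percent=60):
    """Build keyword arguments for aiplatform.Model.deploy with CPU autoscaling.

    All defaults are illustrative assumptions, not values from the question.
    """
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    return {
        "machine_type": machine_type,             # keep the existing 8-vCPU type
        "min_replica_count": min_replicas,        # baseline capacity
        "max_replica_count": max_replicas,        # ceiling for holiday traffic
        "autoscaling_target_cpu_utilization": target_cpu_percent,
    }


deploy_kwargs = build_deploy_kwargs()
print(deploy_kwargs)

# With the SDK installed and a project configured, the call would look like:
#   from google.cloud import aiplatform
#   model = aiplatform.Model("MODEL_RESOURCE_NAME")  # placeholder, not a real ID
#   endpoint = model.deploy(**deploy_kwargs)
```

Setting `max_replica_count` above the expected peak leaves headroom if traffic exceeds the forecast, while `min_replica_count` keeps the off-peak cost at the current level.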

Discussion

9 comments
fitri001 Option: C
Apr 17, 2024

Option A: Manually adding compute nodes after an alert can cause delays and potential outages during peak traffic.
Option B: Upgrading to 32 vCPUs upfront is likely overkill if the current 8-vCPU machine type already handles the typical daily traffic. Vertical scaling (more vCPUs per machine) is suitable only if the model can actually benefit from the additional CPU power.
Option D: A GPU is unlikely to benefit a recipe recommendation model, which likely doesn't involve the kind of computation a GPU accelerates. Monitoring GPU usage also wouldn't be relevant.

emsherff Option: C
Apr 10, 2024

Option A requires manual intervention.
Option B overprovisions preemptively, which is overkill (autoscaling should be preferred).
Option D: Unless the recipe recommendation model uses GPU-accelerated computations (e.g., some deep learning models), adding a GPU won't be beneficial and will increase costs.
I would go with C: autoscaling based on vCPU usage aligns well with the workload.

kalle_balle Option: B
Jan 9, 2024

Voting for B as it's the only option to autoscale even though the cost will go up. All other options include manual intervention.

b1a8fae
Jan 22, 2024

Wouldn't scaling up the vCPUs after receiving the alert also be manual? It comes across as such to me at least.

pikachu007 Option: C
Jan 13, 2024

Cost Optimization: It starts with the current machine type, avoiding unnecessary upfront costs, and scales only when needed.
Autoscaling: It automatically adjusts compute resources based on vCPU usage, ensuring the endpoint can handle traffic spikes without manual intervention.
Monitoring and Alerting: It provides visibility into resource usage and triggers alerts for potential issues, enabling proactive actions.
Investigation: It encourages investigation of alerts to identify any underlying problems beyond expected traffic growth, ensuring overall system health.

b1a8fae Option: C
Jan 22, 2024

I would go for C as it enables autoscaling when exceeding a determined CPU usage threshold.

daidai75 Option: C
Jan 23, 2024

Option B can only support exactly 4x the traffic, but the requirement is four times "more", so B is not the best option, at least for me.
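To illustrate the point about fixed sizing, here is a rough capacity calculation. The baseline QPS, the per-replica QPS, and the 60% utilization target are hypothetical numbers chosen for the example; none of them come from the question.

```python
import math


def replicas_needed(peak_qps, qps_per_replica, target_utilization=0.6):
    """Replicas required so that each one stays at or below the target load."""
    if peak_qps < 0 or qps_per_replica <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity inputs")
    usable_qps = qps_per_replica * target_utilization
    return max(1, math.ceil(peak_qps / usable_qps))


BASELINE_QPS = 100      # hypothetical typical daily peak
QPS_PER_REPLICA = 200   # hypothetical capacity of one 8-vCPU replica

# "Four times more" can be read as 4x or as 5x (baseline plus four times);
# a fixed size must pick one in advance, autoscaling does not have to.
for multiplier in (1, 4, 5, 8):
    n = replicas_needed(BASELINE_QPS * multiplier, QPS_PER_REPLICA)
    print(f"{multiplier}x traffic -> {n} replica(s)")
```

A deployment sized once for the 4x case has no answer for the 5x or 8x rows, whereas an autoscaled endpoint only needs its maximum replica count set high enough.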

emsherff Option: C
Apr 10, 2024

Autoscaling based on vCPU usage aligns well with the workload.

omermahgoub Option: C
Apr 13, 2024

C: Use Autoscaling Based on vCPU Usage

AzureDP900 Option: C
Jul 5, 2024

C is right because:
1) Since you've already optimized your model's deployment on a single machine with 8 vCPUs, it makes sense to maintain the same machine type to avoid any potential performance issues.
2) Enabling autoscaling based on vCPU usage will allow your endpoint to automatically add more machines as needed to handle the increased traffic during the holiday season. This approach is more efficient and cost-effective than scaling up individual machines or adding new machines manually.
3) Monitoring CPU usage with a job and alerting when thresholds are exceeded allows you to detect potential issues before they impact performance.
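How CPU-based autoscaling adds machines can be sketched with a simple proportional rule: scale the replica count by the ratio of observed to target utilization, then clamp it to the configured bounds. This is a simplified model for intuition only. The real Vertex AI autoscaler also smooths measurements over time, and the utilization readings and bounds below are made-up values.

```python
import math


def target_replicas(current_replicas, current_cpu, target_cpu,
                    min_replicas=1, max_replicas=8):
    """Proportional scaling rule, clamped to the configured replica bounds."""
    if current_replicas < 1 or target_cpu <= 0 or current_cpu < 0:
        raise ValueError("invalid autoscaling inputs")
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, desired))


replicas = 1
# Hypothetical average CPU readings (percent) as holiday traffic ramps up
# and then falls away again.
for cpu in (50, 95, 90, 70, 30):
    new_replicas = target_replicas(replicas, cpu, target_cpu=60)
    print(f"cpu={cpu}% replicas {replicas} -> {new_replicas}")
    replicas = new_replicas
```

The same rule scales back in when utilization drops below the target, which is where the cost advantage over a permanently larger machine comes from.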