Professional Machine Learning Engineer Exam - Question 170

Question

You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?

Examice · Accepted Answer

A single replica cannot handle millions of requests per second, so setting the max replica count to 1 would likely result in extremely high latency or service disruptions. Deploying an online Vertex AI prediction endpoint with a max replica count of 100 lets the endpoint scale out to meet the high-demand period. This incurs higher costs during peak hours, but that is necessary to meet the performance requirements. Outside peak hours, Vertex AI autoscaling reduces the number of replicas again, which keeps the overall cost down.
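As a rough illustration of that configuration, here is a minimal sketch using the Vertex AI Python SDK (`google-cloud-aiplatform`). The machine type, the helper names, and the project, location, and model resource name arguments are placeholders assumed for the example, not values from the question. No accelerator is requested, since the model is scikit-learn.

```python
def autoscaling_deploy_kwargs(min_replicas=1, max_replicas=100,
                              machine_type="n1-standard-4"):
    """Build keyword arguments for Model.deploy().

    machine_type is an assumed placeholder; size it for the real model.
    """
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "machine_type": machine_type,
        # Keep at least one replica running so the endpoint serves 24/7.
        "min_replica_count": min_replicas,
        # Upper bound that autoscaling may reach during the peak window.
        "max_replica_count": max_replicas,
    }


def deploy_sklearn_model(model_resource_name, project, location):
    """Deploy an already uploaded model to an online prediction endpoint.

    Hypothetical helper: requires `pip install google-cloud-aiplatform`
    and valid Google Cloud credentials.
    """
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_resource_name)
    # CPU-only deployment: no accelerator arguments are passed.
    return model.deploy(**autoscaling_deploy_kwargs())
```

Only `deploy_sklearn_model` talks to Google Cloud; `autoscaling_deploy_kwargs` can be inspected locally to check the replica bounds before deploying.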

pikachu007 · Answer

B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100:
This option provides a higher number of replicas (100) to handle the expected high volume of requests during peak hours. While it might result in increased costs, it provides the necessary scalability to manage the incoming traffic efficiently. During non-peak hours, you can consider scaling down the replicas to reduce costs, as Vertex AI allows dynamic scaling based on demand.
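To put a rough number on the cost argument, the sketch below compares daily replica-hours for an autoscaled endpoint against one that keeps all 100 replicas running around the clock. It assumes, purely for illustration, that the peak window (8 am to 7 pm) needs all 100 replicas and that a single replica is enough the rest of the day; real traffic and pricing will differ.

```python
PEAK_HOURS = 19 - 8               # 8 am to 7 pm
OFF_PEAK_HOURS = 24 - PEAK_HOURS  # remaining hours of the day


def daily_replica_hours(peak_replicas, off_peak_replicas):
    """Replica-hours consumed in one day for the given replica counts."""
    return PEAK_HOURS * peak_replicas + OFF_PEAK_HOURS * off_peak_replicas


# Assumed replica counts: 100 at peak, 1 off-peak (autoscaled),
# versus 100 replicas kept running all day (no scaling down).
autoscaled = daily_replica_hours(peak_replicas=100, off_peak_replicas=1)
always_on = daily_replica_hours(peak_replicas=100, off_peak_replicas=100)

print(autoscaled)  # 1113
print(always_on)   # 2400
print(f"saving: {1 - autoscaled / always_on:.0%}")  # saving: 54%
```

Under these assumptions, autoscaling uses less than half the replica-hours of a fixed 100-replica deployment while still covering the peak.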

BlehMaks · Answer

scikit-learn doesn't support GPU
https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support

36bdc1e · Answer

B
we don't need GPU for scikit-learn

b1a8fae · Answer

B.
scikit-learn -> no need for GPU
max number of replicas -> 1 is too little if we are serving online predictions at such a massive scale (millions per second)

pinimichele01 · Answer

see pikachu007

AzureDP900 · Answer

Option A (Deploying an online Vertex AI prediction endpoint. Set the max replica count to 1) is still a good choice for minimizing costs. By setting the max replica count to 1, you are allowing Vertex AI to scale up or down based on load, which means that during off-peak hours, you won't be paying for unnecessary instances.
