Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 170


You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?

Correct Answer: B

A single replica cannot handle millions of requests per second. Setting the max replica count to 1 caps the endpoint at one node, so peak traffic would cause very high latency or dropped requests. Deploying an online Vertex AI prediction endpoint with a max replica count of 100 lets the endpoint scale out to meet demand during peak hours. This costs more during those hours, but the extra capacity is needed to meet the performance requirement. Outside the 8 am to 7 pm window, Vertex AI autoscaling removes replicas down to the minimum replica count, so the endpoint stays available 24/7 while cost is kept low.
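The capacity argument can be checked with a back-of-the-envelope calculation. This is a simplified model, not Vertex AI's actual autoscaling logic: it assumes a hypothetical throughput of 20,000 requests per second per replica and a peak of 1,000,000 requests per second, and it clamps the demanded replica count to the range between the minimum and maximum replica counts the endpoint is deployed with. The helper names are illustrative only.

```python
import math


def replicas_needed(load_rps: float, per_replica_rps: float) -> int:
    """Replicas required to absorb load_rps if each one handles per_replica_rps."""
    return max(1, math.ceil(load_rps / per_replica_rps))


def autoscaled_replicas(load_rps: float, per_replica_rps: float,
                        min_replicas: int, max_replicas: int) -> int:
    """Demanded replica count, clamped to [min_replicas, max_replicas]."""
    wanted = replicas_needed(load_rps, per_replica_rps)
    return min(max(wanted, min_replicas), max_replicas)


PER_REPLICA_RPS = 20_000  # hypothetical throughput of one replica
PEAK_RPS = 1_000_000      # assumed peak: the low end of "millions per second"

# Compare the two max replica counts discussed in the answer.
for max_replicas in (1, 100):
    n = autoscaled_replicas(PEAK_RPS, PER_REPLICA_RPS,
                            min_replicas=1, max_replicas=max_replicas)
    capacity = n * PER_REPLICA_RPS
    print(f"max_replica_count={max_replicas}: {n} replica(s), "
          f"{capacity:,} rps capacity ({capacity / PEAK_RPS:.0%} of peak)")
```

With these assumed numbers, a max replica count of 1 covers only 2% of peak traffic, while a max of 100 needs 50 replicas and leaves headroom. Real per-replica throughput depends on the model, machine type, and request size, so it would have to be measured with a load test.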

Discussion

6 comments
pikachu007 · Option: B
Jan 10, 2024

B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100: This option provides a higher number of replicas (100) to handle the expected high volume of requests during peak hours. While it might result in increased costs, it provides the necessary scalability to manage the incoming traffic efficiently. During non-peak hours, you can consider scaling down the replicas to reduce costs, as Vertex AI allows dynamic scaling based on demand.
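The cost side of this comment can be estimated in replica-hours per day. The sketch below is a simplified model under hypothetical assumptions: one replica serves 20,000 requests per second, traffic is 1,000,000 requests per second from 8 am to 7 pm and 40,000 requests per second otherwise, and cost is proportional to replica-hours. It compares an autoscaled endpoint (min 1, max 100) with one pinned at 100 replicas all day. The load profile and helper names are illustrative, not measured data.

```python
import math

PER_REPLICA_RPS = 20_000  # hypothetical throughput of one replica


def hourly_load(hour: int) -> int:
    """Hypothetical load profile: 1M rps from 08:00 to 19:00, 40k rps otherwise."""
    return 1_000_000 if 8 <= hour < 19 else 40_000


def replicas_for(load_rps: int, min_replicas: int, max_replicas: int) -> int:
    """Replicas needed for load_rps, clamped to [min_replicas, max_replicas]."""
    wanted = math.ceil(load_rps / PER_REPLICA_RPS)
    return min(max(wanted, min_replicas), max_replicas)


def daily_replica_hours(min_replicas: int, max_replicas: int) -> int:
    """Sum of replicas running in each hour of a 24-hour day."""
    return sum(replicas_for(hourly_load(h), min_replicas, max_replicas)
               for h in range(24))


autoscaled = daily_replica_hours(min_replicas=1, max_replicas=100)
always_on = daily_replica_hours(min_replicas=100, max_replicas=100)
print(f"autoscaled (min 1, max 100): {autoscaled} replica-hours/day")
print(f"fixed at 100 replicas:       {always_on} replica-hours/day")
```

Under these assumptions the autoscaled endpoint uses 576 replica-hours per day (50 replicas for the 11 peak hours, 2 for the other 13) versus 2,400 for the fixed deployment.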

BlehMaks · Option: B
Jan 12, 2024

scikit-learn doesn't support GPUs: https://scikit-learn.org/stable/faq.html#will-you-add-gpu-support

36bdc1e · Option: B
Jan 13, 2024

B. We don't need a GPU for scikit-learn.

b1a8fae · Option: B
Jan 8, 2024

B. scikit-learn -> no need for a GPU. Max number of replicas -> 1 is too few if we are serving online predictions at such a massive scale (millions per second).

pinimichele01 · Option: B
Apr 13, 2024

see pikachu007

AzureDP900 · Option: A
Jun 21, 2024

Option A (Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1) is still a good choice for minimizing costs. By setting the max replica count to 1, you are allowing Vertex AI to scale up or down based on load, which means that during off-peak hours, you won't be paying for unnecessary instances.