
Professional Machine Learning Engineer Exam - Question 74


You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What should you do?

Correct Answer: B

An "Out of Memory" error during an online prediction request indicates that the payload sent in a single request is too large for the memory available on the prediction node. Sending the request again with a smaller batch of instances reduces the amount of data processed at a time, which can avoid the out-of-memory error and let the prediction complete.
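As an illustration, the failing request can be split into smaller chunks that are sent to the endpoint one at a time. This is a minimal sketch using the Vertex AI Python SDK; the endpoint resource name, chunk size, and instance payloads below are hypothetical placeholders, not values from the question.

```python
from google.cloud import aiplatform

def predict_in_chunks(endpoint, instances, chunk_size=8):
    """Send instances in small batches so each request stays within node memory."""
    predictions = []
    for start in range(0, len(instances), chunk_size):
        chunk = instances[start:start + chunk_size]
        response = endpoint.predict(instances=chunk)
        predictions.extend(response.predictions)
    return predictions

# Hypothetical resource name; substitute your project, region, and endpoint ID.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)
large_batch = [{"feature_a": 0.1, "feature_b": 0.2}] * 100  # stand-in payload
results = predict_in_chunks(endpoint, large_batch, chunk_size=8)
```

Smaller chunks trade a few extra round trips for a lower peak memory footprint per request.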

Discussion

10 comments
hiromi · Option: B
Dec 18, 2022

B is the answer. 429 - Out of Memory: https://cloud.google.com/ai-platform/training/docs/troubleshooting

tavva_prudhvi
Mar 20, 2023

Upvote this comment, it's the right answer!

koakande · Option: B
Dec 29, 2022

https://cloud.google.com/ai-platform/training/docs/troubleshooting

tavva_prudhvi · Option: B
Mar 20, 2023

B. Send the request again with a smaller batch of instances. If you are getting an "Out of Memory" error during an online prediction request, it suggests that the amount of data you are sending in each request is too large and is exceeding the available memory. To resolve this issue, you can try sending the request again with a smaller batch of instances. This reduces the amount of data being sent in each request and helps avoid the out-of-memory error. If the problem persists, you can also try increasing the machine type or the number of instances to provide more resources for the prediction service.
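If smaller batches alone do not help, the machine type mentioned above is chosen at deployment time on Vertex AI, so the model can be redeployed onto a node with more memory. A hedged sketch with hypothetical resource names (the model ID and machine types are illustrative, not from the question):

```python
from google.cloud import aiplatform

# Hypothetical model resource name; adjust to your project and region.
model = aiplatform.Model("projects/my-project/locations/us-central1/models/9876543210")

# Redeploy on a machine type with more memory, e.g. n1-highmem-4 instead of n1-standard-2.
endpoint = model.deploy(
    machine_type="n1-highmem-4",
    min_replica_count=1,
    max_replica_count=2,
)
```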

Sivaram06 · Option: B
Dec 11, 2022

https://cloud.google.com/ai-platform/training/docs/troubleshooting#http_status_codes

LearnSodas · Option: B
Dec 11, 2022

Answer is B, as reported here: https://cloud.google.com/ai-platform/training/docs/troubleshooting

ares81 · Option: B
Dec 11, 2022

The correct answer is B.

BenMS · Option: C
Feb 28, 2023

This question is about prediction not training - and specifically it's about _online_ prediction (aka realtime serving). All the answers are about batch workloads apart from C.

BenMS
Feb 28, 2023

Okay, option D is also about online serving, but the error message indicates a problem for individual predictions, which will not be fixed by increasing the number of predictions per second.

Antmal
Mar 30, 2023

@BenMS this feels like a trick question... it makes you zone in on the word "batch". https://cloud.google.com/ai-platform/training/docs/troubleshooting states that when an error occurs with an online prediction request, you usually get an HTTP status code back from the service. These are some commonly encountered codes and their meaning in the context of online prediction: 429 - Out of Memory. The processing node ran out of memory while running your model. There is no way to increase the memory allocated to prediction nodes at this time. You can try these things to get your model to run. Reduce your model size by:
1. Using less precise variables.
2. Quantizing your continuous data.
3. Reducing the size of other input features (using smaller vocab sizes, for example).
4. Send the request again with a smaller batch of instances.
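To make the first two items in that quoted list concrete, here is a small hedged sketch; the array shapes, dtypes, and bucket count are made up for illustration, and the troubleshooting doc does not prescribe a specific tool:

```python
import numpy as np

# Stand-in feature matrix that is too large to send in one request.
raw_features = np.random.rand(100, 256).astype(np.float64)

# "Using less precise variables": float64 -> float32 halves the in-memory size
# and shortens the serialized values in the request payload.
compact = raw_features.astype(np.float32)

# "Quantizing your continuous data": map continuous values onto a small set of
# integer buckets, so each feature serializes as a short integer instead of a float.
buckets = np.digitize(compact, bins=np.linspace(0.0, 1.0, num=16))
instances = buckets.tolist()  # smaller, coarser payload to send for prediction
```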

M25 · Option: B
May 9, 2023

Went with B

pmle_nintendo · Option: B
Feb 28, 2024

By reducing the batch size of instances sent for prediction, you decrease the memory footprint of each request, potentially alleviating the out-of-memory issue. However, be mindful that excessively reducing the batch size might impact the efficiency of your prediction process.

PhilipKoku · Option: B
Jun 7, 2024

B) Use a smaller set of tokens.