Professional Machine Learning Engineer Exam - Question 29

Question

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

Examice · Accepted Answer

High-throughput online prediction needs an architecture that scales to large request volumes while still running the expensive preprocessing efficiently. Sending incoming prediction requests to a Pub/Sub topic and transforming them with a Dataflow job gives scalable, parallel processing, since Dataflow is designed for large-scale data processing. The transformed data is then submitted to AI Platform for prediction, and the results are written to an outbound Pub/Sub queue. Options involving Cloud Functions (D) may hit resource limits with computationally heavy preprocessing, making them less suitable for this use case.
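The flow described above can be sketched locally as follows. This is only an illustration under stated assumptions: the two `queue.Queue` objects stand in for the inbound and outbound Pub/Sub topics, `preprocess` is a placeholder for the transform a Dataflow job would run, and `predict` is a stub for the AI Platform prediction call. All names and the message format are hypothetical, and none of the real Google Cloud APIs are used.

```python
import json
import queue

# Stand-ins for the inbound and outbound Pub/Sub topics (hypothetical names).
inbound_topic = queue.Queue()
outbound_topic = queue.Queue()


def preprocess(instance):
    """Placeholder for the expensive transform that a Dataflow job would run.

    Here it only mean-centers the features; the real preprocessing is
    whatever was applied at training time.
    """
    values = instance["features"]
    mean = sum(values) / len(values)
    return {"id": instance["id"], "features": [v - mean for v in values]}


def predict(transformed):
    """Stub for the online prediction call (assumed, not the real API)."""
    score = sum(abs(v) for v in transformed["features"])
    return {"id": transformed["id"], "score": score}


def run_pipeline():
    """Drain the inbound queue: transform, predict, publish the result."""
    while not inbound_topic.empty():
        message = json.loads(inbound_topic.get())
        prediction = predict(preprocess(message))
        outbound_topic.put(json.dumps(prediction))
```

To try it, put a JSON string such as `{"id": "r1", "features": [1.0, 2.0, 3.0]}` on `inbound_topic`, call `run_pipeline()`, and read the result from `outbound_topic`. In a managed deployment the loop would be replaced by a streaming job, so the caller receives predictions asynchronously from the outbound queue rather than in the request's own response.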

SparkExpedition · Answer

Supporting B: https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing

inder0007 · Answer

I think it should be B

e707 · Answer

I think it's D as B is not a good choice because it requires you to run a Dataflow job for each prediction request. This is inefficient and can lead to latency issues.

Liting · Answer

Went with B. Using Dataflow to transform large amounts of data is the best option.

Mohamed_Mossad · Answer

- Using option elimination: A is totally wrong, and D is also not valid, as Cloud Functions is not suitable for heavy data workflows.
- Between B and D, I will vote for B, as Dataflow is the best solution when dealing with heavy data workflows.

SamuelTsch · Answer

I went with B.
A is completely wrong. C: first, Cloud Spanner is not designed for high throughput, and it is also not meant for preprocessing. D: Cloud Functions could not get enough resources to do the computationally heavy transformation.

sachinxshrivastav · Answer

Answer should be B

suresh_vn · Answer

Should be B. Dataflow is the best option for preprocessing both training and testing data.

hiromi · Answer

B
Pub/Sub + Dataflow + Vertex AI (AI Platform)

MithunDesai · Answer

Yes, the answer is B

SergioRubiano · Answer

It's B

lucaluca1982 · Answer

I go for D. Option B uses Dataflow, which is more suitable for batch processing.

M25 · Answer

Went with B

Voyager2 · Answer

B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue
https://dataintegration.info/building-streaming-data-pipelines-on-google-cloud

ashu381 · Answer

Because the concern here is high throughput and not specifically latency, it is better to go with option B.

PhilipKoku · Answer

B) Pub/Sub + Dataflow

bludw · Answer

D. The issue with B is that Dataflow does not work well with high throughput
