Professional Machine Learning Engineer Exam - Question 29


You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

Correct Answer: B

For high-throughput online prediction, the architecture must scale to large request volumes while still running the expensive preprocessing efficiently. In option B, incoming prediction requests are sent to a Pub/Sub topic, and a Dataflow job transforms the data with scalable, parallel processing. Dataflow is designed for large-scale data processing, so it can handle the transformation efficiently. Once the data is transformed, it is submitted to AI Platform for prediction, and the results are written to an outbound Pub/Sub topic. This keeps preprocessing efficient and lets it scale with high request volumes. Option D relies on Cloud Functions, which may run into resource limits when the preprocessing is computationally expensive, making it less suitable for this use case.
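To make the data flow in option B concrete, here is a minimal in-process sketch in Python. It is not Pub/Sub, Dataflow, or AI Platform client code: two standard-library `queue.Queue` objects stand in for the inbound and outbound Pub/Sub topics, a background thread stands in for the streaming Dataflow job, and `preprocess`, `predict`, `TRAIN_MEAN`, and `TRAIN_STD` are hypothetical placeholders for the expensive transform, the deployed model, and normalization statistics assumed to have been computed from the training data. What it illustrates is that one long-running worker consumes many requests and applies the same preprocessing that was used at training time.

```python
import queue
import threading

# Hypothetical normalization statistics, assumed to have been computed once over
# the training data. Reusing them here keeps prediction-time preprocessing
# identical to training-time preprocessing.
TRAIN_MEAN = 50.0
TRAIN_STD = 10.0


def preprocess(raw: dict) -> list[float]:
    """Hypothetical stand-in for the expensive transform the Dataflow job would apply."""
    return [(raw["x"] - TRAIN_MEAN) / TRAIN_STD]


def predict(batch: list[list[float]]) -> list[float]:
    """Hypothetical stand-in for the AI Platform prediction call (a toy linear model)."""
    return [2.0 * features[0] + 1.0 for features in batch]


def streaming_worker(inbound: queue.Queue, outbound: queue.Queue, batch_size: int = 4) -> None:
    """Long-running consumer: transforms each message, requests predictions in
    small batches, and publishes one result per request to the outbound queue."""
    ids: list[int] = []
    features: list[list[float]] = []
    while True:
        msg = inbound.get()
        finished = msg is None  # None is a shutdown sentinel used only in this sketch
        if not finished:
            ids.append(msg["id"])
            features.append(preprocess(msg))
        # Flush when the batch is full, or on shutdown if anything is pending.
        if features and (finished or len(features) >= batch_size):
            for req_id, pred in zip(ids, predict(features)):
                outbound.put({"id": req_id, "prediction": pred})
            ids, features = [], []
        if finished:
            return


inbound_topic: queue.Queue = queue.Queue()   # stands in for the inbound Pub/Sub topic
outbound_topic: queue.Queue = queue.Queue()  # stands in for the outbound Pub/Sub topic

worker = threading.Thread(target=streaming_worker, args=(inbound_topic, outbound_topic))
worker.start()
for i, x in enumerate([40.0, 50.0, 60.0, 70.0, 80.0]):
    inbound_topic.put({"id": i, "x": x})
inbound_topic.put(None)  # stop the worker; a real streaming job would keep running
worker.join()

results = [outbound_topic.get() for _ in range(outbound_topic.qsize())]
print(results)
```

In a real deployment the worker would typically be an Apache Beam pipeline running on Dataflow, for example reading with Beam's `ReadFromPubSub` and writing with `WriteToPubSub`. The sketch flushes a batch only when it is full or at shutdown; a production worker would also flush a partial batch after a short timeout so individual requests are not held waiting.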

Discussion

SparkExpedition (Option: B)
Jul 14, 2021

Supporting B: https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing

inder0007 (Option: B)
Jun 9, 2021

I think it should be B.

q4exam
Sep 22, 2021

I also agree with B; this is how I would advise clients to do it as well.

e707 (Option: D)
Apr 27, 2023

I think it's D, as B is not a good choice because it requires you to run a Dataflow job for each prediction request. This is inefficient and can lead to latency issues.

lucaluca1982
Apr 28, 2023

Yes, I agree. Dataflow can introduce latency.

Liting (Option: B)
Jul 7, 2023

Went with B; using Dataflow for large-scale data transformation is the best option.

Mohamed_Mossad (Option: B)
Jun 13, 2022

Using elimination: A is totally wrong, and D is also not valid, as Cloud Functions is not suitable for heavy data workflows. Between B and D, I vote for B, as Dataflow is the best solution when dealing with heavy data workflows.

SamuelTsch (Option: B)
Jul 7, 2023

I went with B. A is completely wrong. C: first, Cloud Spanner is not designed for high throughput, and it is also not meant for preprocessing. D: Cloud Functions may not get enough resources to do the computationally heavy transformation.

sachinxshrivastav (Option: B)
Aug 6, 2022

Answer should be B

suresh_vn (Option: B)
Aug 10, 2022

Should be B. Dataflow is the best option for preprocessing both training and testing data.

hiromi (Option: B)
Dec 8, 2022

B: Pub/Sub + Dataflow + Vertex AI (AI Platform)

MithunDesai (Option: B)
Dec 19, 2022

Yes, the answer is B.

SergioRubiano (Option: B)
Mar 24, 2023

It's B

lucaluca1982 (Option: D)
Apr 13, 2023

I go for D. Option B uses Dataflow, which is more suitable for batch processing.

M25 (Option: B)
May 9, 2023

Went with B

Voyager2 (Option: D)
May 30, 2023

B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.
https://dataintegration.info/building-streaming-data-pipelines-on-google-cloud

ashu381 (Option: B)
Jun 10, 2023

Because the concern here is high throughput rather than latency specifically, it is better to go with option B.

PhilipKoku (Option: B)
Jun 6, 2024

B) Pub/Sub + Dataflow

bludw (Option: D)
Jun 27, 2024

D. The issue with B is that Dataflow does not work well with high throughput.