Professional Data Engineer Exam Questions

Professional Data Engineer Exam - Question 204


You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near-real time from multiple vendors. The data may contain invalid values. What should you do?

Correct Answer: D

To process continuous streaming data in near-real time from multiple vendors while handling potentially invalid values, the best approach is to create a Pub/Sub topic and send all vendor data to it, then use Dataflow to process and sanitize that data before streaming it into BigQuery. This pipeline allows scalable, real-time processing and ensures the data is cleaned before it is used by the BigQuery ML model, whose endpoint is hosted on Vertex AI.
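The "sanitize" step in this pipeline can be illustrated with a minimal sketch of the validation logic a Dataflow ParDo would apply to each Pub/Sub message before writing to BigQuery. The field names (`vendor_id`, `reading`) and validity rules are assumptions for illustration only, not part of the exam question:

```python
from typing import Optional

def sanitize(record: dict) -> Optional[dict]:
    """Return a cleaned record, or None if the record is invalid.

    Hypothetical rules: a record must carry a non-empty vendor_id
    and a numeric reading; anything else is dropped.
    """
    vendor_id = record.get("vendor_id")
    if not vendor_id:
        return None  # drop records missing a vendor identifier
    try:
        value = float(record.get("reading"))
    except (TypeError, ValueError):
        return None  # drop records with non-numeric readings
    return {"vendor_id": vendor_id, "reading": value}

# Simulated messages arriving from the Pub/Sub topic:
messages = [
    {"vendor_id": "v1", "reading": "42.5"},
    {"vendor_id": "", "reading": "7"},        # invalid: no vendor
    {"vendor_id": "v2", "reading": "oops"},   # invalid: non-numeric
]

clean = [r for m in messages if (r := sanitize(m)) is not None]
print(clean)  # only the valid, normalized record survives
```

In an actual Dataflow job, this function would sit inside a `beam.DoFn` (or a `beam.Map` followed by a filter), with the surviving records streamed into a BigQuery table for the BigQuery ML model to consume.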

Discussion

9 comments
Atnafu (Option: D)
Nov 30, 2022

Answer is D

vidts (Option: D)
Dec 1, 2022

It's D

jkhong (Option: D)
Dec 4, 2022

Better to use Pub/Sub for streaming and reading the message data; a Dataflow ParDo can then filter out the invalid values.

odacir (Option: D)
Dec 9, 2022

D is the best option because it sanitizes the data, so it's D.

vamgcp (Option: D)
Jul 23, 2023

Option D. Dataflow provides a scalable and flexible way to process and clean the incoming data in real time before loading it into BigQuery.

zellck (Option: D)
Dec 3, 2022

D is the answer.

AzureDP900 (Option: D)
Jan 2, 2023

D. Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.

Matt_108 (Option: D)
Jan 13, 2024

Option D

anyone_99 (Option: A)
Jul 9, 2024

Why is the answer A? After paying $44 I am getting wrong answers.