Exam: Professional Data Engineer
Question 54

Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

    Correct Answer: D

    To determine which user bid first in real time in a globally distributed auction application, combining Google Cloud Pub/Sub with Google Cloud Dataflow is the most effective approach. Each application server writes its bid events to Cloud Pub/Sub as they occur, and a pull subscription feeds those events into Google Cloud Dataflow for real-time processing. Within the Dataflow pipeline you can apply logic that compares event timestamps to determine which user bid first. This setup is scalable, collects bid events in a single location, and processes them efficiently, which is crucial for a global auction system.
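    As a concrete illustration of the pipeline described above, here is a minimal sketch using the Apache Beam Python SDK. The subscription path, the 'event_ts' attribute, and the JSON bid schema are assumptions for illustration, not details given in the question:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window


def run():
    opts = PipelineOptions(streaming=True)
    with beam.Pipeline(options=opts) as p:
        _ = (
            p
            # Pull bid events from a Pub/Sub pull subscription. The
            # hypothetical 'event_ts' attribute carries the moment the bid
            # was placed, so Beam assigns event-time timestamps rather than
            # arrival (processing) time.
            | "ReadBids" >> beam.io.ReadFromPubSub(
                subscription="projects/PROJECT/subscriptions/bids",
                timestamp_attribute="event_ts")
            # Message bodies are assumed to be JSON:
            # {"item": ..., "amount": ..., "user": ..., "timestamp": ...}
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # Key by item so competing bids for the same item meet in one place.
            | "KeyByItem" >> beam.Map(lambda bid: (bid["item"], bid))
            # Short fixed windows bound how long we wait for stragglers.
            | "Window" >> beam.WindowInto(window.FixedWindows(5))
            # The earliest event timestamp wins, not the first-processed event.
            | "FirstBid" >> beam.CombinePerKey(
                lambda bids: min(bids, key=lambda b: b["timestamp"]))
            | "Emit" >> beam.Map(print)  # replace with a real sink
        )


if __name__ == "__main__":
    run()
```

    Note that the combine step keys the decision off the bid's own event timestamp, which is the crux of the B-versus-D debate below.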

Discussion
jvg637 (Option: B)

I'd go with B: real-time is requested, and the only scenario for real time (in the 4 presented) is the use of pub/sub with push.

[Removed]

I would go with option B, because option D states "Give the bid for each item to the user in the bid event that is processed first." The requirement is to award the first bid based on event time, not whichever event is processed first in Dataflow.

Tanzu

B.
- For real time, Pub/Sub push is critical; pull creates latency (eliminates D).
- Process by event time, not processing time (also eliminates D).
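For reference, the event-time versus processing-time distinction this thread keeps returning to is explicit in Beam: if ReadFromPubSub is given a timestamp_attribute, each element carries the bid's event time. A small hypothetical sketch of inspecting that timestamp inside a DoFn:

```python
import apache_beam as beam


class WhichTime(beam.DoFn):
    def process(self, bid, ts=beam.DoFn.TimestampParam):
        # With timestamp_attribute set on ReadFromPubSub, `ts` is the moment
        # the bid was placed (event time). Without it, Beam falls back to the
        # Pub/Sub publish time, and arrival order may differ from bid order.
        yield {"bid": bid, "event_time": ts.to_utc_datetime()}
```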

godot

Push is not available when Dataflow is the consumer; Dataflow uses streaming pull: https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub#streaming-pull-migration

jin0

Dataflow is designed for real-time processing, and this case needs Dataflow because there is no way to order the data without it. So I think D is the answer.

AzureDP900

Agree with B

donbigi

This approach (option B) is not ideal because it requires a custom endpoint to write the bid event information into Cloud SQL. That adds complexity and potential points of failure, and it adds latency, since the data must be written to both Pub/Sub and Cloud SQL. It is also harder to ensure that bid events are processed in the order they were received when the data is written to multiple places. Finally, using a single database to store bid events could limit scalability and availability and can result in slow query performance.

Ganshank (Option: D)

D. The need is to collate the messages in real time. We need to de-dupe the messages based on the timestamp of when the event occurred. This can be done by publishing to Pub/Sub and consuming via Dataflow.
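Two duplication concerns get mixed together in the thread: Pub/Sub redelivering the same message, and distinct users placing identical bids. For the first, Beam supports attribute-based de-duplication; a hedged sketch, where the 'bid_id' attribute and the subscription path are hypothetical:

```python
import apache_beam as beam


def read_deduped_bids(p):
    # Redeliveries carrying the same publisher-supplied 'bid_id' attribute
    # are handed to the pipeline only once (honored on the Dataflow runner).
    return p | beam.io.ReadFromPubSub(
        subscription="projects/PROJECT/subscriptions/bids",
        id_label="bid_id",
        timestamp_attribute="event_ts")
```

The second concern, identical bids from different users, is handled by the earliest-timestamp combine shown earlier.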

Tanzu

Yep, that's why B is the right one. It has Pub/Sub push, which is more real time than Pub/Sub pull. You need to be aware that at some point something has to be pulled, and that adds latency.

unnamed12355

D isn't correct. Pub/Sub can deliver messages out of order; there is no guarantee that the event with the lowest timestamp will be processed first. B is correct.
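Out-of-order delivery is real, but it is exactly what event-time windowing is for: within a window, the earliest event timestamp wins regardless of arrival order. A runnable toy sketch with simulated bids; the window size and lateness settings are arbitrary assumptions:

```python
import apache_beam as beam
from apache_beam.transforms import trigger, window

with beam.Pipeline() as p:
    _ = (
        p
        # Simulated bids arriving out of order: (item, (user, event_ts)).
        | beam.Create([("vase", ("bob", 10.2)), ("vase", ("ann", 10.1))])
        # Stamp each element with its own event time.
        | beam.Map(lambda kv: window.TimestampedValue(kv, kv[1][1]))
        | beam.WindowInto(
            window.FixedWindows(5),
            trigger=trigger.AfterWatermark(late=trigger.AfterCount(1)),
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            allowed_lateness=60)  # seconds of late data to tolerate
        # Earliest event time wins, whatever the arrival order was.
        | beam.CombinePerKey(lambda users: min(users, key=lambda u: u[1]))
        | beam.Map(print)  # ('vase', ('ann', 10.1))
    )
```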

I__SHA1234567 (Option: D)

Google Cloud Pub/Sub is a scalable and reliable messaging service that can handle high volumes of data and deliver messages in real-time. By having each application server publish bid events to Cloud Pub/Sub, you ensure that all bid events are collected centrally. Using Cloud Dataflow with a pull subscription allows you to process the bid events in real-time. Cloud Dataflow provides a managed service for stream and batch processing, and it can handle the real-time processing requirements efficiently. By processing the bid events with Cloud Dataflow, you can determine which user bid first by applying the appropriate logic within your Dataflow pipeline. This approach ensures scalability, reliability, and real-time processing capabilities, making it suitable for handling bid events from multiple application servers.
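For completeness, the publish side this comment describes might look like the sketch below, using the google-cloud-pubsub client library. The project, topic name, attribute name, and bid schema are assumptions:

```python
import json
import time

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("PROJECT", "bid-events")


def publish_bid(item, user, amount):
    event_ts = time.time()  # capture event time at the application server
    data = json.dumps({"item": item, "user": user, "amount": amount,
                       "timestamp": event_ts}).encode("utf-8")
    # Attach event time as a message attribute (milliseconds since epoch)
    # so the consuming pipeline can use it as the element's timestamp.
    future = publisher.publish(topic_path, data,
                               event_ts=str(int(event_ts * 1000)))
    future.result()  # block until Pub/Sub acknowledges the publish
```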

Zepopo (Option: B)

The key words are "single location in real time".

rocky48 (Option: D)

Answer: D. We need to de-dupe the messages based on the timestamp of when the event occurred. This can be done by publishing to Pub/Sub and consuming via Dataflow. D sounds like a complete answer; B does not.

Nirca (Option: B)

B ("Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.") is correct.

DeepakVenkatachalam (Option: B)

The correct answer is B. Option D awards the bid based on which event is processed first, not which event occurred first, so option D cannot be the right answer.

manel_bhs (Option: D)

While using Cloud Pub/Sub for real-time event streaming is a good choice, pushing events to a custom endpoint that writes to Cloud SQL introduces additional complexity. Custom endpoints need to be maintained, and the process of writing to Cloud SQL might not be as efficient as using a purpose-built data processing service.

Snnnnneee (Option: B)

In D, the winner is whichever user's bid event happens to be ingested and processed first. That can be wrong for a global auction solution.

yassoraa88 (Option: D)

This is the most suitable solution for the requirements. Google Cloud Pub/Sub can handle high throughput and low-latency data ingestion. Coupled with Google Cloud Dataflow, which can process data streams in real time, this setup allows for immediate processing of bid events. Dataflow can also handle ordering and timestamp extraction, crucial for determining which bid came first. This architecture supports scalability and real-time analytics, which are essential for a global auction system.

teka112233 (Option: D)

The answer should be D, for the following reasons:
- Real-time processing
- Centralized processing
- Winner determination
Also, B is unsuitable: while Pub/Sub can ingest data, Cloud SQL is a relational database not designed for real-time processing at this scale, and maintaining a custom endpoint adds complexity.

philli1011 (Option: B)

B should be the answer, because it writes the bids from the distributed system into Cloud SQL. This way the customer knows immediately whether they got the bid or not. Also, push requests are faster than pull requests, hence better for a real-time experience.

arpana_naa (Option: D)

Pub/Sub captures the entry timestamp plus the event time; Dataflow does the processing, and Dataflow is better for real time.

Nandababy (Option: B)

To accurately determine who bid first in a globally distributed auction application, a push mechanism is generally considered more reliable than a pull mechanism. B should be the correct answer.

Nivea007 (Option: D)

D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. This approach leverages Google Cloud Pub/Sub for real-time data ingestion and Google Cloud Dataflow for real-time data processing, ensuring that bids are processed as they occur, which aligns with the real-time requirement. It's not B because B involves a custom endpoint that writes data into Cloud SQL. That additional step could introduce latency, and you would have to ensure the custom endpoint and the Cloud SQL database can handle the real-time load.

patiwwb

But D treats the bids according to processing time. We need to consider event time; that's why B is the right answer.

imran79 (Option: D)

D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.

np717 (Option: D)

D is the best solution because it is both real-time and scalable. Google Cloud Dataflow can process the bid events in the order in which they occurred and give the bid for each item to the user in the bid event that is processed first.