Exam: Professional Data Engineer
Question 54

Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

    Correct Answer: D

    To determine which user bid first in real time in a globally distributed auction application, combining Google Cloud Pub/Sub with Google Cloud Dataflow is the most effective approach. Each application server writes its bid events to Cloud Pub/Sub as they occur, and a pull subscription feeds those events into Google Cloud Dataflow for real-time processing. Within the Dataflow pipeline you can apply logic that compares event timestamps to determine which user bid first. This setup is scalable, collects bid events in a single location, and processes them efficiently, which is crucial for a global auction system.
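    As a concrete illustration of the pipeline described above, here is a minimal sketch using the Apache Beam Python SDK. The subscription path, the 'event_ts' attribute, and the JSON bid schema are assumptions for illustration, not details given in the question:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window


def run():
    opts = PipelineOptions(streaming=True)
    with beam.Pipeline(options=opts) as p:
        _ = (
            p
            # Pull bid events from a Pub/Sub pull subscription. The
            # hypothetical 'event_ts' attribute carries the moment the bid
            # was placed, so Beam assigns event-time timestamps rather than
            # arrival (processing) time.
            | "ReadBids" >> beam.io.ReadFromPubSub(
                subscription="projects/PROJECT/subscriptions/bids",
                timestamp_attribute="event_ts")
            # Message bodies are assumed to be JSON:
            # {"item": ..., "amount": ..., "user": ..., "timestamp": ...}
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # Key by item so competing bids for the same item meet in one place.
            | "KeyByItem" >> beam.Map(lambda bid: (bid["item"], bid))
            # Short fixed windows bound how long we wait for stragglers.
            | "Window" >> beam.WindowInto(window.FixedWindows(5))
            # The earliest event timestamp wins, not the first-processed event.
            | "FirstBid" >> beam.CombinePerKey(
                lambda bids: min(bids, key=lambda b: b["timestamp"]))
            | "Emit" >> beam.Map(print)  # replace with a real sink
        )


if __name__ == "__main__":
    run()
```

    Note that the combine step keys the decision off the bid's own event timestamp, which is the crux of the B-versus-D debate below.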

Discussion
jvg637 (Option: B)

I'd go with B: real-time is requested, and the only scenario for real time (in the 4 presented) is the use of pub/sub with push.

[Removed]

I would go with option B, because option D states "Give the bid for each item to the user in the bid event that is processed first." The requirement is to award the first bid based on event time, not whichever event is processed first in Dataflow.

Tanzu

B.
- For real time, Pub/Sub push is critical; pull creates latency (eliminates D).
- Process by event time, not processing time (also eliminates D).
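For reference, the event-time versus processing-time distinction this thread keeps returning to is explicit in Beam: if ReadFromPubSub is given a timestamp_attribute, each element carries the bid's event time. A small hypothetical sketch of inspecting that timestamp inside a DoFn:

```python
import apache_beam as beam


class WhichTime(beam.DoFn):
    def process(self, bid, ts=beam.DoFn.TimestampParam):
        # With timestamp_attribute set on ReadFromPubSub, `ts` is the moment
        # the bid was placed (event time). Without it, Beam falls back to the
        # Pub/Sub publish time, and arrival order may differ from bid order.
        yield {"bid": bid, "event_time": ts.to_utc_datetime()}
```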

godot

Push is not available when Dataflow is the consumer; Dataflow uses streaming pull: https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub#streaming-pull-migration

jin0

Dataflow is designed for real-time processing, and this case needs Dataflow because there is no way to order the data without it. So I think D is the answer.

AzureDP900

Agree with B

donbigi

This approach (option B) is not ideal because it requires a custom endpoint to write the bid event information into Cloud SQL. That adds complexity and potential points of failure, and it adds latency, since the data must be written to both Pub/Sub and Cloud SQL. It is also harder to ensure that bid events are processed in the order they were received when the data is written to multiple places. Finally, using a single database to store bid events could limit scalability and availability and can result in slow query performance.

Ganshank (Option: D)

D. The need is to collate the messages in real time. We need to de-dupe the messages based on the timestamp of when the event occurred. This can be done by publishing to Pub/Sub and consuming via Dataflow.
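Two duplication concerns get mixed together in the thread: Pub/Sub redelivering the same message, and distinct users placing identical bids. For the first, Beam supports attribute-based de-duplication; a hedged sketch, where the 'bid_id' attribute and the subscription path are hypothetical:

```python
import apache_beam as beam


def read_deduped_bids(p):
    # Redeliveries carrying the same publisher-supplied 'bid_id' attribute
    # are handed to the pipeline only once (honored on the Dataflow runner).
    return p | beam.io.ReadFromPubSub(
        subscription="projects/PROJECT/subscriptions/bids",
        id_label="bid_id",
        timestamp_attribute="event_ts")
```

The second concern, identical bids from different users, is handled by the earliest-timestamp combine shown earlier.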

Tanzu

Yep, that's why B is the right one. It has Pub/Sub push, which is more real time than Pub/Sub pull. You need to be aware that at some point something has to be pulled, and that adds latency.

unnamed12355

D isn't correct. Pub/Sub can deliver messages out of order; there is no guarantee that the event with the lowest timestamp will be processed first. B is correct.
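Out-of-order delivery is real, but it is exactly what event-time windowing is for: within a window, the earliest event timestamp wins regardless of arrival order. A runnable toy sketch with simulated bids; the window size and lateness settings are arbitrary assumptions:

```python
import apache_beam as beam
from apache_beam.transforms import trigger, window

with beam.Pipeline() as p:
    _ = (
        p
        # Simulated bids arriving out of order: (item, (user, event_ts)).
        | beam.Create([("vase", ("bob", 10.2)), ("vase", ("ann", 10.1))])
        # Stamp each element with its own event time.
        | beam.Map(lambda kv: window.TimestampedValue(kv, kv[1][1]))
        | beam.WindowInto(
            window.FixedWindows(5),
            trigger=trigger.AfterWatermark(late=trigger.AfterCount(1)),
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            allowed_lateness=60)  # seconds of late data to tolerate
        # Earliest event time wins, whatever the arrival order was.
        | beam.CombinePerKey(lambda users: min(users, key=lambda u: u[1]))
        | beam.Map(print)  # ('vase', ('ann', 10.1))
    )
```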

I__SHA1234567 (Option: D)

Google Cloud Pub/Sub is a scalable and reliable messaging service that can handle high volumes of data and deliver messages in real-time. By having each application server publish bid events to Cloud Pub/Sub, you ensure that all bid events are collected centrally. Using Cloud Dataflow with a pull subscription allows you to process the bid events in real-time. Cloud Dataflow provides a managed service for stream and batch processing, and it can handle the real-time processing requirements efficiently. By processing the bid events with Cloud Dataflow, you can determine which user bid first by applying the appropriate logic within your Dataflow pipeline. This approach ensures scalability, reliability, and real-time processing capabilities, making it suitable for handling bid events from multiple application servers.
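For completeness, the publish side this comment describes might look like the sketch below, using the google-cloud-pubsub client library. The project, topic name, attribute name, and bid schema are assumptions:

```python
import json
import time

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("PROJECT", "bid-events")


def publish_bid(item, user, amount):
    event_ts = time.time()  # capture event time at the application server
    data = json.dumps({"item": item, "user": user, "amount": amount,
                       "timestamp": event_ts}).encode("utf-8")
    # Attach event time as a message attribute (milliseconds since epoch)
    # so the consuming pipeline can use it as the element's timestamp.
    future = publisher.publish(topic_path, data,
                               event_ts=str(int(event_ts * 1000)))
    future.result()  # block until Pub/Sub acknowledges the publish
```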

Zepopo (Option: B)

The key words are "single location in real time".

rocky48 (Option: D)

Answer: D. We need to de-dupe the messages based on the timestamp of when the event occurred. This can be done by publishing to Pub/Sub and consuming via Dataflow. D sounds like a complete answer; B does not.

Nirca (Option: B)

B ("Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.") is correct.

DeepakVenkatachalam (Option: B)

The correct answer is B. Option D awards the bid based on which event is processed first, not which event occurred first, so option D cannot be the right answer.

manel_bhs (Option: D)

While using Cloud Pub/Sub for real-time event streaming is a good choice, pushing events to a custom endpoint that writes to Cloud SQL introduces additional complexity. Custom endpoints need to be maintained, and the process of writing to Cloud SQL might not be as efficient as using a purpose-built data processing service.

Snnnnneee (Option: B)

In D, the winner is whichever user's bid event happens to be ingested and processed first. That can be wrong for a global auction solution.

yassoraa88 (Option: D)

This is the most suitable solution for the requirements. Google Cloud Pub/Sub can handle high throughput and low-latency data ingestion. Coupled with Google Cloud Dataflow, which can process data streams in real time, this setup allows for immediate processing of bid events. Dataflow can also handle ordering and timestamp extraction, crucial for determining which bid came first. This architecture supports scalability and real-time analytics, which are essential for a global auction system.

teka112233 (Option: D)

The answer should be D, for the following reasons:
- Real-time processing
- Centralized processing
- Winner determination
Also, B is unsuitable: while Pub/Sub can ingest data, Cloud SQL is a relational database not designed for real-time processing at this scale, and maintaining a custom endpoint adds complexity.

philli1011 (Option: B)

B should be the answer, because it writes the bids from the distributed system into Cloud SQL. This way the customer knows immediately whether they got the bid or not. Also, push requests are faster than pull requests, hence better for a real-time experience.

arpana_naa (Option: D)

Pub/Sub captures the entry timestamp plus the event time; Dataflow does the processing, and Dataflow is better for real time.

Nandababy (Option: B)

To accurately determine who bid first in a globally distributed auction application, a push mechanism is generally considered more reliable than a pull mechanism. B should be the correct answer.

Nivea007 (Option: D)

D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. This approach leverages Google Cloud Pub/Sub for real-time data ingestion and Google Cloud Dataflow for real-time data processing, ensuring that bids are processed as they occur, which aligns with the real-time requirement. It's not B because B involves a custom endpoint that writes data into Cloud SQL. That additional step could introduce latency, and you would have to ensure the custom endpoint and the Cloud SQL database can handle the real-time load.

patiwwb

But D treats the bids according to processing time. We need to consider event time; that's why B is the right answer.

imran79 (Option: D)

D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.

np717 (Option: D)

D is the best solution because it is both real-time and scalable. Google Cloud Dataflow can process the bid events in the order in which they occurred and give the bid for each item to the user in the bid event that is processed first.