Exam DEA-C01 All Questions
Question 118

A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company’s application uses the PutRecord action to send data to Kinesis Data Streams.

A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.

Which solution will meet this requirement?

    Correct Answer: A

To achieve exactly-once delivery across the entire processing pipeline, the application should be designed to handle duplicates by embedding a unique ID in each record at the source. This ensures that even if a record is ingested multiple times due to network outages or retries, duplicate records can be identified and discarded during processing. Because Kinesis Data Streams itself provides at-least-once delivery, this source-side de-duplication is the standard way to achieve exactly-once processing semantics in a distributed system.
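As a minimal sketch of the producer side, the unique ID can be generated and embedded before the record ever reaches PutRecord, so any retry carries the same de-duplication key. The `build_record` helper and the `transactions` stream name below are hypothetical illustrations, not part of the question:

```python
import uuid


def build_record(payload: dict) -> dict:
    """Wrap a transaction payload with a client-generated unique ID.

    The "record_id" field is what downstream consumers use to detect
    duplicates if a PutRecord retry ingests the same data twice.
    """
    return {
        "record_id": str(uuid.uuid4()),  # embedded at the source, before any retry
        "payload": payload,
    }


# The wrapped record would then be sent with PutRecord, e.g. (not executed here):
#   kinesis = boto3.client("kinesis")
#   record = build_record({"account": "12345", "amount": 250.00})
#   kinesis.put_record(
#       StreamName="transactions",  # hypothetical stream name
#       Data=json.dumps(record).encode("utf-8"),
#       PartitionKey=record["record_id"],
#   )
```

The key point is that the ID is created once per logical record at the source; a network-level retry resends the same ID rather than minting a new one.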

Discussion
bakarys
Option: A

A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source. This approach ensures that even if a record is sent more than once due to network outages or other issues, it will only be processed once, because the unique ID can be used to identify and remove duplicates. This is a common pattern for achieving exactly-once processing semantics in distributed systems.

The other options do not guarantee exactly-once delivery across the entire pipeline. Option B is only partially correct: it avoids duplicate processing within Amazon Managed Service for Apache Flink, but not across the entire pipeline. Option C is not always feasible, because network issues and other factors can cause events to be ingested into Kinesis Data Streams multiple times. Option D involves changing the entire technology stack, which is unnecessary to achieve the desired outcome and could introduce additional complexity and cost.
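The consumer-side de-duplication this answer describes can be sketched as a simple filter keyed on the embedded ID. The `record_id` field and the in-memory `seen_ids` set are illustrative assumptions; a real pipeline would persist processed IDs in a durable store such as DynamoDB so that de-duplication survives consumer restarts:

```python
def deduplicate(records, seen_ids):
    """Yield only records whose embedded unique ID has not been seen before.

    records:  iterable of dicts, each carrying a source-generated "record_id"
    seen_ids: mutable set of IDs already processed (stand-in for a durable store)
    """
    for record in records:
        rid = record["record_id"]
        if rid in seen_ids:
            continue  # duplicate delivery caused by a retry: drop it
        seen_ids.add(rid)
        yield record
```

With this in place, a batch containing the same record twice (e.g. because a PutRecord call was retried after a network outage) yields that record only once downstream.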

Ja13
Option: A

A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.

Explanation:

Exactly-Once Delivery: Ensuring exactly-once delivery is a challenge in distributed systems, especially in the presence of network outages and retries. By embedding a unique ID in each record at the source, you can track and identify duplicate records during processing. This approach allows you to implement idempotent processing, where duplicate records can be detected and discarded, ensuring that each record is processed exactly once.

De-duplication Logic: Implementing de-duplication logic based on unique IDs ensures that even if the same record is ingested multiple times due to retries or network issues, it will be processed only once by the downstream applications.
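The idempotent processing described here can be sketched end to end: at-least-once delivery may hand the consumer the same record twice, but the business side effect happens only once. `IdempotentProcessor`, its in-memory `processed` map, and the doubling "business logic" are hypothetical placeholders for a durable ID store and real transaction handling:

```python
class IdempotentProcessor:
    """Process each logical record exactly once despite duplicate deliveries."""

    def __init__(self):
        # record_id -> result; stands in for a durable store (e.g. DynamoDB)
        self.processed = {}

    def process(self, record):
        rid = record["record_id"]
        if rid in self.processed:
            # Duplicate delivery: return the stored result, skip the side effect.
            return self.processed[rid]
        result = record["amount"] * 2  # placeholder for real business logic
        self.processed[rid] = result
        return result
```

Delivering the same record twice, as an at-least-once transport may do after a retry, produces the same result both times but records only one processed entry, which is the exactly-once effect the question is after.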