DEA-C01 Exam - Question 117

Question

A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.

Which AWS Glue feature should the data engineer use to meet this requirement?

Examice · Accepted Answer

Job bookmarks in AWS Glue are designed to enable incremental data processing by tracking the state of data that has been processed. They allow the ETL job to resume from where it left off, ensuring that only new or modified data since the last successful run is processed. This feature is essential for building an efficient and reliable automated ETL ingestion pipeline that handles incremental data updates.
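To make the idea concrete, here is a minimal, simplified sketch of the bookkeeping a bookmark performs for an S3 source: remember the newest last-modified time seen so far and, on the next run, pick up only objects modified after it. This is an illustration only, not AWS Glue's actual implementation. The names `S3Object` and `select_unprocessed` are hypothetical, and the real feature persists more state than a single timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for an S3 listing entry."""
    key: str
    last_modified: datetime


def select_unprocessed(
    objects: List[S3Object], bookmark: Optional[datetime]
) -> Tuple[List[S3Object], Optional[datetime]]:
    """Return the objects modified after the bookmark, plus the advanced bookmark.

    A bookmark of None means no previous successful run, so everything is new.
    """
    new = [o for o in objects if bookmark is None or o.last_modified > bookmark]
    # Advance the bookmark to the newest object just selected; keep the old
    # value if nothing new was found.
    new_bookmark = max((o.last_modified for o in new), default=bookmark)
    return new, new_bookmark


if __name__ == "__main__":
    listing = [
        S3Object("in/a.csv.gz", datetime(2024, 1, 1, tzinfo=timezone.utc)),
        S3Object("in/b.csv.gz", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ]
    first_batch, bookmark = select_unprocessed(listing, None)
    # A later run with the saved bookmark only sees objects added since then.
    listing.append(S3Object("in/c.csv.gz", datetime(2024, 1, 3, tzinfo=timezone.utc)))
    second_batch, bookmark = select_unprocessed(listing, bookmark)
    print([o.key for o in first_batch], [o.key for o in second_batch])
```

In a real Glue job none of this is written by hand: the service stores and advances the bookmark state itself once the feature is enabled for the job.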

Bmaster · Answer

C is the correct answer.

https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

andrologin · Answer

C. AWS Glue job bookmarks are used to implement incremental processing.

HunkyBunky · Answer

C is right.

Ja13 · Answer

C. Job bookmarks

Here's why job bookmarks are the appropriate feature:

Incremental Processing: Job bookmarks in AWS Glue track which data in Amazon S3 has already been processed. Each subsequent run picks up where the last successful run left off, so only data that is new or modified since that run is processed (incremental processing).
Automated ETL: Job bookmarks work seamlessly within AWS Glue ETL jobs, allowing the job to efficiently manage the state of processed data without the need for manual intervention.
Support for Compressed Files: AWS Glue natively supports reading compressed files from Amazon S3, so the ingestion pipeline can handle compressed data formats efficiently.
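As a rough sketch of how the feature is switched on: bookmarks are controlled by the `--job-bookmark-option` job argument, which accepts `job-bookmark-enable`, `job-bookmark-disable`, or `job-bookmark-pause`. The helper name `bookmark_arguments` below is hypothetical; it only builds the argument dictionary that could be supplied as the job's default arguments.

```python
from typing import Dict

# Values accepted by the --job-bookmark-option job argument.
VALID_BOOKMARK_OPTIONS = {
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
}


def bookmark_arguments(option: str = "job-bookmark-enable") -> Dict[str, str]:
    """Build the job-argument dict that controls bookmark behaviour.

    Hypothetical helper for illustration; the returned dict is the kind of
    value passed as DefaultArguments when a job is created or updated.
    """
    if option not in VALID_BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark option: {option!r}")
    return {"--job-bookmark-option": option}


if __name__ == "__main__":
    print(bookmark_arguments())
```

Inside the job script itself, bookmarks also rely on the job being initialised and committed (`job.init(...)` and `job.commit()`) and on each source read carrying a `transformation_ctx`, which is the key the bookmark state is stored under.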
