
DEA-C01 Exam - Question 108


A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job.

The data engineer has set the maximum concurrency for the AWS Glue job to 1.

The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.

What is the likely reason the AWS Glue job is reprocessing the files?

Correct Answer: D

The likely reason is that the AWS Glue job script is missing the required commit statement (job.commit()). Job bookmark state is only updated when the job commits, so without that call AWS Glue has no record of which Amazon S3 files were processed successfully in the last run, and subsequent runs read the same files again. The maximum concurrency is already set to 1, which rules out concurrent runs as the cause.
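The effect of a missing commit can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not the AWS Glue API, and the class `ToyBookmarkJob`, the helper `run`, and the `s3://bucket/...` paths are all hypothetical names made up for the illustration. It shows the general idea that bookmark state persists only when commit is called.

```python
# Toy model of job-bookmark behaviour. This is NOT the AWS Glue API;
# every class, function, and path below is hypothetical.

class ToyBookmarkJob:
    """Tracks processed files, persisting state only on commit()."""

    def __init__(self, store):
        self.store = store    # persisted bookmark state (survives across runs)
        self.pending = set()  # files read during the current run only

    def read_new_files(self, available):
        # Skip anything already recorded in the persisted bookmark state.
        new = sorted(f for f in available if f not in self.store)
        self.pending.update(new)
        return new

    def commit(self):
        # Plays the role that the commit statement plays at the end of a job:
        # only here does the persisted state learn about this run's files.
        self.store.update(self.pending)
        self.pending.clear()


def run(store, available, call_commit):
    """Simulate one job run against a shared bookmark store."""
    job = ToyBookmarkJob(store)
    processed = job.read_new_files(available)
    if call_commit:
        job.commit()
    return processed


files = ["s3://bucket/a.csv", "s3://bucket/b.csv"]

# Without commit: the second run picks up the same files again.
no_commit_store = set()
first_without = run(no_commit_store, files, call_commit=False)
second_without = run(no_commit_store, files, call_commit=False)

# With commit: the second run finds nothing new to process.
commit_store = set()
first_with = run(commit_store, files, call_commit=True)
second_with = run(commit_store, files, call_commit=True)

print(second_without)
print(second_with)
```

In an actual AWS Glue PySpark script, the corresponding pieces are a `job.init(...)` call near the start and a `job.commit()` call at the end of the script. The troubleshooting page linked in the comments below lists further causes of reprocessing.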

Discussion

5 comments
Bmaster
Jun 29, 2024

D is good https://docs.aws.amazon.com/glue/latest/dg/glue-troubleshooting-errors.html#error-job-bookmarks-reprocess-data

lool (Option: D)
Jul 6, 2024

https://docs.aws.amazon.com/glue/latest/dg/glue-troubleshooting-errors.html#error-job-bookmarks-reprocess-data

Alagong (Option: A)
Jun 30, 2024

The commit statement (Option D) is not required for AWS Glue jobs. AWS Glue commits any open transactions to the database when all the script statements finish running.

HunkyBunky
Jul 3, 2024

I've not found any information that s3:GetObjectACL is necessary for Glue bookmarks, so I'm pretty sure that A is wrong

andrologin
Jul 18, 2024

It is the commit statement that ensures AWS saves the last successful processing

HunkyBunky (Option: D)
Jul 2, 2024

For me - D looks correct

andrologin (Option: D)
Jul 18, 2024

An AWS Glue job requires the commit statement to save the state of the last successful run.