Certified Data Engineer Professional Exam QuestionsBrowse all questions from this exam

Certified Data Engineer Professional Exam - Question 125


The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.

The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.

The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.

Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

Show Answer
Correct Answer: C

The default data retention threshold for Delta Lake is 7 days. This means that even after data is deleted, the files containing those records are retained for 7 days. Consequently, the deleted records, although no longer part of the active dataset, can still be accessed through time travel until the VACUUM command is executed to permanently remove the files. Since the VACUUM job runs every Monday at 3am, deleted data from the previous week will be accessible through time travel for approximately 8 days.

Discussion

2 comments
Sign in to comment
03355a2Option: A
Jun 27, 2024

Since the team is expecting last week's data to be deleted on Sunday at 1am to 2am. The data will be available for approx 24hrs until the vacuum command is run on Monday at 3am.

vexor3Option: C
Jul 21, 2024

C is correct