
Certified Data Engineer Professional Exam - Question 11


The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.

The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.

The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.

Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

Correct Answer: E

Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later. Delta Lake retains a 7-day history to support operations like time travel. As a result, even after deletion, the data remains accessible via time travel until the retention period expires and a VACUUM operation is performed to permanently remove the records.

Discussion

18 comments
asmayassineg (Option: E)
Aug 2, 2023

Answer is E, default retention period is 7 days https://learn.microsoft.com/en-us/azure/databricks/delta/vacuum

Eertyy (Option: E)
Aug 27, 2023

E is the right answer.

mardigras (Option: A)
Feb 27, 2024

The answer has to be A. The deletion is done on Sunday 1am and then the next day Monday 3am, VACUUM was initiated, so one can only time travel for about 24 hours.

aragorn_brego (Option: E)
Nov 21, 2023

Delta Lake's time travel feature allows you to query an older snapshot of a table. By default, Delta Lake retains a 7-day history for the table to support operations like time travel. When data is deleted from a Delta table, the actual data files are not immediately removed from the storage layer; they are just marked for deletion. The VACUUM command is used to clean up these files that are no longer in the state of the table, but it will not remove any files that fall within the retention period unless it is run with an override option to reduce the retention period. Thus, if the deletions are processed on Sunday and the VACUUM command is run on Monday without overriding the default retention period, the deleted records would still be accessible via time travel for approximately 8 days (until the next run of the VACUUM command after the data has aged past the 7-day retention period).
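The timing argument above can be checked with a little date arithmetic. This is only a sketch under stated assumptions: the calendar dates are hypothetical (any Sunday and Monday would do), the helper `first_vacuum_that_removes` is an illustrative name rather than a Delta Lake API, and the 7-day threshold is the default retention described in this thread.

```python
from datetime import datetime, timedelta

# Default retention threshold for removed data files, as described in the thread.
RETENTION = timedelta(days=7)


def first_vacuum_that_removes(removed_at, first_vacuum,
                              interval=timedelta(days=7), retention=RETENTION):
    """Return the first scheduled VACUUM run at which a file removed from the
    table at `removed_at` is at least `retention` old (hypothetical helper)."""
    run = first_vacuum
    while run < removed_at or run - removed_at < retention:
        run += interval
    return run


# Hypothetical dates: 2024-01-07 is a Sunday, 2024-01-08 is a Monday.
delete_done = datetime(2024, 1, 7, 2, 0)    # delete job finishes by 2am Sunday
first_vacuum = datetime(2024, 1, 8, 3, 0)   # weekly VACUUM, Mondays at 3am

purge_run = first_vacuum_that_removes(delete_done, first_vacuum)
print(purge_run)                 # the Monday one week later
print(purge_run - delete_done)   # how long the files stayed reachable
```

Under these assumptions the Monday run that follows the delete finds the files only about a day old and leaves them alone; the run one week later is the first that can remove them, which is where the "8 days later" in option E comes from.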

BIKRAM063 (Option: E)
Nov 2, 2023

Answer is E

juliom6 (Option: A)
Apr 8, 2024

Although the data is deleted (DELETE) on Sunday, it can still be recovered through time travel. That possibility is only eliminated the following day (Monday), when VACUUM runs, so the data can be recovered during that window of approximately 24 hours.

hamzaKhribi (Option: E)
Dec 2, 2023

Correct answer is E. In this question the tables use default settings, and given that the default Delta retention period is 7 days, the data will still be accessible for the last 7 days.

RafaelCFC (Option: E)
Jan 9, 2024

Correct according to the documentation: https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html

kz_data (Option: E)
Jan 10, 2024

Answer is E as the default retention period is 7 days

kz_data (Option: E)
Jan 10, 2024

Answer is E

spaceexplorer (Option: E)
Jan 24, 2024

Answer is E

RiktRikt007 (Option: E)
Feb 10, 2024

If I run v0: create table, v1: insert 2 records, v2: insert 2 records, v3: delete 2 records, and then run the VACUUM command (with the default 7-day retention), the deleted records will still be there, and you can access them using SELECT * FROM delta_table VERSION AS OF 2;
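The v0 to v3 scenario above can be illustrated with a toy model. This is not how Delta Lake is implemented: the log structure, the file names `f1` and `f2`, and the functions `snapshot`, `vacuum`, and `readable` are all hypothetical, and the model assumes that a file becomes eligible for cleanup once it is out of the current version and was removed at least 7 days ago.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)      # default retention assumed in the thread
t0 = datetime(2024, 1, 7, 1, 0)    # hypothetical time of the delete commit

# Toy commit log: (commit time, files added, files logically removed).
log = [
    (t0 - timedelta(days=3), set(), set()),    # v0: create table
    (t0 - timedelta(days=2), {"f1"}, set()),   # v1: insert 2 records
    (t0 - timedelta(days=1), {"f2"}, set()),   # v2: insert 2 records
    (t0, set(), {"f2"}),                       # v3: delete the 2 records in f2
]


def snapshot(version):
    """Files that make up the table at the given version."""
    live = set()
    for _, added, removed in log[: version + 1]:
        live |= added
        live -= removed
    return live


def vacuum(storage, now, retention=RETENTION):
    """Keep files in the current version, plus removed files younger than retention."""
    current = snapshot(len(log) - 1)
    removed_at = {f: ts for ts, _, removed in log for f in removed}
    return {f for f in storage if f in current or now - removed_at[f] < retention}


def readable(version, storage):
    """Time travel to a version works only if all of its files still exist."""
    return snapshot(version) <= storage


storage = {"f1", "f2"}
after_one_day = vacuum(storage, t0 + timedelta(days=1))
after_eight_days = vacuum(after_one_day, t0 + timedelta(days=8))

print(readable(2, after_one_day))     # True: f2 is still on storage
print(readable(2, after_eight_days))  # False: f2 has been physically removed
```

In this model, a VACUUM run one day after the delete leaves version 2 queryable, while a run eight days after the delete removes `f2` and version 2 can no longer be read.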

hedbergare (Option: E)
Apr 9, 2024

Answer is E

Tayari (Option: E)
Apr 30, 2024

The default retention threshold for data files after running VACUUM is 7 days.

coercion (Option: E)
May 19, 2024

The default retention period is 7 days, so data deleted on Sunday will remain available for the next 7 days (even if VACUUM was run on Monday, since it will delete 7-day-old data and not the data from the previous day, Sunday).

imatheushenrique (Option: E)
Jun 5, 2024

E. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later.

03355a2 (Option: A)
Jun 27, 2024

They expect the records for the previous week to be deleted on Sunday from 1am to 2am. Then the next day (Monday) at 3am, approximately 24 hours later, the VACUUM command is run. This means the records from the previous week are only around for about 24 hours before they are removed by the VACUUM command. They aren't waiting 8 days to run the command, therefore E is wrong.

Michael Mesfin
Apr 12, 2025

Answer should be A. The delete job processes deletions from the previous week, meaning the data being deleted is already at least 7 days old. By default, Delta Lake retains data files for 7 days (delta.deletedFileRetentionDuration). When the delete job runs on Sunday at 1am, the files associated with the deleted data are already past the 7-day retention threshold. The subsequent VACUUM job on Monday at 3am removes these files, as they are now eligible for cleanup.