
Certified Data Engineer Professional Exam - Question 92


In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both DEEP and SHALLOW CLONE, development tables are created using SHALLOW CLONE.

A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that VACUUM was run the day before.

Which statement describes why the cloned tables are no longer working?

Correct Answer: D

A shallow clone in Delta Lake copies only the metadata of the source table, not its data files, so the clone's transaction log points at the source table's data files. Type 1 SCD tables overwrite records in place, which means each update rewrites data files and leaves the previous files unreferenced by the current version of the source table. VACUUM deletes files that the source table no longer references and that are older than the retention threshold. It takes no account of shallow clones, so the metadata created by the CLONE operation ends up referencing data files that VACUUM has purged, and the cloned tables stop working.
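The failure sequence can be traced with a small toy model. This is only a sketch in plain Python, not Delta Lake's API: `ToyTable`, its methods, and the table paths are hypothetical names made up for illustration, and the retention threshold is left out to keep the model short.

```python
class ToyTable:
    """Toy stand-in for a Delta table (hypothetical, not the Delta Lake API)."""

    def __init__(self, path, storage, live_files=()):
        self.path = path                    # directory that holds this table's own files
        self.storage = storage              # shared set of files that physically exist
        self.live_files = set(live_files)   # files referenced by the current table version

    def write(self, name):
        # Add a new data file under this table's directory and reference it.
        file_path = f"{self.path}/{name}"
        self.storage.add(file_path)
        self.live_files.add(file_path)
        return file_path

    def rewrite(self, old_path, new_name):
        # Type 1 SCD style update: the old file drops out of the current version.
        self.live_files.discard(old_path)
        return self.write(new_name)

    def shallow_clone(self, clone_path):
        # Copy the metadata only; the clone points at the same physical files.
        return ToyTable(clone_path, self.storage, self.live_files)

    def vacuum(self):
        # Delete files in this table's directory that this table no longer
        # references. Clones are never consulted. (Retention check omitted.)
        prefix = self.path + "/"
        stale = {f for f in self.storage
                 if f.startswith(prefix) and f not in self.live_files}
        self.storage.difference_update(stale)
        return stale

    def read(self):
        # Reading fails if any referenced file has been physically deleted.
        missing = sorted(self.live_files - self.storage)
        if missing:
            raise FileNotFoundError(f"missing data files: {missing}")
        return sorted(self.live_files)


storage = set()
source = ToyTable("prod/dim_customer", storage)
part0 = source.write("part-0.parquet")

dev = source.shallow_clone("dev/dim_customer")   # dev references prod's part-0
source.rewrite(part0, "part-1.parquet")          # prod stops referencing part-0
removed = source.vacuum()                        # part-0 is physically deleted

try:
    dev.read()
    clone_ok = True
except FileNotFoundError:
    clone_ok = False                             # the clone still points at part-0
```

In this model the source table keeps working after the vacuum, because it only references the rewritten file, while the clone fails on its first read.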

Discussion

5 comments
alexvno (Option: D)
Dec 19, 2023

Shallow clone: only duplicates the metadata of the table being cloned; the data files of the table itself are not copied. These clones are cheaper to create but are not self-contained and depend on the source from which they were cloned as the source of data. If the files in the source that the clone depends on are removed, for example with VACUUM, a shallow clone may become unusable. Therefore, shallow clones are typically used for short-lived use cases such as testing and experimentation.

AzureDE2522 (Option: D)
Nov 20, 2023

Please refer to https://docs.databricks.com/en/delta/clone.html#what-are-the-semantics-of-delta-clone-operations

60ties (Option: B)
Nov 15, 2023

B is best

spaceexplorer (Option: D)
Jan 25, 2024

D is correct

vctrhugo (Option: D)
Feb 6, 2024

In Delta Lake, the VACUUM command deletes data files that are no longer referenced by a Delta table and are older than the retention threshold. When a table is cloned using SHALLOW CLONE, the clone references the same data files as the original table but creates a new transaction log. If VACUUM is run on the original table, it can delete data files that are still being referenced by the cloned table’s metadata, causing the cloned table to stop working. This is because the VACUUM command doesn’t know about the cloned table’s references to the data files. Therefore, it’s important to be cautious when running VACUUM on tables that have clones.
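The deletion rule described above (unreferenced and older than the retention threshold) can be sketched as a simple filter. This is a simplification in plain Python, not how VACUUM is implemented: `vacuum_candidates`, the file names, and the timestamps are hypothetical, and the only Delta Lake fact assumed is the default retention of 7 days.

```python
from datetime import datetime, timedelta

# Delta Lake's default retention threshold for VACUUM is 7 days.
DEFAULT_RETENTION = timedelta(days=7)


def vacuum_candidates(files, referenced, now, retention=DEFAULT_RETENTION):
    """Return the files a vacuum would delete under the simplified rule.

    files: dict mapping file name -> last-modified datetime (hypothetical input)
    referenced: set of file names in the source table's current version
    """
    cutoff = now - retention
    return sorted(
        name for name, modified in files.items()
        if name not in referenced and modified < cutoff
    )


now = datetime(2024, 2, 6)
files = {
    "part-0.parquet": datetime(2024, 1, 1),   # replaced weeks ago by an SCD update
    "part-1.parquet": datetime(2024, 2, 5),   # replaced yesterday, still in retention
    "part-2.parquet": datetime(2024, 1, 1),   # old, but still referenced
}
referenced = {"part-2.parquet"}

eligible = vacuum_candidates(files, referenced, now)
# eligible == ["part-0.parquet"]
```

Note that `referenced` only ever contains the source table's own files. A shallow clone created before `part-0.parquet` was replaced would still need that file, which is why the clones in the question broke a few weeks after creation rather than immediately.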