Exam Certified Data Engineer Professional
Question 97

A data architect has heard about Delta Lake’s built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full record of all valid street addresses as they appear in the customers table.

The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.

Which piece of information is critical to this decision?
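To make the Type 1 vs Type 2 distinction concrete, here is a minimal pure-Python sketch that models both approaches on in-memory rows rather than on a real Delta table. The row layout (`customer_id`, `address`, `valid_from`, `valid_to`, `is_current`) and the function names `apply_type1` / `apply_type2` are hypothetical and chosen only for illustration. On Databricks the same logic would typically be written as a `MERGE INTO` statement against the customers table.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AddressRow:
    # Hypothetical row layout for a Type 2 customers address table.
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"
    is_current: bool = True


def apply_type1(table: dict[int, str], customer_id: int, new_address: str) -> None:
    """Type 1: overwrite in place; the old value disappears from the current table."""
    table[customer_id] = new_address


def apply_type2(rows: list[AddressRow], customer_id: int,
                new_address: str, change_date: date) -> list[AddressRow]:
    """Type 2: expire the current row and append a new one, keeping history as data."""
    out: list[AddressRow] = []
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            if row.address == new_address:
                return rows  # unchanged address, nothing to record
            # Close out the old version instead of deleting it.
            out.append(replace(row, valid_to=change_date, is_current=False))
        else:
            out.append(row)
    out.append(AddressRow(customer_id, new_address, valid_from=change_date))
    return out


type1 = {1: "12 Oak St"}
apply_type1(type1, 1, "99 Elm Ave")
print(type1)  # {1: '99 Elm Ave'}

type2 = [AddressRow(1, "12 Oak St", valid_from=date(2022, 1, 1))]
type2 = apply_type2(type2, 1, "99 Elm Ave", date(2023, 6, 1))
for r in type2:
    print(r.address, r.valid_from, r.valid_to, r.is_current)
# 12 Oak St 2022-01-01 2023-06-01 False
# 99 Elm Ave 2023-06-01 None True
```

With Type 1, the previous address survives only in older table versions; with Type 2, it remains a queryable row in the current table.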

    Correct Answer: D

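    Delta Lake time travel does not scale well in cost or latency as a long-term versioning solution. Every write creates a new table version, and the data files behind older versions must be kept for time travel to reach them, so storage consumption grows steadily over time. Querying an older version can also require scanning many files, which increases query latency. In addition, time travel only works for versions whose data files have not been removed by VACUUM (7-day retention threshold by default) and whose transaction log entries are still retained (30 days by default), so it is not designed as an archival mechanism. A Type 2 table, which records each change as a separate row with its own validity period instead of relying on table versions, keeps the audit history as ordinary data and is therefore the more suitable and scalable choice for long-term versioning.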
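In practice, this means an audit question such as "which address was valid for this customer on a given date?" becomes an ordinary filter over Type 2 rows rather than a time-travel query. Below is a minimal pure-Python sketch of that lookup. It assumes a hypothetical row layout with a half-open validity interval `[valid_from, valid_to)`, where `valid_to=None` marks the current version; `address_as_of` is an illustrative name, not a library function.

```python
from datetime import date
from typing import NamedTuple, Optional


class AddressRow(NamedTuple):
    # Hypothetical Type 2 row: [valid_from, valid_to) is half-open,
    # and valid_to=None marks the current version.
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date]


def address_as_of(rows: list[AddressRow], customer_id: int, as_of: date) -> Optional[str]:
    """Return the address valid for customer_id on as_of, or None if there is none."""
    for row in rows:
        if (row.customer_id == customer_id
                and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row.address
    return None


history = [
    AddressRow(1, "12 Oak St", date(2022, 1, 1), date(2023, 6, 1)),
    AddressRow(1, "99 Elm Ave", date(2023, 6, 1), None),
]

print(address_as_of(history, 1, date(2022, 12, 31)))  # 12 Oak St
print(address_as_of(history, 1, date(2023, 6, 1)))    # 99 Elm Ave
print(address_as_of(history, 1, date(2021, 1, 1)))    # None
```

In Spark SQL the equivalent would be a `WHERE` clause on the same validity columns. By contrast, a time-travel query (`VERSION AS OF` / `TIMESTAMP AS OF`) only succeeds while the files for the referenced version still exist.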

Discussion
60ties (Option: D)

D makes more sense

vctrhugo (Option: D)

Delta Lake’s time travel feature allows you to access previous versions of the data, which can be useful for auditing purposes. However, if you’re planning to use time travel as a long-term versioning solution, it’s important to know that it may not scale well in terms of cost or latency. This is because every time you perform a write operation, a new version of the data is created, which can consume significant storage over time. Additionally, querying older versions of the data may require scanning through many files, which can increase query latency.