Certified Data Engineer Professional Exam - Question 15


A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.

The churn prediction model used by the ML team is fairly stable in production. The team is only interested in making predictions on records that have changed in the past 24 hours.

Which approach would simplify the identification of these changed records?

Correct Answer: E

Replacing the nightly overwrite logic with a merge statement simplifies the identification of changed records in customer_churn_params. An overwrite rewrites every row, so every record looks new each night; a merge modifies only the records whose values have actually changed. With the change data feed enabled on the table, those modified records can be read directly, and the machine learning team can make predictions on them alone instead of rescoring the entire table.
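As a rough illustration of this approach, the sketch below builds the three Delta Lake SQL statements involved: enabling the change data feed, merging only changed rows, and reading the changes back. Only customer_churn_params comes from the question. The source view `customer_churn_updates`, the key `customer_id`, the compared columns, and the start timestamp are hypothetical names chosen for the example, and deletions from upstream are not handled.

```python
# Sketch only: every name except customer_churn_params is an assumption.
TARGET = "customer_churn_params"
SOURCE = "customer_churn_updates"  # hypothetical view with tonight's derived values
KEY = "customer_id"                # hypothetical primary key column
COMPARED_COLUMNS = ["tenure_months", "monthly_spend"]  # hypothetical attributes


def enable_cdf_sql(table: str) -> str:
    """One-time statement that turns on the change data feed for a Delta table."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"


def merge_sql(target: str, source: str, key: str, columns: list) -> str:
    """Upsert that touches a matched row only when a compared column differs."""
    # <=> is Spark SQL's null-safe equality, so NULL vs NULL counts as unchanged.
    changed = " OR ".join(f"NOT (t.{c} <=> s.{c})" for c in columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        f"WHEN MATCHED AND ({changed}) THEN UPDATE SET *\n"
        f"WHEN NOT MATCHED THEN INSERT *"
    )


def changed_rows_sql(table: str, start_timestamp: str) -> str:
    """Read inserted rows and the post-update image of updated rows."""
    return (
        f"SELECT * FROM table_changes('{table}', '{start_timestamp}')\n"
        f"WHERE _change_type IN ('insert', 'update_postimage')"
    )


if __name__ == "__main__":
    print(enable_cdf_sql(TARGET))
    print(merge_sql(TARGET, SOURCE, KEY, COMPARED_COLUMNS))
    print(changed_rows_sql(TARGET, "2024-01-01 00:00:00"))
```

In a Spark session with Delta Lake available, each string could be passed to `spark.sql(...)`. The extra condition on WHEN MATCHED matters: without it, every matched row would be rewritten and reported as an update, which would defeat the purpose of the change data feed here.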

Discussion

5 comments
Eertyy (Option: E)
Aug 27, 2023

E is the right answer.

sturcu (Option: E)
Oct 16, 2023

E is Correct

leopedroso1 (Option: E)
Feb 18, 2024

E is the correct one. Replacing the overwrite with a merge results in an UPSERT that updates only the data that needs it (WHEN MATCHED THEN UPDATE plus WHEN NOT MATCHED THEN INSERT clauses). Then, with the change data feed (CDC), the requirement of identifying the changed records is also satisfied.
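To make the upsert semantics in this comment concrete, here is a toy in-memory model. It is not Delta Lake code: the dictionaries stand in for the table and the nightly source, and the function name `upsert` and the sample rows are invented for illustration. It only shows why a merge yields a short list of changed records while untouched rows produce nothing.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, str, Row]  # (change_type, key, new row)


def upsert(target: Dict[str, Row], source: Dict[str, Row]) -> List[Change]:
    """Apply source rows to target in place and return only the rows that changed."""
    changes: List[Change] = []
    for key, new_row in source.items():
        old_row = target.get(key)
        if old_row is None:
            # Analogue of WHEN NOT MATCHED THEN INSERT
            target[key] = dict(new_row)
            changes.append(("insert", key, dict(new_row)))
        elif old_row != new_row:
            # Analogue of WHEN MATCHED (and values differ) THEN UPDATE
            target[key] = dict(new_row)
            changes.append(("update", key, dict(new_row)))
        # Identical rows are left untouched, so they emit no change record.
    return changes


if __name__ == "__main__":
    table = {"c1": {"plan": "basic"}, "c2": {"plan": "pro"}}
    nightly = {
        "c1": {"plan": "basic"},  # unchanged
        "c2": {"plan": "basic"},  # updated
        "c3": {"plan": "pro"},    # new customer
    }
    for change in upsert(table, nightly):
        print(change)
```

In this sample, only c2 and c3 appear in the returned list, which is the set of records the ML team would need to score.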

kz_data (Option: E)
Jan 10, 2024

E is correct

AziLa (Option: E)
Jan 21, 2024

The correct answer is E.