Exam Certified Data Engineer Professional All QuestionsBrowse all questions from this exam
Question 140

A nightly job ingests data into a Delta Lake table using the following code:

The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.

Which code snippet completes this function definition?

def new_records():

    Correct Answer: B

    To effectively identify and manipulate new records in a Delta Lake table, you should use the Change Data Feed (CDF) feature. This will allow the function to read only the changes (inserts and updates) made to the 'bronze' table since the last processing. This is accomplished by using the read option 'readChangeFeed' set to 'true'. This method is efficient for processing new records that have not yet been moved to the next table in the pipeline, meeting the goal of the function defined in the question.

Discussion
MDWPartnersOption: D

Seems D

FreyrOption: B

Correct Answer: B The Change Data Feed (CDF) feature in Delta Lake enables reading only the changes (inserts and updates) to a Delta table. This would allow the function to focus on new or modified data since the last trigger, making it ideal for processing only the new records that have not been processed yet. This directly meets the requirement for identifying and manipulating new records efficiently.