Exam: Certified Data Engineer Professional
Question 75

A data engineer is configuring a pipeline that will potentially see late-arriving, duplicate records.

In addition to deduplicating records within the batch, which of the following approaches allows the data engineer to deduplicate data against previously processed records as it is inserted into a Delta table?

    Correct Answer: C

    To deduplicate data against previously processed records as it is inserted into a Delta table, the appropriate approach is an insert-only merge with a matching condition on a unique key. An insert-only merge is a MERGE INTO statement with only a WHEN NOT MATCHED THEN INSERT clause: if an incoming record's unique key already exists in the target table, the record is skipped; if there is no match, the record is inserted. Existing records are never updated, which is what distinguishes it from a full upsert. Deduplicating the batch itself is still required, because two copies of a new key in the same batch would both find no match and both be inserted. Together, the two steps handle duplicates both within the current batch and against previously processed records.
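As a rough illustration, the sketch below models the semantics of an insert-only merge in plain Python. It does not use Spark or the Delta Lake API: a list of dicts stands in for the Delta table, and the helper names `dedupe_within_batch` and `insert_only_merge` are hypothetical. On Delta Lake the same idea is typically written as MERGE INTO target USING updates ON target.<key> = updates.<key> WHEN NOT MATCHED THEN INSERT *, after deduplicating `updates` on the key. The demo runs two batches, the second containing a late-arriving duplicate of a record from the first.

```python
# Pure-Python model of an insert-only merge (no Spark or Delta Lake involved).
# ASSUMPTION: a list of dicts stands in for the Delta table; helper names are hypothetical.
from typing import Dict, Iterable, List

Record = Dict[str, object]


def dedupe_within_batch(batch: Iterable[Record], key: str) -> List[Record]:
    """Keep the first record seen for each key (same purpose as dropDuplicates on the key)."""
    seen = set()
    unique: List[Record] = []
    for rec in batch:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


def insert_only_merge(target: List[Record], batch: Iterable[Record], key: str) -> int:
    """Model of MERGE ... WHEN NOT MATCHED THEN INSERT: append only unmatched keys.

    Records whose key already exists in `target` are skipped, never updated.
    Returns the number of rows inserted.
    """
    existing = {rec[key] for rec in target}
    inserted = 0
    for rec in dedupe_within_batch(batch, key):
        if rec[key] not in existing:
            target.append(rec)
            existing.add(rec[key])
            inserted += 1
    return inserted


if __name__ == "__main__":
    table: List[Record] = []
    batch1 = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "a"}]  # duplicate inside the batch
    batch2 = [{"id": 2, "v": "b"}, {"id": 3, "v": "c"}]  # id 2 arrives again, late
    print(insert_only_merge(table, batch1, "id"))  # 2
    print(insert_only_merge(table, batch2, "id"))  # 1
    print([r["id"] for r in table])  # [1, 2, 3]
```

Skipping the within-batch step would let both copies of id 1 in the first batch through, since neither finds a match in the (still empty) target.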

Discussion
sturcu (Option: C)

MERGE with WHEN NOT MATCHED THEN INSERT.

aragorn_brego (Option: C)

To handle deduplication against previously processed records in a Delta table, the MERGE INTO command can be used. If an incoming record matches an existing record on a unique key, MERGE INTO can either update the existing record (an upsert, if needed) or, with an insert-only merge, simply ignore the duplicate. If there is no match (i.e., the record is new), the record is inserted.
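To make the difference between "update" and "ignore" concrete, here is a minimal pure-Python sketch, assuming a list of dicts stands in for the Delta table and the incoming batch is already deduplicated on the key. The function `merge` and its `update_matched` flag are hypothetical names for illustration, not part of any Delta Lake API: `update_matched=True` mirrors a MERGE with both WHEN MATCHED THEN UPDATE SET * and WHEN NOT MATCHED THEN INSERT *, while `update_matched=False` mirrors the insert-only form.

```python
# Pure-Python model contrasting an upsert merge with an insert-only merge.
# ASSUMPTION: lists of dicts stand in for tables; `merge` is a hypothetical helper.
from typing import Dict, List

Record = Dict[str, object]


def merge(target: List[Record], batch: List[Record], key: str, update_matched: bool) -> None:
    """Model MERGE INTO target USING batch ON target.key = batch.key.

    update_matched=True  -> upsert: matched rows are overwritten, unmatched rows inserted.
    update_matched=False -> insert-only: matched rows are left as-is, unmatched rows inserted.
    Assumes `batch` was already deduplicated on `key`.
    """
    index = {rec[key]: i for i, rec in enumerate(target)}
    for rec in batch:
        if rec[key] in index:
            if update_matched:
                target[index[rec[key]]] = dict(rec)  # overwrite the matched row
            # otherwise the matched row is untouched, so the duplicate is ignored
        else:
            index[rec[key]] = len(target)
            target.append(dict(rec))


if __name__ == "__main__":
    existing = [{"id": 1, "status": "old"}]
    incoming = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]

    upserted = [dict(r) for r in existing]
    merge(upserted, incoming, "id", update_matched=True)
    print(upserted)  # [{'id': 1, 'status': 'new'}, {'id': 2, 'status': 'new'}]

    insert_only = [dict(r) for r in existing]
    merge(insert_only, incoming, "id", update_matched=False)
    print(insert_only)  # [{'id': 1, 'status': 'old'}, {'id': 2, 'status': 'new'}]
```

For pure deduplication the insert-only form is the one that fits: a duplicate carries no new information, so there is nothing to update.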

Dileepvikram (Option: C)

Answer is C

hm358 (Option: C)

merge will be more efficient

60ties (Option: C)

answer is C

Crocjun (Option: C)

C Reference: file:///C:/Users/yuen1/Downloads/databricks-certified-data-engineer-professional-exam-guide.pdf

mouad_attaqi

You are referencing a local PDF on your computer!