Certified Data Engineer Associate Exam - Question 67

Question

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Examice · Accepted Answer

The MERGE INTO command should be used when the target table cannot contain duplicate records. This command is specifically designed to handle both insertions and updates (or deletions) within one operation based on whether a match exists. It is particularly useful for scenarios where maintaining up-to-date data and ensuring data integrity, by avoiding duplicate records, is essential. On the other hand, the INSERT INTO command merely adds new records without consideration for existing records' potential duplication.

meow_akk · Answer

Ans D : With merge , you can avoid inserting the duplicate records. The dataset containing the new logs needs to be deduplicated within itself. By the SQL semantics of merge, it matches and deduplicates the new data with the existing data in the table, but if there is duplicate data within the new dataset, it is inserted.
https://docs.databricks.com/en/delta/merge.html#:~:text=With%20merge%20%2C%20you%20can%20avoid%20inserting%20the%20duplicate%20records.&text=The%20dataset%20containing%20the%20new,new%20dataset%2C%20it%20is%20inserted.

kz_data · Answer

D is correct

azure_bimonster · Answer

D is answer here

UGOTCOOKIES · Answer

MERGE INTO you can upsert (update insert) data from a source table, view or dataframe into the target table. Merge operation allows updates, insets and deletes to be completed in a single atomic transaction. The main benefit of using the MERGE INTO is to avoid duplicates but does not inherently remove duplicates.

fifirifi · Answer

correct answer: D
explanation: The MERGE INTO command is used when you need to perform both insertions and updates (or deletes) in one operation based on whether a match exists. It is particularly useful for maintaining up-to-date data and ensuring there are no duplicate records in the target table. This is often referred to as an "upsert" operation (update + insert). When the target table needs to be kept free of duplicate records, and there's a need to update existing records or insert new ones based on some matching condition, MERGE INTO is the appropriate command. The INSERT INTO command, on the other hand, is used to add new records to a table without regard for whether they duplicate existing records. Options A, B, C, and E do not specifically require the use of MERGE INTO. Therefore, D is the correct answer.

Certified Data Engineer Associate Exam - Question 67

Discussion