Exam DP-203
Question 182

You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:

✑ The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.

✑ Line total sales amount and line total tax amount will be aggregated in Databricks.

✑ Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.

You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data.

What should you recommend?

A. Update
B. Complete
C. Append

    Correct Answer: C

    The most appropriate mode for this scenario is Append. Sales transactions are never updated, and adjustments arrive as new rows, so Append fits: it writes only new rows to the output sink and leaves rows that were already written untouched. Each row therefore reaches the sink once, which minimizes duplicate data. Update would rewrite rows that have already been output, and Complete would rewrite the entire result table on every trigger; neither is needed when existing rows never change.
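The difference between the three output modes named above can be made concrete with a toy model. The sketch below is plain Python, not the Spark API: the `BATCHES` data, the `emitted_per_trigger` function, and the item names are all hypothetical, and the model only imitates what each mode would hand to the sink per trigger.

```python
from collections import defaultdict

# Hypothetical micro-batches of (item, line_total); the second batch
# carries a negative adjustment row for an earlier sale.
BATCHES = [
    [("shirt", 20), ("hat", 10)],
    [("shirt", -5)],
]

def emitted_per_trigger(batches, mode):
    """Toy model, not the Spark API: return what each trigger writes to the sink."""
    totals = defaultdict(int)  # stands in for the aggregated result table
    out = []
    for batch in batches:
        changed = set()
        for item, amount in batch:
            totals[item] += amount
            changed.add(item)
        if mode == "append":
            # Only the rows that arrived in this trigger, passed through as-is.
            out.append(list(batch))
        elif mode == "update":
            # Only the aggregate rows that changed in this trigger.
            out.append(sorted((item, totals[item]) for item in changed))
        elif mode == "complete":
            # The entire aggregated result table, on every trigger.
            out.append(sorted(totals.items()))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out

for mode in ("append", "update", "complete"):
    print(mode, emitted_per_trigger(BATCHES, mode))
```

In this model, Complete writes the unchanged hat total again on the second trigger, while Append and Update write only what is new or changed. One caveat the sketch does not model: real Structured Streaming accepts Append on an aggregated stream only when a watermark is defined.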

Discussion
necktru Option: A

I think Update is correct. "New rows will be added to adjust a sale" means that over the course of a day you must update the daily import with the new sales, and the group-by then produces new amounts. Keep in mind that "sales transactions will never be updated" refers to the online store, not to the aggregated rows.

vctrhugo

Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.

[Removed] Option: C

Using ChatGPT: Append

Azure_2023 Option: C

Append: This mode adds new rows to the output sink for each received event. It's perfect for your scenario where sales transactions are never updated but adjusted by adding new rows. This guarantees no duplicate data due to updates, minimizing duplicates.

Update: This mode updates existing rows in the output sink if a matching key is found. Since you mentioned no updates occur, using this mode would lead to unnecessary operations and potential inconsistencies.

Complete: This mode writes the entire dataset to the output sink for each micro-batch interval. This is unnecessary and inefficient for your scenario since only new rows need to be added, and it potentially duplicates data across micro-batches.

Dusica Option: C

Pay attention to the word "adjust". This works like double-entry accounting: lines are added as positive or negative amounts, and queries then aggregate them to produce the final value. It is C.

jpgsa11

Exactly
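The double-entry idea from Dusica's comment can be sketched in a few lines. This is a minimal illustration in plain Python, assuming a hypothetical ledger layout of `(sale_id, item, quantity, line_total, line_tax)`; the question does not specify the actual schema, and `LEDGER` and `net_totals` are made-up names.

```python
from collections import defaultdict

# Hypothetical append-only ledger. Existing rows are never modified;
# a correction is a new row with negative amounts.
LEDGER = [
    ("S1", "shirt", 2, 40, 4),
    ("S2", "hat", 1, 10, 1),
    ("S1", "shirt", -1, -20, -2),  # adjustment: one shirt returned from sale S1
]

def net_totals(rows):
    """Sum quantity, line total, and line tax per (sale_id, item)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for sale_id, item, quantity, line_total, line_tax in rows:
        acc = totals[(sale_id, item)]
        acc[0] += quantity
        acc[1] += line_total
        acc[2] += line_tax
    return {key: tuple(acc) for key, acc in totals.items()}

print(net_totals(LEDGER))
```

The stored rows stay exactly as they were appended, and the net position per sale appears only when the ledger is aggregated at query time.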

ExamKiller42 Option: A

I think 'A. Update' is correct. From what I understand, "Sales transactions will never be updated. Instead, new rows will be added to adjust a sale." means that the input stream will have new rows reflecting the corrected sales transaction. If we use "append" output mode, we will have duplicates in the target table, corresponding to both the original transaction and the new corrected transaction. Instead, we can use the foreachBatch method and "update" output mode to merge each micro-batch into the target table, updating old transactions if they match or inserting new ones if they don't. This would minimize duplicate data and allow the line sales amount and tax amount to be aggregated correctly in the target table.

ExamKiller42

https://docs.databricks.com/en/structured-streaming/delta-lake.html#upsert-from-streaming-queries-using-foreachbatch
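The merge-per-micro-batch approach ExamKiller42 describes can be contrasted with a plain append in a small simulation. This is a sketch in plain Python that stands in for the upsert and does not call Spark or Delta Lake. It assumes, hypothetically, that a corrected row reuses the original `transaction_id` and carries the full corrected values, which the question does not state; all names and data below are made up.

```python
def append_batch(target_rows, micro_batch):
    """Append behaviour: every incoming row becomes a new row in the target."""
    target_rows.extend(micro_batch)
    return target_rows

def merge_batch(target_by_id, micro_batch):
    """Upsert behaviour: overwrite on a matching transaction_id, else insert."""
    for row in micro_batch:
        target_by_id[row["transaction_id"]] = row
    return target_by_id

# Hypothetical micro-batches; the second one corrects transaction T1.
BATCH_1 = [{"transaction_id": "T1", "line_total": 40, "line_tax": 4}]
BATCH_2 = [
    {"transaction_id": "T1", "line_total": 20, "line_tax": 2},  # corrected T1
    {"transaction_id": "T2", "line_total": 10, "line_tax": 1},
]

appended = []  # target table under append
merged = {}    # target table under upsert, keyed by transaction_id
for batch in (BATCH_1, BATCH_2):
    append_batch(appended, batch)
    merge_batch(merged, batch)

print(len(appended), len(merged))
```

Under these assumptions the appended target holds both versions of T1, while the merged target holds one row per transaction. If adjustments instead arrive as positive or negative delta rows, as other commenters read the question, overwriting would discard the original amount, so the choice depends on which reading of "adjust" applies.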

Danweo Option: C

We need to append new entries only. The question describes updates that are made not on existing row entries but by adding new rows for duplicate transactions.

ageorgieva Option: C

append...

Alongi Option: A

It's Update

j888 Option: C

Append. Update does not add the new rows.

jsav1 Option: C

Append: you are only adding new rows, and existing rows do not need to be updated.

dakku987 Option: C

When you see "new rows will be added", Append is always the answer.

d046bc0 Option: A

(ChatGPT) The Append output mode is used when new rows are added to the result table. This mode is suitable for scenarios where the output table is a summary of the input data, and the input data is not updated.

dawoodiee Option: C

Sales transactions will NEVER be updated. Append.

Ram9198 Option: A

Update

EliteAllen Option: C

C. Append. This mode is used when you are always adding new records to the output data. Given that sales transactions will never be updated and new rows will be added to adjust a sale, this mode seems to be the most suitable. It will also help minimize duplicate data, since it only adds new records and does not modify existing ones.

kkk5566 Option: A

A is correct

kkk5566

ignore it

Tightbot Option: C

Append is the right choice. Update is for modifications, and Append is for adding new rows.