Certified Data Engineer Professional Exam - Question 114


A data team’s Structured Streaming job is configured to calculate running aggregates for item sales to update a downstream marketing dashboard. The marketing team has introduced a new promotion, and they would like to add a new field to track the number of times this promotion code is used for each item. A junior data engineer suggests updating the existing query as follows. Note that proposed changes are in bold.

Original query:

Proposed query:

Which step must also be completed to put the proposed query into production?

Correct Answer: A

When the aggregation in a stateful Structured Streaming query changes, the query must be restarted with a new checkpoint location. The existing checkpoint holds state that was written under the old aggregation schema. Adding a field changes that schema, so the stored state may be incompatible with the updated query. Pointing the query at a new checkpoint location makes it start fresh with the new schema and avoids mismatches between old state and new logic.
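A minimal sketch of one way to make the checkpoint change explicit, assuming PySpark. The helper names (`versioned_checkpoint`, `start_dashboard_query`), the base path, and the `complete` output mode are illustrative assumptions, not taken from the question. The idea is to tie the checkpoint path to a query version so that any change to the aggregation is deployed with a fresh path:

```python
def versioned_checkpoint(base_path: str, query_version: int) -> str:
    """Return a checkpoint path that changes whenever the query version changes."""
    if query_version < 1:
        raise ValueError("query_version must be >= 1")
    return f"{base_path.rstrip('/')}/v{query_version}"


def start_dashboard_query(agg_df, checkpoint_location: str, target_table: str):
    """Start the streaming write for an already-aggregated streaming DataFrame.

    agg_df is assumed to be a PySpark streaming DataFrame. Only DataFrame
    methods are called here, so no pyspark import is needed in this sketch.
    """
    return (
        agg_df.writeStream
        .outputMode("complete")  # assumption: the dashboard table is fully rewritten
        .option("checkpointLocation", checkpoint_location)
        .toTable(target_table)
    )


# Hypothetical paths: v1 belongs to the original query, v2 to the proposed one.
old_checkpoint = versioned_checkpoint("/checkpoints/item_sales/", 1)
new_checkpoint = versioned_checkpoint("/checkpoints/item_sales/", 2)
```

The proposed query would then be started with `new_checkpoint`, while the old directory is left untouched until the new query is confirmed healthy.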

Discussion

2 comments
MDWPartners, Option: A
May 29, 2024

The checkpoint location preserves all of the essential information that identifies a query. Each query must have its own checkpoint location, and multiple queries should never share one. For more information, see the Structured Streaming Programming Guide and https://docs.databricks.com/en/structured-streaming/query-recovery.html
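The "one checkpoint per query" rule can be checked before deployment. This is only a sketch in plain Python: the function name `shared_checkpoints` and the query names and paths are hypothetical, and it compares paths as strings after stripping a trailing slash rather than resolving them on a real file system.

```python
from collections import defaultdict
from typing import Dict, List


def shared_checkpoints(query_checkpoints: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each checkpoint path used by more than one query to those query names."""
    by_path = defaultdict(list)
    for query_name, path in query_checkpoints.items():
        # Treat "/a/b" and "/a/b/" as the same location.
        by_path[path.rstrip("/")].append(query_name)
    return {path: sorted(names) for path, names in by_path.items() if len(names) > 1}


# Hypothetical deployment plan with one accidental collision.
planned = {
    "item_sales_agg": "/checkpoints/item_sales/v2",
    "promo_usage_agg": "/checkpoints/item_sales/v2/",
    "returns_agg": "/checkpoints/returns/v1",
}
conflicts = shared_checkpoints(planned)
```

An empty result means no two planned queries point at the same location. Here `conflicts` flags the two queries that share `/checkpoints/item_sales/v2`.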

Deb9753, Option: A
Jun 5, 2024

Answer: A. When updating the schema of a streaming job, specifying a new checkpoint location ensures that the streaming query starts fresh with the new schema. This avoids issues caused by mismatches between the previously stored state and the new schema. It is especially relevant when adding new fields, because the existing state might not be compatible with the new schema.