Exam: Certified Data Engineer Professional
Question 20

A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.

The proposed directory structure is displayed below:

Which statement describes whether this checkpoint directory structure is valid for the given scenario and why?

    Correct Answer: E

    In a scenario where multiple Structured Streaming jobs are concurrently writing to a Delta table, each stream needs to maintain its own checkpoint directory. Checkpoints are used to store the streaming state and metadata, ensuring data consistency and recovery in case of failures. Sharing a single checkpoint directory between multiple streams can lead to data corruption and inconsistencies because the checkpointing mechanism is not designed to handle concurrent access by multiple streams. Therefore, the checkpoint directory structure depicted is not valid, and each stream should have its own checkpoint directory.
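A minimal sketch of the one-checkpoint-per-stream layout described above, in PySpark style. The paths `/bronze` and `/checkpoints/bronze`, the stream names, and both helper functions are hypothetical and are not taken from the question's directory diagram. `stream_df` is assumed to be a streaming DataFrame that was already read from one Kafka topic.

```python
from posixpath import join

# Hypothetical locations, chosen only for illustration.
BRONZE_TABLE_PATH = "/bronze"
CHECKPOINT_ROOT = "/checkpoints/bronze"


def checkpoint_for(stream_name: str) -> str:
    """Return a checkpoint directory used by exactly one streaming query."""
    return join(CHECKPOINT_ROOT, stream_name)


def start_bronze_writer(stream_df, stream_name: str):
    """Append one streaming DataFrame to the shared bronze Delta table.

    Every query writes to the same table path but gets its own
    checkpointLocation, so no two queries share progress metadata.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_for(stream_name))
        .start(BRONZE_TABLE_PATH)
    )
```

With two Kafka-backed streaming DataFrames, each job would call `start_bronze_writer` with a different `stream_name` (for example `"topic_a"` and `"topic_b"`), giving two checkpoint directories and one target table.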

Discussion
thxsgod Option: E

Correct, E. Source: https://docs.databricks.com/en/optimizations/isolation-level.html#:~:text=If%20a%20streaming%20query%20using%20the%20same%20checkpoint%20location%20is%20started%20multiple%20times%20concurrently%20and%20tries%20to%20write%20to%20the%20Delta%20table%20at%20the%20same%20time.%20You%20should%20never%20have%20two%20streaming%20queries%20use%20the%20same%20checkpoint%20location%20and%20run%20at%20the%20same%20time.

Eertyy

The answer is correct.

sturcu Option: E

E is correct. If the user wants a single checkpoint directory, they need to union the streams before writing, so that only one streaming query writes to the table.
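The union approach mentioned in this comment could look roughly like the sketch below. It assumes both inputs are streaming DataFrames with the same schema; the function name and its path parameters are hypothetical. Because the union produces a single streaming query, a single checkpoint directory is enough.

```python
def write_union_to_bronze(stream_a, stream_b, checkpoint_path, table_path):
    """Union two same-schema streaming DataFrames and write them as ONE query.

    One query means one checkpointLocation, which avoids sharing a
    checkpoint between two concurrently running queries.
    """
    # unionByName matches columns by name rather than by position.
    combined = stream_a.unionByName(stream_b)
    return (
        combined.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(table_path)
    )
```

The trade-off is that both topics are now processed by the same query, so they start, stop, and fail together.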

imatheushenrique Option: E

E. No; each of the streams needs to have its own checkpoint directory. Checkpoint directories map one-to-one to streaming queries.

svik Option: B

It is not clear from the question whether year_week=2020_01 and year_week=2020_02 are used by stream 1 and stream 2 respectively. If the streams use a common parent checkpoint directory with individual subfolders for checkpointing, that should work fine. In that case the answer should be B.

Kill9

Those are table partitions. They are not used to build the checkpoint address. The address ends at /bronze.

Jay_98_11 Option: E

Correct, E.

kz_data Option: E

E is correct.