Exam: Certified Data Engineer Professional
Question 21

A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30 seconds. The streaming write is currently configured with a trigger interval of 10 seconds.

Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?

    Correct Answer: E

    To ensure that records are processed in less than 10 seconds, the key is to handle microbatch processing more efficiently during peak hours. Decreasing the trigger interval to 5 seconds can help achieve this by triggering batches more frequently, which may prevent records from backing up and large batches from causing spill. This allows more consistent batch processing times and utilizes available resources effectively, reducing the risk of exceeding the 10-second processing requirement.
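As a minimal, hedged sketch of what this adjustment looks like in PySpark (the source, checkpoint, and output paths here are hypothetical stand-ins for the production job's actual configuration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-interval-demo").getOrCreate()

# Hypothetical source; substitute the production stream's actual source.
events = spark.readStream.format("rate").load()

query = (
    events.writeStream
    .format("delta")
    # Hypothetical checkpoint and output paths.
    .option("checkpointLocation", "/tmp/checkpoints/trigger_demo")
    # Fire every 5 seconds instead of the original 10, so each microbatch
    # carries roughly half the data and is less likely to back up or spill.
    .trigger(processingTime="5 seconds")
    .start("/tmp/tables/trigger_demo")
)
```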

Discussion
sturcu · Option: E

Changing the trigger to "once" will cause this to run as a single batch, and it will not execute in microbatches. This will not help at all.
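For reference, here is what the batch-style trigger the commenter describes looks like, continuing the hypothetical stream from the sketch above; it drains whatever data is available and then stops, so it cannot sustain a sub-10-second latency target on a continuous stream:

```python
# Trigger once: process all currently available data as one batch, then stop.
# (Recent Spark versions also offer availableNow=True, which drains the
# backlog in multiple batches but likewise terminates rather than running
# continuously.)
once_query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/once_demo")  # hypothetical path
    .trigger(once=True)
    .start("/tmp/tables/once_demo")  # hypothetical output path
)
```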

asmayassineg · Option: E

The correct answer is E. D means the job would need to acquire resources within 10 seconds, which is impossible without serverless.

ojudz08 · Option: E

E is the answer. Enabling the setting uses 128 MB as the target file size: https://learn.microsoft.com/en-us/azure/databricks/delta/tune-file-size
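For context, the page the commenter links covers Delta file-size tuning; a minimal sketch of setting the target file size explicitly, assuming a hypothetical table name:

```python
# delta.targetFileSize is the Delta table property described in the linked
# tune-file-size doc. The table name is hypothetical; 134217728 bytes = 128 MB.
spark.sql(
    "ALTER TABLE sales_stream_sink "
    "SET TBLPROPERTIES ('delta.targetFileSize' = '134217728')"
)
```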

RafaelCFC · Option: E

I believe this is a case of the least bad option, not exactly the best option possible.
- A is wrong because in streaming you very rarely have any executors idle, as all cores are engaged in processing the window of data.
- B is wrong because triggering every 30 seconds will not meet the 10-second target processing interval.
- C is wrong in two ways: increasing shuffle partitions to any number above the number of available cores in the cluster will worsen streaming performance (see the sketch below), and the checkpoint folder has no connection with trigger time.
- D is wrong because, keeping all other things the same as described by the problem, keeping the trigger time at 10 seconds will not change the underlying conditions of the delay (i.e., too much data to be processed in a timely manner).
- E is the only option that might improve processing time.
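On the shuffle-partition point, the relevant setting is spark.sql.shuffle.partitions; a minimal sketch of aligning it with the cluster, assuming a hypothetical 8-core cluster:

```python
# Default is 200 shuffle partitions; for a streaming job, partitions far
# beyond the number of available cores mostly add per-batch scheduling
# overhead. The core count below is a hypothetical example.
# Note: stateful queries pin this value in the checkpoint on the first run.
spark.conf.set("spark.sql.shuffle.partitions", "8")
```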

cotardo2077 · Option: E

For sure E.

Eertyy · Option: E

Correct answer is E.

imatheushenrique · Option: E

The best option for performance gain is E: decrease the trigger interval to 5 seconds; triggering batches more frequently may prevent records from backing up and large batches from causing spill.

DAN_H · Option: E

E is correct; A is wrong because in streaming you very rarely have any executors idle.

kz_data · Option: E

I think E is correct.

ervinshang · Option: E

Correct answer is E.

ofed · Option: C

Only C. Even if you trigger more frequently, you decrease both the load and the time for that load. E doesn't change anything.

azurearch · Option: C

Sorry, the caveat is "holding all other variables constant", which means we are not allowed to change trigger intervals. Is C the answer then?

azurearch · Option: D

What if there are more records within those 5-second trigger intervals? That would still increase the time it takes to process, so I doubt E is correct. I will go with answer D: the point is not to execute all queries within 10 seconds, it is to execute a trigger-now batch every 10 seconds.

azurearch · Option: E

Option A is also about setting the trigger interval to 5 seconds; just to understand, why is it not the answer?