Certified Data Engineer Professional Exam - Question 134

Question

A Structured Streaming job deployed to production has been resulting in higher than expected cloud storage costs. At present, during normal execution, each microbatch of data is processed in less than 3s; at least 12 times per minute, a microbatch is processed that contains 0 records. The streaming write was configured using the default trigger settings. The production job is currently scheduled alongside many other Databricks jobs in a workspace with instance pools provisioned to reduce start-up time for jobs with batch execution.

Holding all other variables constant and assuming records need to be processed in less than 10 minutes, which adjustment will meet the requirement?

Examice · Accepted Answer

Setting the trigger interval to 10 minutes reduces how often microbatches run, and therefore how often the source storage account APIs are called, which lowers the associated costs. This adjustment still meets the requirement of processing records in less than 10 minutes, and it cuts the number of microbatches that contain zero records.
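To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It uses only the figure given in the question (at least 12 empty microbatches per minute) and assumes each microbatch makes at least one call against the source storage account; the actual number of API calls per microbatch depends on the source and is not stated in the question. The function name is just for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def batches_per_day(trigger_interval_minutes: int) -> int:
    """Upper bound on microbatches started per day at a fixed trigger interval."""
    return MINUTES_PER_DAY // trigger_interval_minutes


# Lower bound from the question: at least 12 empty microbatches every minute.
empty_batches_per_day_now = 12 * MINUTES_PER_DAY

# With a 10-minute trigger, at most one microbatch starts every 10 minutes.
batches_per_day_10_min = batches_per_day(10)

print(empty_batches_per_day_now)   # 17280
print(batches_per_day_10_min)      # 144
```

Under that assumption, the empty microbatches alone account for at least 17,280 rounds of storage calls per day today, against at most 144 microbatches per day with the 10-minute interval.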

hpkr · Answer

Option C

Isio05 · Answer

C.
A - incorrect explanation
B - trigger once is not the correct option here
D - 500 milliseconds is already in use; it's the default trigger interval
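For reference, a minimal PySpark sketch of what option C could look like. The function name, the paths, and the Delta sink format are assumptions for illustration only; the question does not show the job's code. The only part that matters here is the `trigger(processingTime=...)` call on the stream writer.

```python
def start_stream_with_interval(df, target_path, checkpoint_path,
                               trigger_interval="10 minutes"):
    """Start a streaming write that fires one microbatch per trigger interval.

    `df` is expected to be a streaming DataFrame. The sink format and the
    paths are hypothetical; adjust them to the real job.
    """
    return (
        df.writeStream
        .format("delta")                                # assumed sink format
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime=trigger_interval)       # fixed-interval microbatches
        .start(target_path)
    )
```

Leaving out the `trigger(...)` call keeps the default behaviour described in D, and passing `once=True` instead of `processingTime` gives the trigger-once behaviour from B, which needs an external schedule to run again.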
