Exam: Certified Data Engineer Professional
Question 62

The business reporting team requires that data for their dashboards be updated every hour. The total processing time for the pipeline that extracts, transforms, and loads the data is 10 minutes.

Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

    Correct Answer: B

    The best option to meet the service-level agreement requirements with the lowest cost is to schedule a job to execute the pipeline once an hour on a new job cluster. This approach ensures that the data is updated every hour, meeting the requirement. Additionally, using a job cluster that is started and stopped for each run is more cost-effective than keeping a cluster running continuously, as it only incurs compute costs for the actual processing time, which is 10 minutes per hour.
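The cost argument above can be checked with some quick arithmetic. This is only a rough sketch: it counts billed cluster minutes per day, ignores cluster startup time, and assumes both cluster types are billed at the same per-minute rate, which may not hold in practice.

```python
# Rough comparison of billed cluster time per day (illustrative only).
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24

run_minutes = 10               # pipeline runtime stated in the question
runs_per_day = HOURS_PER_DAY   # one scheduled run per hour

# Job cluster: billed only while each run is active (startup time ignored here).
job_cluster_minutes = run_minutes * runs_per_day

# Always-on cluster: billed for every minute of the day.
always_on_minutes = MINUTES_PER_HOUR * HOURS_PER_DAY

ratio = always_on_minutes / job_cluster_minutes
print(job_cluster_minutes, always_on_minutes, ratio)  # 240 1440 6.0
```

Under these assumptions the always-on cluster accrues six times as many billed minutes as the hourly job cluster.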

Discussion
divingbell17 (Option: B)

I think B is correct. With option C, the cluster stays on 24/7 with a trigger interval of 60 minutes, which is more costly. If there were an option using Structured Streaming with trigger = availableNow and a job scheduled every hour, that would be even more efficient. https://www.databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html

Curious76 (Option: C)

Databricks recommends using Structured Streaming with trigger AvailableNow for incremental workloads that do not have low latency requirements.
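For readers unfamiliar with the availableNow trigger mentioned in this thread, here is a minimal PySpark sketch. The function name, the paths, and the choice of Delta as source and sink format are all assumptions for illustration; nothing in the question specifies them. The stream processes whatever data is available and then stops, so the cluster can shut down until the next scheduled run.

```python
def run_incremental_batch(spark, source_path, target_path, checkpoint_path):
    """Process all data available right now, then stop the stream.

    `spark` is an existing SparkSession; the paths are hypothetical placeholders.
    """
    query = (
        spark.readStream.format("delta")        # assumed source format
        .load(source_path)
        .writeStream.format("delta")            # assumed sink format
        .option("checkpointLocation", checkpoint_path)  # tracks processed data between runs
        .trigger(availableNow=True)             # drain the backlog, then terminate
        .start(target_path)
    )
    query.awaitTermination()  # returns once the available data has been processed
    return query
```

Run from a job scheduled once an hour, a function like this would combine incremental processing with per-run compute.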

spaceexplorer (Option: B)

B is correct

alexvno (Option: B)

B: a job cluster is cheap, and hourly = every 60 minutes.

aragorn_brego (Option: B)

Scheduling a job to execute the pipeline on an hourly basis aligns with the requirement for data to be updated every hour. Using a job cluster (which is brought up for the job and torn down upon completion) rather than a dedicated interactive cluster will usually be more cost-effective. This is because you are only paying for the compute resources when the job is running, which is 10 minutes out of every hour, rather than paying for an interactive cluster that would be up and running (and incurring costs) continuously.
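As a rough illustration of what "an hourly job on a new job cluster" could look like, the sketch below builds a settings payload shaped like a Databricks Jobs API 2.1 create request. The job name, notebook path, runtime version, node type, and worker count are hypothetical placeholders, not values from the question.

```python
import json

# Hypothetical job settings, following the general shape of a Jobs API 2.1 create request.
job_settings = {
    "name": "hourly-dashboard-refresh",            # placeholder name
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",   # second 0, minute 0 of every hour
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Pipelines/dashboard_etl"},  # placeholder path
            # new_cluster requests a job cluster created for the run and
            # terminated when it finishes, instead of an existing cluster.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",          # placeholder node type
                "num_workers": 2,
            },
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```

The key design choice is `new_cluster` rather than `existing_cluster_id`: compute exists only for the duration of each run.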

ofed (Option: B)

It's either B or D. I think B, because we want the lowest cost.