Exam: Certified Data Engineer Professional
Question 69

The business intelligence team has a dashboard configured to track various summary metrics for retail stores. This includes total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:

For demand forecasting, the Lakehouse contains a validated table of all itemized sales updated incrementally in near real-time. This table, named products_per_order, includes the following fields:

Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.

Which solution meets the expectations of the end users while controlling and limiting possible costs?

    Correct Answer: A

    The best solution is to configure a nightly batch job to save the required values as a table overwritten with each update. This approach meets the requirement for daily refreshes and ensures that the data is precomputed and stored in a format that allows for quick querying. Since the dashboard only needs to be refreshed once daily, a batch job is more cost-effective and efficient compared to real-time streaming or querying live data, which would require more compute resources and incur higher costs. Additionally, this method avoids the need for constant computation, thus reducing the overall system load during business hours when the dashboard is accessed by many users.
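A minimal sketch of the "precompute nightly, overwrite the table" pattern described above. It uses Python's built-in sqlite3 as a stand-in for the Lakehouse so the example is self-contained. The column names (store_id, order_date, quantity, unit_price), the summary table name store_sales_summary, and both function names are hypothetical, since the actual schemas are not shown in the question. On Databricks the same idea would typically be a scheduled job that writes the aggregated DataFrame with overwrite mode, for example `df.write.mode("overwrite").saveAsTable(...)`.

```python
import sqlite3


def refresh_daily_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the summary table from scratch (overwrite semantics).

    Intended to be run once per night by a scheduler. Table and column
    names are illustrative assumptions, not taken from the question.
    """
    conn.execute("DROP TABLE IF EXISTS store_sales_summary")
    conn.execute(
        """
        CREATE TABLE store_sales_summary AS
        SELECT store_id,
               order_date,
               SUM(quantity * unit_price) AS total_sales
        FROM products_per_order
        GROUP BY store_id, order_date
        """
    )
    conn.commit()


def read_dashboard(conn: sqlite3.Connection) -> list:
    """Dashboard query: reads precomputed rows, no aggregation at query time."""
    cur = conn.execute(
        "SELECT store_id, order_date, total_sales "
        "FROM store_sales_summary ORDER BY store_id, order_date"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products_per_order "
        "(store_id INTEGER, order_date TEXT, quantity INTEGER, unit_price REAL)"
    )
    conn.executemany(
        "INSERT INTO products_per_order VALUES (?, ?, ?, ?)",
        [(1, "2024-01-01", 2, 5.0), (1, "2024-01-01", 1, 10.0), (2, "2024-01-01", 3, 4.0)],
    )
    refresh_daily_summary(conn)
    print(read_dashboard(conn))
```

Because the aggregation runs once per refresh, each interactive dashboard query only scans the small summary table. Running the refresh again replaces the previous contents instead of appending to them.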

Discussion
dmov

Option: A

Looks like A to me, as long as they only need aggregates based on the previous day.

Def21

E (a view) could be an option, but it would require recomputation every time it is queried.