Exam Certified Data Engineer Professional
Question 19

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

    Correct Answer: B

    To compute the average humidity and temperature for each non-overlapping five-minute interval, the streaming DataFrame must be grouped by time window. In Structured Streaming, the window function expresses this kind of time-based grouping. Here, window('event_time', '5 minutes') assigns each row to a five-minute window based on its event_time column. Because no slide duration is given, the windows are tumbling, that is, non-overlapping. The averages are then calculated per window.
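The original code block is not reproduced on this page, so the following is only a minimal PySpark sketch of the kind of pipeline the question describes, not the exam's actual code. The function name five_minute_averages, the 10-minute watermark, the extra grouping by device_id, and the output column names are assumptions made for illustration. Only the window expression comes from the answer itself.

```python
def five_minute_averages(df):
    """Average temp and humidity per tumbling five-minute event-time window.

    `df` is expected to be a streaming DataFrame with the schema
    "device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT".
    """
    # Imported inside the function so this sketch can be loaded
    # even where Spark is not installed.
    from pyspark.sql import functions as F

    return (
        # Assumption: tolerate events arriving up to 10 minutes late.
        df.withWatermark("event_time", "10 minutes")
        .groupBy(
            # No slide duration is passed, so the windows are tumbling
            # (non-overlapping) five-minute intervals.
            F.window("event_time", "5 minutes").alias("time"),
            # Assumption: averages are wanted per device.
            "device_id",
        )
        .agg(
            F.avg("temp").alias("avg_temp"),
            F.avg("humidity").alias("avg_humidity"),
        )
    )
```

The returned DataFrame has a struct column named time with start and end fields. Passing a third argument (a slide duration) to window would produce overlapping sliding windows instead, which is not what the question asks for.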

Discussion
thxsgod Option: B

Correct, B.

BIKRAM063 Option: B

Window of 5 mins

Eertyy Option: B

answer is B

imatheushenrique Option: B

B. window("event_time", "5 minutes").alias("time"). In Structured Streaming, expressing such windows on event time is simply a special grouping using the window() function. For example, counts over 5-minute tumbling (non-overlapping) windows on an eventTime column are obtained by grouping on window() over that column and then counting.
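To make the tumbling (non-overlapping) semantics concrete, here is a small pure-Python sketch of how each event falls into exactly one five-minute bucket. It is an illustration only, not Spark's implementation. The helper names tumbling_window and average_per_window are hypothetical. The sketch assumes windows aligned to the Unix epoch, which matches Spark's default when no start offset is given, with an inclusive start and exclusive end.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window(event_time, size=WINDOW):
    """Return the (start, end) window containing event_time; end is exclusive."""
    # Distance from the start of the current window, measured from the epoch.
    offset = (event_time - EPOCH) % size
    start = event_time - offset
    return start, start + size


def average_per_window(events):
    """Average (temp, humidity) per window.

    `events` is an iterable of (event_time, temp, humidity) tuples
    with timezone-aware UTC timestamps.
    """
    buckets = defaultdict(list)
    for event_time, temp, humidity in events:
        # Each event lands in exactly one bucket: windows do not overlap.
        buckets[tumbling_window(event_time)].append((temp, humidity))
    return {
        window: (
            sum(t for t, _ in rows) / len(rows),
            sum(h for _, h in rows) / len(rows),
        )
        for window, rows in buckets.items()
    }
```

With this bucketing, events at 12:00, 12:01 and 12:04 share the window starting at 12:00, while an event at exactly 12:05 belongs to the next window.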

Jay_98_11 Option: B

correct B

kz_data Option: B

B is correct

sturcu Option: B

B is correct: https://www.databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html