
Certified Data Engineer Professional Exam - Question 19


A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Correct Answer: B

To compute the average humidity and temperature for each non-overlapping five-minute interval in a streaming pipeline, you need to group the data by time window. In Structured Streaming, the window() function expresses this kind of event-time grouping. window("event_time", "5 minutes") assigns each row to a five-minute interval based on its event_time value. Because no slide duration is given, the windows are tumbling: they do not overlap, and each event falls into exactly one interval. The averages are then computed per interval.
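The grouping that window("event_time", "5 minutes") performs can be approximated in plain Python, which makes "non-overlapping" concrete. This is a minimal sketch of the semantics only, not Spark's implementation. It assumes naive datetimes interpreted as UTC, windows aligned to the Unix epoch (Spark's default when no start offset is given), and a finite batch of events rather than a stream. The names window_start and tumbling_averages are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: 5-minute tumbling windows aligned to the Unix epoch,
# mirroring window("event_time", "5 minutes") with no slide or start offset.
WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Floor a timestamp to the start of the single window that contains it."""
    return ts - (ts - EPOCH) % size


def tumbling_averages(events):
    """Average temp and humidity per non-overlapping window.

    events: iterable of (device_id, event_time, temp, humidity) tuples,
    matching the column order of the schema in the question.
    Returns {(window_start, window_end): (avg_temp, avg_humidity)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [temp_sum, humidity_sum, count]
    for _device_id, event_time, temp, humidity in events:
        start = window_start(event_time)
        acc = sums[(start, start + WINDOW)]
        acc[0] += temp
        acc[1] += humidity
        acc[2] += 1
    return {key: (t / n, h / n) for key, (t, h, n) in sums.items()}


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)
    # One reading per minute from 12:00 to 12:09 for a single device.
    readings = [(1, base + timedelta(minutes=i), 20.0 + i, 50.0 + i) for i in range(10)]
    for (start, end), (avg_temp, avg_hum) in sorted(tumbling_averages(readings).items()):
        print(start.time(), end.time(), avg_temp, avg_hum)
```

With ten one-minute readings, the events split into two windows, [12:00, 12:05) and [12:05, 12:10), and each window is averaged independently because no event belongs to more than one window.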

Discussion

7 comments
thxsgod (Option: B)
Sep 7, 2023

Correct, B.

Eertyy (Option: B)
Sep 21, 2023

answer is B

BIKRAM063 (Option: B)
Nov 2, 2023

Window of 5 mins

sturcu (Option: B)
Oct 11, 2023

B is correct: https://www.databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html

kz_data (Option: B)
Jan 10, 2024

B is correct

Jay_98_11 (Option: B)
Jan 13, 2024

correct B

imatheushenrique (Option: B)
Jun 1, 2024

B. window("event_time", "5 minutes").alias("time")
In Structured Streaming, expressing such windows on event time is simply a special grouping using the window() function. For example, counts over 5-minute tumbling (non-overlapping) windows on the eventTime column in the event are computed as follows.
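The difference between tumbling and overlapping (sliding) windows can be checked with a small plain-Python sketch. windows_for is a hypothetical helper, not a Spark API; it assumes epoch-aligned windows and naive datetimes interpreted as UTC, and it lists every [start, end) window that contains a given timestamp. With no slide, as in window("event_time", "5 minutes"), each timestamp lands in exactly one window. With a 1-minute slide over a 5-minute duration, as in window("event_time", "5 minutes", "1 minute"), it lands in five overlapping windows, so that form would not satisfy the "non-overlapping" requirement.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: windows aligned to the Unix epoch


def windows_for(ts: datetime, duration: timedelta, slide: timedelta = None):
    """Return every [start, end) window of the given duration that contains ts.

    slide=None means slide == duration, i.e. tumbling (non-overlapping) windows.
    """
    if slide is None:
        slide = duration
    # Latest epoch-aligned window start at or before ts.
    start = ts - (ts - EPOCH) % slide
    result = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + duration > ts:
        result.append((start, start + duration))
        start -= slide
    return sorted(result)


if __name__ == "__main__":
    ts = datetime(2024, 1, 1, 12, 3)
    five, one = timedelta(minutes=5), timedelta(minutes=1)
    print(len(windows_for(ts, five)))       # tumbling: one window
    print(len(windows_for(ts, five, one)))  # sliding: five overlapping windows
```

Because the tumbling case returns a single window for every timestamp, each event contributes to exactly one average, which is what the question asks for.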