Professional Data Engineer Exam - Question 172

Question

You are analyzing the price of a company's stock. Every 5 seconds, you need to compute a moving average of the past 30 seconds' worth of data. You are reading data from Pub/Sub and using Dataflow to conduct the analysis. How should you set up your windowed pipeline?

Examice · Accepted Answer

To compute a moving average of the past 30 seconds' worth of data every 5 seconds, use a sliding window with a duration of 30 seconds and a period of 5 seconds. A window then closes every 5 seconds, and each one covers the preceding 30 seconds of data. The trigger 'AfterWatermark.pastEndOfWindow()' fires on event time: each window emits its result once the watermark, the pipeline's estimate of how far event time has progressed, passes the end of that window.

AWSandeep · Answer

D. Use a sliding window with a duration of 30 seconds and a period of 5 seconds. Emit results by setting the following trigger: AfterWatermark.pastEndOfWindow()

vamgcp · Answer

Option D: Sliding Window: Since you need to compute a moving average of the past 30 seconds' worth of data every 5 seconds, a sliding window is appropriate. A sliding window allows overlapping intervals and is well-suited for computing rolling aggregates.

Window Duration: The window duration should be set to 30 seconds to cover the required 30 seconds' worth of data for the moving average calculation.

Window Period: The window period or sliding interval should be set to 5 seconds to move the window every 5 seconds and recalculate the moving average with the latest data.

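Trigger: The trigger should be set to AfterWatermark.pastEndOfWindow() to emit the computed moving average when the watermark advances past the end of the window. Because the watermark estimates when all data for the window has arrived, the result covers the window's on-time data; elements that arrive after that point are late data and are dropped unless allowed lateness is configured.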
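
The three settings above can be sketched in plain Python, without the Beam SDK, to show what a 30-second window with a 5-second period does to each element. This is only a simulation under stated assumptions: timestamps are whole seconds of event time, windows are aligned to time 0, and the names windows_for, moving_averages, and ticks are hypothetical. In the Beam Python SDK the equivalent window is SlidingWindows(size=30, period=5).

```python
from collections import defaultdict

SIZE = 30    # window duration in seconds
PERIOD = 5   # a new window starts every 5 seconds


def windows_for(ts, size=SIZE, period=PERIOD):
    """Return every [start, end) sliding window that contains timestamp ts."""
    last_start = ts - (ts % period)  # latest window start at or before ts
    starts = range(last_start, ts - size, -period)
    return [(s, s + size) for s in starts]


def moving_averages(events, size=SIZE, period=PERIOD):
    """Group (timestamp, price) events by window and average each window."""
    prices = defaultdict(list)
    for ts, price in events:
        for window in windows_for(ts, size, period):
            prices[window].append(price)
    # Key each result by window end: that is when the on-time result would fire.
    return {end: sum(p) / len(p) for (start, end), p in sorted(prices.items())}


# Hypothetical ticks: one price every 5 seconds of event time.
ticks = [(0, 100.0), (5, 102.0), (10, 104.0), (15, 106.0),
         (20, 108.0), (25, 110.0), (30, 112.0)]
averages = moving_averages(ticks)

print(len(windows_for(12)))  # 6: each element belongs to size / period windows
print(averages[30])          # 105.0, the average over window [0, 30)
print(averages[35])          # 107.0, the average over window [5, 35)
```

The result keyed at 35 drops the tick at second 0 and picks up the tick at second 30, which is the moving-average behavior the question asks for.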

pluiedust · Answer

Moving average -> sliding window

zellck · Answer

D is the answer.

https://cloud.google.com/dataflow/docs/concepts/streaming-pipelines#hopping-windows
You set the following windows with the Apache Beam SDK or Dataflow SQL streaming extensions:
Hopping windows (called sliding windows in Apache Beam)

A hopping window represents a consistent time interval in the data stream. Hopping windows can overlap, whereas tumbling windows are disjoint.

For example, a hopping window can start every thirty seconds and capture one minute of data. The frequency with which hopping windows begin is called the period. This example has a one-minute window and thirty-second period.
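
The documentation's example (a one-minute window with a thirty-second period) can be enumerated with a few lines of plain Python. This is a sketch rather than Beam code; hopping_windows is a hypothetical helper, and windows are assumed to start at multiples of the period from time 0.

```python
def hopping_windows(size, period, until):
    """List the [start, end) windows that begin in [0, until), in seconds."""
    return [(start, start + size) for start in range(0, until, period)]


# The documentation's example: one-minute windows starting every thirty seconds.
hopping = hopping_windows(size=60, period=30, until=120)
# A tumbling window is the special case where the period equals the size.
tumbling = hopping_windows(size=60, period=60, until=120)

print(hopping)   # [(0, 60), (30, 90), (60, 120), (90, 150)]
print(tumbling)  # [(0, 60), (60, 120)]

# Consecutive hopping windows share size - period seconds of data.
overlap = hopping[0][1] - hopping[1][0]
print(overlap)   # 30
```

With the question's values (size 30, period 5), the same arithmetic gives windows that start 5 seconds apart and share 25 seconds of data.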

Kimich · Answer

AfterWatermark is an essential triggering condition in Dataflow: it lets computations fire based on event time rather than processing time. That eliminates A and C. Comparing B and D, B would produce a result only every 30 seconds, which is not what we want.

D. A sliding window with a duration of 30 seconds and a period of 5 seconds, with the trigger set to AfterWatermark.pastEndOfWindow(), generates a result every 5 seconds, and each result includes data from the past 30 seconds. In other words, every 5 seconds you get the average value of the most recent 30 seconds' data. Consecutive windows start 5 seconds apart, so they overlap by 25 seconds. This is what we want.
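
The difference in output cadence between B and D can be checked with a small plain-Python sketch. The answer options themselves are not reproduced on this page, so the sketch assumes, as the comment above implies, that B is a fixed 30-second window. window_ends is a hypothetical helper, and it treats each window end as the event time at which AfterWatermark.pastEndOfWindow() would fire, ignoring watermark lag.

```python
def window_ends(size, period, start, stop):
    """End times in (start, stop] of windows that begin at multiples of period."""
    # e is a window end exactly when its window start, e - size, is a multiple of period.
    return [e for e in range(start + 1, stop + 1) if (e - size) % period == 0]


# Option D: 30-second sliding windows with a 5-second period.
sliding = window_ends(size=30, period=5, start=60, stop=90)
# Assumed option B: 30-second fixed windows (period equals size).
fixed = window_ends(size=30, period=30, start=60, stop=90)

print(sliding)  # [65, 70, 75, 80, 85, 90]
print(fixed)    # [90]
```

Over the same 30 seconds of event time, the sliding configuration yields six results and the fixed one yields a single result.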

Anudeep58 · Answer

Option D is the correct configuration because it uses a sliding window of 30 seconds with a period of 5 seconds, ensuring that the moving average is computed every 5 seconds based on the past 30 seconds of data. The trigger AfterWatermark.pastEndOfWindow() ensures timely and accurate results are emitted as the watermark progresses.
