
Professional Data Engineer Exam - Question 254


You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage. One of the pipeline transforms reads CSV files and emits an element for every CSV line. The job performance is low, the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?

Correct Answer: B

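Dataflow's fusion optimization can merge multiple transforms into a single stage. Here the transform that reads each CSV file has a high fan-out: one Pub/Sub notification yields many line elements. When that transform is fused with the downstream steps, the emitted lines are processed in the same bundle as the notification that produced them, so parallelism is limited by the number of input messages rather than the number of lines. Introducing a Reshuffle step after the CSV-reading transform breaks this fusion and lets the emitted elements be redistributed across workers. The increased parallelism can then lead the autoscaler to add workers, improving overall job performance.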
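
The effect described above can be illustrated with a toy model in plain Python. This is only a sketch under simplifying assumptions, not Dataflow's actual scheduling: workers are modeled as round-robin slots, and the function names (`emit_lines`, `load_when_fused`, `load_when_reshuffled`) and the file sizes are hypothetical. In a real Apache Beam Python pipeline, the redistribution point would be a `beam.Reshuffle()` step placed right after the transform that emits CSV lines.

```python
from collections import Counter


def emit_lines(file_id, n_lines):
    """Fan-out step: emit one element per CSV line of a file."""
    return [(file_id, line_no) for line_no in range(n_lines)]


def load_when_fused(files, num_workers):
    """Model fused stages: every emitted line stays on the worker
    that received the file's notification."""
    load = Counter()
    for i, (file_id, n_lines) in enumerate(files):
        worker = i % num_workers  # notifications are spread round-robin
        load[worker] += len(emit_lines(file_id, n_lines))
    return load


def load_when_reshuffled(files, num_workers):
    """Model a redistribution point after the fan-out: each emitted
    line is assigned to a worker individually."""
    load = Counter()
    lines = [line for file_id, n in files for line in emit_lines(file_id, n)]
    for i, _line in enumerate(lines):
        load[i % num_workers] += 1
    return load


if __name__ == "__main__":
    # Hypothetical input: 3 file notifications, 1000 lines each, 10 workers.
    files = [("a.csv", 1000), ("b.csv", 1000), ("c.csv", 1000)]
    print("fused:     ", dict(load_when_fused(files, 10)))
    print("reshuffled:", dict(load_when_reshuffled(files, 10)))
```

In this model the fused variant keeps only as many workers busy as there are input notifications, while the reshuffled variant spreads the same lines over every worker. That is the kind of backlog distribution that gives an autoscaler a reason to scale out.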

Discussion

6 comments
raaad (Option: B)
Jan 5, 2024

- Fusion optimization in Dataflow can lead to steps being "fused" together, which can sometimes hinder parallelization.
- Introducing a Reshuffle step can prevent fusion and force the distribution of work across more workers.
- This can be an effective way to improve parallelism and potentially trigger the autoscaler to increase the number of workers.

GCP001 (Option: B)
Jan 8, 2024

The problem is performance: the job is not using all available workers properly. See https://cloud.google.com/dataflow/docs/pipeline-lifecycle#fusion_optimization

ML6 (Option: B)
Feb 17, 2024

Fusion occurs when multiple transformations are fused into a single stage, which can limit parallelism and hinder performance, especially in streaming pipelines. By introducing a Reshuffle step, you break fusion and allow for better parallelism.

srivastavas08
Feb 10, 2024

https://cloud.google.com/dataflow/docs/guides/right-fitting

scaenruy (Option: D)
Jan 3, 2024

D. Use Dataflow Prime, and enable Right Fitting to increase the worker resources.

Lestrang (Option: C)
Jun 8, 2024

Right fitting is for declaring resource requirements, and declaring the correct resources will not help here. A Reshuffle step is what prevents fusion, and fusion is what can leave workers unused.