Certified Data Engineer Professional Exam Questions

Certified Data Engineer Professional Exam - Question 26


Each configuration below is identical in that each cluster has 400 GB of RAM and 160 cores in total, with only one executor per VM.

Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

Correct Answer: C

To maximize performance, it is crucial to balance the level of parallelism and the resources available to each Executor. Option C provides 16 VMs, with 25 GB of RAM and 10 cores per Executor. This configuration offers a good level of parallelism, which is beneficial for handling wide transformations that require significant data shuffling across multiple nodes. The allocation of 25 GB of RAM per Executor ensures that each Executor can handle its tasks efficiently without being resource-starved. The higher number of VMs compared to other options also improves fault tolerance and workload distribution, which are key to optimizing performance in distributed computing environments.
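As a quick sanity check on the numbers in the explanation, the per-executor resources follow directly from the VM count, since total RAM and cores are fixed for every configuration. A minimal sketch (the VM counts other than 16 are illustrative, since the original answer-choice table is not reproduced in this dump):

```python
# Per-executor resources for a fixed cluster: 400 GB RAM, 160 cores,
# one executor per VM. Only the 16-VM row is confirmed by the explanation;
# the other VM counts are illustrative.
TOTAL_RAM_GB = 400
TOTAL_CORES = 160

for num_vms in (1, 2, 4, 8, 16):
    ram_per_executor = TOTAL_RAM_GB / num_vms
    cores_per_executor = TOTAL_CORES / num_vms
    print(f"{num_vms:2d} VMs -> {ram_per_executor:5.1f} GB RAM, "
          f"{cores_per_executor:4.0f} cores per executor")
```

With 16 VMs this reproduces the explanation's figures: 25 GB of RAM and 10 cores per executor.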

Discussion

15 comments
robson90 (Option: A)
Aug 23, 2023

Option A. The question is about maximum performance. A wide transformation often results in an expensive shuffle; with a single executor this problem is avoided. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
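For readers wondering why the shuffle is the crux of this argument: during a wide transformation such as groupBy, each record is routed to a partition by key hash, so records sharing a key must end up on the same executor, and with more than one executor some records have to cross the network. A pure-Python illustration (not Spark code; `partition_for` is a made-up helper):

```python
# Pure-Python sketch (not Spark) of how a shuffle routes records:
# a wide transformation must move every record to the executor that
# owns its key's hash partition.
def partition_for(key: str, num_executors: int) -> int:
    """Pick the executor that will own this key after the shuffle."""
    return hash(key) % num_executors

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# With a single executor, every key maps to executor 0:
# no data ever crosses VM boundaries.
assert all(partition_for(key, 1) == 0 for key, _ in records)

# With 16 executors, keys spread across nodes, so grouping requires
# network transfer for any record not already on its key's executor.
placement = {key: partition_for(key, 16) for key, _ in records}
print(placement)
```

This is why the "few large executors" camp argues for option A: the fewer executors there are, the more of the shuffle stays local to a VM.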

dp_learner
Nov 4, 2023

source : https://docs.databricks.com/en/clusters/cluster-config-best-practices.html

Santitoxic (Option: D)
Sep 22, 2023

Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.

stuart_gta1 (Option: C)
Aug 8, 2023

C. More VMs help distribute the workload across the cluster, which results in better fault tolerance and increases the chances of job completion.

mwyopme (Option: C)
Sep 17, 2023

Sorry, response C = 16 VMs for maximizing the wide transformation.

ismoshkov (Option: A)
Nov 4, 2023

Our goal is top performance. Vertical scaling is more performant than horizontal scaling, especially since we know we need cross-VM data exchange. Option A.

ofed (Option: A)
Nov 7, 2023

Option A

vikrampatel5 (Option: A)
Jan 21, 2024

Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl

asmayassineg (Option: E)
Aug 2, 2023

The answer should be E. If at least one transformation is wide, one executor with 200 GB can do the job, and the rest of the tasks can be carried out on the other node.

8605246
Aug 6, 2023

would it be fault-tolerant?

BrianNguyen95 (Option: E)
Aug 20, 2023

The correct answer is E: option E provides a substantial amount of memory and cores per executor, allowing the job to handle wide transformations efficiently. However, performance can also be influenced by factors like the nature of your specific workload, data distribution, and overall cluster utilization. It's good practice to conduct benchmarking and performance testing with various configurations to determine the optimal setup for your specific use case.

taif12340 (Option: D)
Aug 23, 2023

Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.

mwyopme (Option: B)
Sep 17, 2023

The key message is: given a job with at least one wide transformation, performance should maximize the number of concurrent VMs. Selecting response B: 160 / 10 = 16 VMs.

dp_learner (Option: A)
Nov 4, 2023

Response A. Per "Complex batch ETL": "More complex ETL jobs, such as processing that requires unions and joins across multiple tables, will probably work best when you can minimize the amount of data shuffled. Since reducing the number of workers in a cluster will help minimize shuffles, you should consider a smaller cluster like cluster A in the following diagram over a larger cluster like cluster D."

dp_learner
Nov 4, 2023

source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
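The two sizing strategies debated in this thread can be written down side by side. The property names below (`spark.executor.instances`, `spark.executor.memory`, `spark.executor.cores`) are real Spark settings, but the specific numbers are assumptions, since the original option table is not reproduced in this dump; both configurations keep the stated totals of 400 GB of RAM and 160 cores:

```python
# Hypothetical sketch of the two sizing strategies from the discussion,
# using real Spark property names. The numbers are assumed, not taken
# from the (missing) answer-choice table.
few_large_executors = {          # "scale up": minimizes cross-VM shuffle
    "spark.executor.instances": 2,
    "spark.executor.memory": "200g",
    "spark.executor.cores": 80,
}
many_small_executors = {         # "scale out": maximizes parallel VMs
    "spark.executor.instances": 16,
    "spark.executor.memory": "25g",
    "spark.executor.cores": 10,
}

def totals(conf):
    """Return (total RAM in GB, total cores) for a configuration."""
    n = conf["spark.executor.instances"]
    ram_gb = int(conf["spark.executor.memory"].rstrip("g"))
    return n * ram_gb, n * conf["spark.executor.cores"]

# Both respect the fixed cluster budget: 400 GB RAM, 160 cores.
assert totals(few_large_executors) == (400, 160)
assert totals(many_small_executors) == (400, 160)
```

The disagreement in the thread is then easy to state: for shuffle-heavy (wide) transformations, the Databricks guidance quoted above favors the first shape, while the "more parallelism and fault tolerance" comments favor the second.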

RafaelCFC (Option: A)
Jan 8, 2024

robson90's response explains it perfectly and has documentation to support it.

PrashantTiwari (Option: A)
Feb 9, 2024

A is correct

arik90 (Option: A)
Mar 27, 2024

Wide transformations fall under complex ETL, which means Option A is correct; the documentation doesn't say to do otherwise in this scenario.