Exam: Certified Data Engineer Professional
Question 26

Each configuration below is identical in that each cluster has 400 GB of RAM in total, 160 cores in total, and only one Executor per VM.

Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?
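For context, a wide transformation (such as a groupBy or join in Spark) requires all records with the same key to end up on the same partition, which forces a shuffle between Executors. A minimal pure-Python sketch of that repartition-by-key step (an illustration, not the Spark API):

```python
from collections import defaultdict

# Sketch of the shuffle step behind a wide transformation such as groupBy:
# every record is routed to the partition that owns its key, so in a real
# cluster data moves between executors. Plain Python, not the Spark API.

def shuffle_by_key(records, num_partitions):
    """Route (key, value) pairs to partitions by hash of the key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return dict(partitions)

data = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = shuffle_by_key(data, num_partitions=2)
# All records for a given key land in the same partition.
```

Because this routing step moves data across Executors, the cost of a wide transformation depends heavily on how many VMs the shuffled data must cross, which is what the answers below argue about.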

    Correct Answer: C

    To maximize performance, it is crucial to balance the level of parallelism and the resources available to each Executor. Option C provides 16 VMs, with 25 GB of RAM and 10 cores per Executor. This configuration offers a good level of parallelism, which is beneficial for handling wide transformations that require significant data shuffling across multiple nodes. The allocation of 25 GB of RAM per Executor ensures that each Executor can handle its tasks efficiently without being resource-starved. The higher number of VMs compared to other options also improves fault tolerance and workload distribution, which are key to optimizing performance in distributed computing environments.
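The trade-off the explanation describes can be made concrete by dividing the fixed cluster totals across different VM counts. The answer-option table itself is not reproduced in this dump, so the VM counts below are assumptions based on the figures quoted in the explanation and discussion (e.g. option C's 16 VMs with 25 GB and 10 cores each):

```python
# Sketch: per-executor resources for a fixed-size cluster (400 GB RAM,
# 160 cores in total, one executor per VM) at several hypothetical VM
# counts. The actual answer options are not shown in this dump; these
# counts are assumptions inferred from the discussion.

TOTAL_RAM_GB = 400
TOTAL_CORES = 160

def per_executor(num_vms):
    """Return (RAM in GB, cores) available to each executor."""
    return TOTAL_RAM_GB / num_vms, TOTAL_CORES / num_vms

for vms in (1, 2, 4, 8, 16):
    ram, cores = per_executor(vms)
    print(f"{vms:>2} VMs -> {ram:6.1f} GB RAM, {cores:5.1f} cores per executor")
```

More VMs means more parallelism and fault tolerance but smaller Executors and more cross-VM shuffle traffic; fewer VMs means larger Executors and less shuffle but less parallelism. That tension is exactly why the discussion splits between options A and C.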

Discussion
robson90 (Option: A)

Option A. The question asks about maximum performance, and a wide transformation usually triggers an expensive shuffle; with a single Executor, cross-VM shuffle traffic is avoided entirely. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
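robson90's point can be quantified with a rough model: if shuffled data is hash-partitioned uniformly across N equal Executors, a record stays on its source Executor with probability 1/N, so roughly (N - 1)/N of the shuffled bytes must cross the network. This is a simplification that ignores skew and locality optimizations:

```python
# Rough model: fraction of shuffled data that must cross the network
# when hash-partitioning uniformly over N equal executors. Ignores
# data skew and any locality optimizations.

def cross_node_shuffle_fraction(num_executors):
    return (num_executors - 1) / num_executors

for n in (1, 2, 4, 16):
    frac = cross_node_shuffle_fraction(n)
    print(f"{n:>2} executors -> {frac:.0%} of shuffled data crosses the network")
```

With 1 Executor the fraction is 0% (no network shuffle at all), while with 16 Executors nearly 94% of shuffled data crosses VM boundaries, which is the core of the argument for option A.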

dp_learner

source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html

Santitoxic (Option: D)

Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.

vikrampatel5 (Option: A)

Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl

ofed (Option: A)

Option A

ismoshkov (Option: A)

Our goal is top performance. Vertical scaling is more performant than horizontal scaling here, especially since we know the job requires cross-VM data exchange. Option A.

mwyopme (Option: C)

Sorry, correcting myself: response C (16 VMs) for maximizing wide-transformation performance.

stuart_gta1 (Option: C)

C. More VMs help distribute the workload across the cluster, which results in better fault tolerance and increases the chances of job completion.

arik90 (Option: A)

A wide transformation falls under complex ETL, which means Option A is correct; the documentation doesn't suggest doing otherwise in this scenario.

PrashantTiwari (Option: A)

A is correct

RafaelCFC (Option: A)

robson90's response explains it perfectly and has documentation to support it.

dp_learner (Option: A)

Response A. Under "Complex batch ETL" the docs say: "More complex ETL jobs, such as processing that requires unions and joins across multiple tables, will probably work best when you can minimize the amount of data shuffled. Since reducing the number of workers in a cluster will help minimize shuffles, you should consider a smaller cluster like cluster A in the following diagram over a larger cluster like cluster D."

dp_learner

source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html

mwyopme (Option: B)

The key message: given a job with at least one wide transformation, you should maximize the number of concurrent VMs. Selecting response B: 160 / 10 = 16 VMs.

BrianNguyen95 (Option: E)

The correct answer is E: Option E provides a substantial amount of memory and cores per Executor, allowing the job to handle wide transformations efficiently. However, performance is also influenced by factors such as the nature of the specific workload, data distribution, and overall cluster utilization. It's good practice to benchmark various configurations to determine the optimal setup for a given use case.

asmayassineg (Option: E)

The answer should be E. If at least one transformation is wide, one Executor with 200 GB can handle it, and the remaining tasks can be carried out on the other node.

8605246

would it be fault-tolerant?