Certified Data Engineer Professional Exam - Question 50


A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor.

When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

Correct Answer: E

A bottleneck caused by code executing on the driver would be indicated by overall cluster CPU utilization of around 25%. The cluster has four identical nodes (one driver plus three executors), so one fully busy node with the other three idle averages out to roughly 25%. This pattern suggests that the driver is consuming most of its CPU while the executors sit underutilized. When work is distributed properly, CPU load is spread more evenly across the nodes. A driver-side bottleneck leaves the executors waiting for work, hence the low overall figure.
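The arithmetic behind the 25% figure can be sketched as follows. This is a minimal illustration, assuming the overall metric is a plain average over equally sized nodes; the function name and the readings are hypothetical, not taken from Ganglia.

```python
def overall_cpu_utilization(per_node_percent):
    """Return the mean CPU utilization (in percent) across equally sized nodes."""
    if not per_node_percent:
        raise ValueError("need at least one node reading")
    return sum(per_node_percent) / len(per_node_percent)


# Hypothetical readings: the driver is saturated, the three executors are idle.
readings = [100.0, 0.0, 0.0, 0.0]  # [driver, executor 1, executor 2, executor 3]
print(overall_cpu_utilization(readings))  # 25.0
```

Because the driver and executors use the same VM type, each node contributes an equal quarter of the total capacity, which is why one busy node out of four shows up as about 25%.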

Discussion

8 comments
BrianNguyen95 (Option: E)
Aug 27, 2023

Option E: In a Spark cluster, the driver node is responsible for managing the execution of the Spark application, including scheduling tasks, managing the execution plan, and interacting with the cluster manager. If the overall cluster CPU utilization is low (e.g., around 25%), it may indicate that the driver node is not utilizing the available resources effectively and might be a bottleneck.

guillesd
Feb 7, 2024

Overall CPU utilization can be misleading. The 25% utilization could be caused by the workload not requiring more than that rather than everything being executed in the driver node.
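To make this caveat concrete, here is a small sketch showing two per-node profiles that both average 25% but tell different stories. The function name, the thresholds, and the readings are all hypothetical assumptions for illustration, not Ganglia output.

```python
def looks_driver_bound(driver_percent, executor_percents, busy=90.0, idle=10.0):
    """Heuristic: driver is near saturation while every executor is near idle.

    The busy/idle thresholds are arbitrary illustrative values.
    """
    return driver_percent >= busy and all(p <= idle for p in executor_percents)


# Both hypothetical profiles average 25% across the four nodes.
saturated_driver = (100.0, [0.0, 0.0, 0.0])
light_everywhere = (25.0, [25.0, 25.0, 25.0])

print(looks_driver_bound(*saturated_driver))   # True
print(looks_driver_bound(*light_everywhere))   # False
```

Under these assumptions, only the per-node breakdown separates a driver bottleneck from a workload that is simply light on every node.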

sturcu (Option: D)
Oct 16, 2023

If the overall cluster CPU utilization is around 25%, it means that only one of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized.

sturcu
Oct 24, 2023

Correct Answer is E.

sturcu (Option: E)
Oct 24, 2023

If the overall cluster CPU utilization is around 25%, it means that only one of the four nodes (driver + 3 executors) is using its full CPU capacity, while the other three nodes are idle or underutilized.

azurelearn2020 (Option: E)
Dec 9, 2023

25% indicates that the cluster CPU is under-utilized.

Def21
Jan 24, 2024

Not correct. 25% could (in theory) mean the driver is using 100% of its CPU.

Patito (Option: D)
Dec 29, 2023

D seems to be right

rok21 (Option: E)
Dec 9, 2023

E is correct

guillesd (Option: D)
Feb 7, 2024

If there's no I/O between the driver and executor nodes, then the executor nodes are not working.

lophonos (Option: E)
Jun 10, 2024

E is correct