
Certified Associate Developer for Apache Spark Exam - Question 118


Which of the following cluster configurations is most likely to experience delays due to garbage collection of a large DataFrame?

Note: each configuration has roughly the same compute power using 100GB of RAM and 200 cores.

Correct Answer: D

Scenario #1 is most likely to experience delays due to garbage collection of a large DataFrame. In this scenario, there is a single executor with 100 GB of memory. A single executor with that much memory means one very large JVM heap, and collecting a large heap can take longer and pause computation for significant periods, leading to noticeable delays. The other scenarios split the same memory across multiple executors, so each executor JVM collects its own smaller heap independently, which tends to shorten the individual pauses.
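
The reasoning above can be sketched with a little arithmetic. This is a minimal illustration, not the exam's actual option list: only the single 100 GB executor comes from the explanation, while the 10-executor and 50-executor splits, the `ClusterConfig` class, and its field names are hypothetical. It also assumes that heap size per executor is a reasonable proxy for garbage collection pause risk. The keys `spark.executor.instances`, `spark.executor.memory`, and `spark.executor.cores` are standard Spark configuration properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical way to split a fixed budget of 100 GB RAM and 200 cores."""
    name: str
    executors: int
    total_memory_gb: int = 100
    total_cores: int = 200

    @property
    def heap_per_executor_gb(self) -> float:
        # Each executor is its own JVM, so this approximates the heap
        # that a single garbage collector has to manage.
        return self.total_memory_gb / self.executors

    @property
    def cores_per_executor(self) -> int:
        return self.total_cores // self.executors

    def spark_conf(self) -> dict:
        # Real Spark property names; the values are derived from the split.
        return {
            "spark.executor.instances": str(self.executors),
            "spark.executor.memory": f"{int(self.heap_per_executor_gb)}g",
            "spark.executor.cores": str(self.cores_per_executor),
        }


# Only the first entry mirrors the explanation; the others are assumed.
configs = [
    ClusterConfig("one large executor", executors=1),
    ClusterConfig("ten medium executors", executors=10),
    ClusterConfig("fifty small executors", executors=50),
]

# Assumption: the largest per-executor heap is the most GC-prone layout.
most_gc_prone = max(configs, key=lambda c: c.heap_per_executor_gb)

for c in configs:
    print(f"{c.name}: {c.heap_per_executor_gb:g} GB heap, "
          f"{c.cores_per_executor} cores per executor")
print("Most GC-prone under this assumption:", most_gc_prone.name)
```

Under these assumptions the single-executor layout ends up with a 100 GB heap per JVM, while the 50-executor layout keeps each heap at 2 GB, which is why the explanation singles out Scenario #1.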

Discussion

4 comments
newusername
Nov 9, 2023

Please correct the question-answer alignment. The scenarios do not match, though I would say Scen 6 is the answer.

JuanitoFM
Feb 29, 2024

The answer is Scen 6 and that answer doesn't appear; please align the answers.

Sowwy1 (Option: D)
Apr 9, 2024

I think it's D, Scenario 1. Scenario #1 would most likely experience delays due to garbage collection because it has the largest heap space per executor, leading to longer garbage collection times when managing large DataFrames.

deadbeef38 (Option: D)
Jun 23, 2024

I think it is D, Scenario 1, because the other scenarios can take advantage of parallelism.