Exam: Certified Associate Developer for Apache Spark
Question 118

Which of the following cluster configurations is most likely to experience delays due to garbage collection of a large DataFrame?

Note: each configuration has roughly the same compute power, with a total of 100 GB of RAM and 200 cores.

    Correct Answer: D

    Scenario #1 is most likely to experience delays due to garbage collection of a large DataFrame. It uses a single executor with 100 GB of memory, so all objects live in one large JVM heap. Garbage collection over a large heap takes longer, and its pauses can halt every task running on that executor, leading to noticeable delays. The other scenarios split the same memory across multiple executors, each a separate JVM with a smaller heap, so individual collections are shorter and a pause in one executor does not stall the others.
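
    To make the heap-size argument concrete, here is a minimal sketch in plain Python that splits the question's fixed budget (100 GB of RAM, 200 cores) evenly across a given number of executors and reports the matching Spark settings. The function name `executor_layout` and the executor counts 1, 4, and 10 are hypothetical and chosen only for illustration; they are not the scenarios from the question, which are not shown here. The sketch also ignores memory overhead, so real values would differ.

```python
TOTAL_MEMORY_GB = 100  # total RAM from the question's note
TOTAL_CORES = 200      # total cores from the question's note


def executor_layout(num_executors, total_memory_gb=TOTAL_MEMORY_GB,
                    total_cores=TOTAL_CORES):
    """Split a fixed memory/core budget evenly across executors.

    Hypothetical helper: returns the Spark properties that would describe
    the layout. Memory overhead is deliberately ignored.
    """
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    memory_gb = total_memory_gb // num_executors  # heap per executor JVM
    cores = total_cores // num_executors          # task slots per executor
    return {
        "spark.executor.instances": str(num_executors),
        "spark.executor.memory": f"{memory_gb}g",
        "spark.executor.cores": str(cores),
    }


# Illustrative executor counts only (assumed, not taken from the question).
for n in (1, 4, 10):
    print(n, executor_layout(n))
```

    With one executor the whole 100 GB sits in a single heap, which is the case the explanation above identifies as the most exposed to long garbage collection pauses. As the executor count grows, the heap each JVM has to collect shrinks.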

Discussion
newusername

Please correct the alignment between the question and the answers. The scenarios do not match, though I would say Scenario 6 is the answer.

deadbeef38 (Option: D)

I think it is D (Scenario 1), because the other scenarios can take advantage of parallelism.

Sowwy1 (Option: D)

I think it's D (Scenario 1). Scenario #1 would most likely experience delays due to garbage collection because it has the largest heap space per executor, leading to longer garbage collection times when managing large DataFrames.

JuanitoFM

The answer is Scenario 6, and that answer doesn't appear. Please align the answers.