Which of the following cluster configurations is most likely to experience delays due to garbage collection of a large DataFrame?
Note: each configuration has roughly the same compute power using 100GB of RAM and 200 cores.
Scenario #1 is most likely to experience delays due to garbage collection of a large DataFrame. In this scenario, a single executor holds all 100 GB of memory. A large heap on a single executor JVM can lead to longer garbage collection times, especially when handling large DataFrames, because collecting a large heap is more time-consuming and can pause computation for significant periods, leading to noticeable delays. The other scenarios split the memory across multiple executors, so each executor JVM has a smaller heap to collect and its individual pauses tend to be shorter.
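To make the heap-size argument concrete, here is a rough Python sketch that splits the 100 GB from the question across different executor counts. The executor counts (1, 4, 10) and the helper name `heap_per_executor_gb` are assumptions made up for illustration, since the actual scenarios are not shown in this thread, and the calculation ignores memory overhead and off-heap settings.

```python
# Rough sketch: per-executor heap when a fixed memory budget is split evenly.
# The executor counts below are hypothetical, not the scenarios from the question.

TOTAL_MEMORY_GB = 100  # total cluster memory stated in the question


def heap_per_executor_gb(total_memory_gb: float, num_executors: int) -> float:
    """Return the memory each executor gets under an even split (overhead ignored)."""
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    return total_memory_gb / num_executors


# Hypothetical layouts: 1, 4, and 10 executors sharing the same 100 GB.
heaps = {n: heap_per_executor_gb(TOTAL_MEMORY_GB, n) for n in (1, 4, 10)}

# The layout with the largest single heap is the one where one GC cycle
# has the most memory to scan.
largest_heap_layout = max(heaps, key=heaps.get)

for n, heap in heaps.items():
    print(f"{n} executor(s): {heap:.1f} GB heap each")
print(f"Largest single heap: {largest_heap_layout} executor(s)")
```

Under these assumptions the single-executor layout ends up with the largest heap, which matches the reasoning given for Scenario #1.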
Please correct the question-answer alignment. The scenarios do not match, though I would say Scenario 6 is the answer.
I think it is D (Scenario 1), because the other scenarios can take advantage of parallelism.
I think it's D (Scenario 1). Scenario #1 would most likely experience delays due to garbage collection because it has the largest heap space per executor, leading to longer garbage collection times when managing large DataFrames.
The answer is Scenario 6, and that answer doesn't appear. Please align the answers.