Which of the following cluster configurations is least likely to experience delays due to garbage collection of a large DataFrame?
Note: each configuration has roughly the same compute power, using 100 GB of RAM and 200 cores in total.
The cluster configuration least likely to experience delays from garbage collection of a large DataFrame is Scenario #6. It spreads the workload across 8 worker nodes, so each executor gets a smaller heap (12.5 GB) and a moderate number of cores (25). Smaller executors generally manage memory more efficiently. Each garbage collection has less heap to scan, so individual pauses are shorter. A single large executor, by contrast, can stall all of its running tasks during one long collection over tens of gigabytes.
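The per-executor figures above come from dividing the cluster's total resources evenly. The sketch below shows that arithmetic. It assumes one executor per worker node and ignores JVM and off-heap memory overhead. Executor counts other than 8 are illustrative only and are not the question's other scenarios. `per_executor` is a hypothetical helper, not a Spark API.

```python
# Total cluster resources stated in the question.
TOTAL_MEMORY_GB = 100
TOTAL_CORES = 200


def per_executor(num_executors: int) -> tuple[float, float]:
    """Hypothetical helper: split memory and cores evenly across executors.

    Assumes one executor per worker node and ignores memory overhead.
    """
    return TOTAL_MEMORY_GB / num_executors, TOTAL_CORES / num_executors


# Illustrative comparison: fewer executors means a larger heap per JVM,
# so each garbage collection has more memory to scan.
for n in (1, 2, 4, 8):
    mem, cores = per_executor(n)
    print(f"{n} executor(s): {mem:g} GB heap, {cores:g} cores each")
```

With 8 executors this yields 12.5 GB and 25 cores each, which matches Scenario #6. The same total capacity ends up in smaller heaps, and that is why individual collections stay short.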
I think it's Scenario 6.