Certified Associate Developer for Apache Spark Exam - Question 48

Question

Which of the following Spark properties is used to configure whether DataFrame partitions that do not meet a minimum size threshold are automatically coalesced into larger partitions during a shuffle?

Examice · Accepted Answer

The property is 'spark.sql.adaptive.coalescePartitions.enabled'. When it is set to true and adaptive query execution ('spark.sql.adaptive.enabled') is also on, Spark merges small shuffle partitions into larger ones after a shuffle, which avoids scheduling many tiny tasks.

4be8126 · Answer

The answer is E, spark.sql.adaptive.coalescePartitions.enabled. When it is set to true (and spark.sql.adaptive.enabled is also true), Spark automatically combines undersized shuffle partitions into larger ones instead of running a separate small task for each.
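For anyone who wants to try it, here is a minimal sketch of turning the setting on at runtime. The helper name apply_aqe_coalesce_conf and the "64MB" value are my own choices for illustration, not something from the exam; the three property names are the ones from the Spark docs. It assumes you pass in an object with the usual spark.conf.set / spark.conf.get methods, such as a SparkSession.

```python
# Settings relevant to coalescing small shuffle partitions under AQE.
# The "64MB" advisory size is an illustrative value, not a recommendation.
AQE_COALESCE_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64MB",
}


def apply_aqe_coalesce_conf(spark, settings=AQE_COALESCE_CONF):
    """Set each property on the session, then read the values back.

    `spark` is assumed to expose `conf.set(key, value)` and `conf.get(key)`,
    as a SparkSession does. (Hypothetical helper, for illustration only.)
    """
    for key, value in settings.items():
        spark.conf.set(key, value)
    return {key: spark.conf.get(key) for key in settings}


# Typical use, assuming pyspark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   print(apply_aqe_coalesce_conf(spark))
```

Note that the coalesce flag has no effect unless spark.sql.adaptive.enabled is true as well, which is why the sketch sets both.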

juliom6 · Answer

https://spark.apache.org/docs/latest/sql-performance-tuning.html

spark.sql.adaptive.coalescePartitions.enabled: When true and spark.sql.adaptive.enabled is true, Spark will coalesce contiguous shuffle partitions according to the target size (specified by spark.sql.adaptive.advisoryPartitionSizeInBytes), to avoid too many small tasks.
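To make "coalesce contiguous shuffle partitions according to the target size" more concrete, here is a toy model in plain Python. It is a simplified illustration and not Spark's actual implementation: the function name coalesce_contiguous and the greedy rule (close the current group as soon as adding the next partition would exceed the target) are assumptions made for the sketch.

```python
def coalesce_contiguous(sizes, target):
    """Group neighbouring partitions so each group stays within `target`.

    `sizes` is a list of partition sizes (any unit), `target` plays the
    role of the advisory partition size. Returns lists of partition indices.
    A single partition larger than `target` is kept as its own group.
    """
    groups, current, current_size = [], [], 0
    for index, size in enumerate(sizes):
        # Close the current group if this partition would push it past target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Six shuffle partitions (sizes in MB) with a 64 MB target:
print(coalesce_contiguous([10, 20, 30, 70, 5, 5], target=64))
# → [[0, 1, 2], [3], [4, 5]]
```

Only adjacent partitions are merged, so six small tasks become three here, and the oversized partition is left alone rather than split.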

Discussion