The code block shown below contains an error. The code block is intended to return a new 12-partition DataFrame from the 8-partition DataFrame storesDF by inducing a shuffle. Identify the error.
Code block:
storesDF.coalesce(12)
The coalesce() operation does not induce a shuffle and cannot increase the number of partitions. It is designed to decrease the number of partitions in a more efficient manner without causing a full shuffle. To increase the number of partitions, the repartition() operation should be used instead as it involves shuffling the data.
The correct answer is B. The coalesce() operation can decrease the number of partitions but cannot increase the number of partitions. It also does not induce a shuffle, and is therefore more efficient when decreasing the number of partitions. If the goal is to increase the number of partitions, repartition() should be used instead.
With version 3.4.0, df.repartition(12).coalesce(16).rdd.getNumPartitions() returns 12. It doesn't throw an error, but it doesn't increase the number of partitions either.
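To make the difference concrete, here is a toy model in plain Python. It is not the Spark API and not how Spark implements these operations. The function names coalesce and repartition only mirror the DataFrame methods, and the round-robin redistribution is a simplifying assumption. It only shows why merging whole partitions can never raise the partition count, while redistributing every row can. In PySpark itself the fix for the question would be storesDF.repartition(12).

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole partitions together, never split one.

    Because rows are not redistributed, asking for more partitions
    than currently exist is a no-op.
    """
    if n >= len(partitions):
        return partitions
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map each source partition, in order, onto one of n targets.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of a full shuffle: every row is reassigned (round-robin here)."""
    rows = [row for part in partitions for row in part]
    return [rows[i::n] for i in range(n)]


# Hypothetical stand-in for storesDF: 8 partitions holding 2 rows each.
stores = [[i, i + 8] for i in range(8)]

print(len(coalesce(stores, 12)))     # 8: the count cannot grow without a shuffle
print(len(repartition(stores, 12)))  # 12: the shuffle spreads rows over 12 partitions
print(len(coalesce(stores, 4)))      # 4: shrinking works by merging
```

In this model, as in the comment above, requesting more partitions from coalesce raises no error and simply leaves the partition count unchanged.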