Certified Associate Developer for Apache Spark Exam - Question 47


The code block shown below contains an error. The code block is intended to return a new 12-partition DataFrame from the 8-partition DataFrame storesDF by inducing a shuffle. Identify the error.

Code block:

storesDF.coalesce(12)

Correct Answer: B

The coalesce() operation does not induce a shuffle and cannot increase the number of partitions. It only merges existing partitions to reduce their number, which is why it avoids a full shuffle; if a larger number is requested, the DataFrame simply keeps its current partition count. To go from 8 partitions to 12, use repartition(12) instead, which shuffles the data.
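The difference can be illustrated with a small toy model in plain Python. This is only a sketch of the rule described above, not Spark's implementation: partitions are modelled as ordinary lists, and the names coalesce, repartition, and stores are hypothetical stand-ins for the DataFrame methods and for storesDF.

```python
# Toy model only: partitions are plain Python lists, not Spark objects.

def coalesce(partitions, n):
    """Merge whole partitions into at most n groups; never split one."""
    if n >= len(partitions):
        # Asking for more partitions than exist changes nothing.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # each partition moves as one unit
    return merged

def repartition(partitions, n):
    """Redistribute every row round-robin over exactly n partitions."""
    out = [[] for _ in range(n)]
    rows = [row for part in partitions for row in part]
    for i, row in enumerate(rows):
        out[i % n].append(row)  # rows are moved individually (the "shuffle")
    return out

# Hypothetical stand-in for storesDF: 8 partitions of 3 rows each.
stores = [[p * 3 + r for r in range(3)] for p in range(8)]

print(len(coalesce(stores, 12)))     # partition count stays at 8
print(len(repartition(stores, 12)))  # partition count becomes 12
print(len(coalesce(stores, 4)))      # partition count drops to 4
```

In the model, coalesce only ever glues existing partitions together, so it has no way to produce more partitions than it was given, while repartition touches every row and can therefore reach any target count.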

Discussion

2 comments
4be8126 (Option: B)
May 3, 2023

The correct answer is B. The coalesce() operation can decrease the number of partitions but cannot increase the number of partitions. It also does not induce a shuffle, and is therefore more efficient when decreasing the number of partitions. If the goal is to increase the number of partitions, repartition() should be used instead.

Raju_Bhai (Option: B)
Oct 11, 2023

With version 3.4.0, df.repartition(12).coalesce(16).rdd.getNumPartitions() returns 12. It doesn't throw an error, but it doesn't increase the number of partitions either.