Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 85


Which of the following code blocks will always return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle?

Correct Answer: C

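coalesce() reduces the number of partitions in an existing DataFrame without inducing a shuffle: it merges whole existing partitions into fewer ones (a narrow dependency) instead of redistributing individual rows across the cluster. Because storesDF has 8 partitions and 4 is fewer than 8, storesDF.coalesce(4) always returns a new DataFrame with 4 partitions and no shuffle, which makes it the correct choice. Note that coalesce() cannot increase the partition count; if a larger number is requested, the DataFrame keeps its current number of partitions.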
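To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model only, not Spark's actual implementation: partitions are ordinary lists, and the names toy_coalesce, toy_repartition, and stores_partitions are hypothetical stand-ins invented for this illustration. It assumes that coalescing appends each parent partition wholesale to one child, while repartitioning deals rows out one at a time (round-robin).

```python
# Toy model only: partitions are plain Python lists, not Spark objects.

def toy_coalesce(partitions, n):
    """Merge whole parent partitions into at most n children (no row-level movement)."""
    if n >= len(partitions):
        # Like coalesce(), never increase the partition count.
        return [list(p) for p in partitions]
    children = [[] for _ in range(n)]
    for i, parent in enumerate(partitions):
        # Each parent partition is appended wholesale to exactly one child.
        children[i * n // len(partitions)].extend(parent)
    return children


def toy_repartition(partitions, n):
    """Redistribute rows one by one (round-robin); this row-level movement models a shuffle."""
    children = [[] for _ in range(n)]
    k = 0
    for parent in partitions:
        for row in parent:
            children[k % n].append(row)
            k += 1
    return children


# Hypothetical stand-in for the 8-partition storesDF: 8 lists of 2 rows each.
stores_partitions = [[2 * i, 2 * i + 1] for i in range(8)]

coalesced = toy_coalesce(stores_partitions, 4)
repartitioned = toy_repartition(stores_partitions, 4)

print(len(coalesced), coalesced[0])
print(len(repartitioned), repartitioned[0])
```

In this model, each coalesced partition is just two neighbouring parents glued together, whereas the repartitioned output splits every parent's rows across different children. In Spark itself, the corresponding calls would be storesDF.coalesce(4) and storesDF.repartition(4).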

Discussion

1 comment
azure_bimonster (Option: C)
Feb 9, 2024

C is the right one here. Unlike repartition(), coalesce() reduces the number of partitions without shuffling the data. Passing 4 as the number of partitions gives a resulting DataFrame with 4 partitions and no shuffle.