Exam: Certified Associate Developer for Apache Spark
Question 117

Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?

    Correct Answer: A

    DataFrame.repartition(n) will split a DataFrame into n number of new partitions with data distributed evenly. DataFrame.coalesce(n) will more quickly combine the existing partitions of a DataFrame but might result in an uneven distribution of data across the new partitions. The repartition() method involves a full shuffle to redistribute data evenly, while coalesce() is an optimized way to reduce the number of partitions without a full shuffle, which can result in uneven data distribution.
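
To make the "even vs. uneven" point concrete, here is a rough toy model in plain Python. It is not Spark code: partitions are ordinary lists, and `toy_repartition` and `toy_coalesce` are hypothetical helpers written only for this illustration. The model assumes repartition deals rows out round-robin after a full shuffle, and that coalesce merges contiguous groups of existing partitions without moving individual rows; Spark's real strategy is more involved.

```python
# Toy model only: partitions are plain Python lists, not Spark partitions.
# toy_repartition and toy_coalesce are hypothetical names, not Spark APIs.

def toy_repartition(partitions, n):
    """Full-shuffle model: deal every row round-robin into n new partitions."""
    rows = [row for part in partitions for row in part]
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts


def toy_coalesce(partitions, n):
    """No-shuffle model: merge contiguous groups of existing partitions whole."""
    k = len(partitions)
    n = min(n, k)  # merging alone cannot create more partitions than exist
    new_parts = []
    for i in range(n):
        start, end = i * k // n, (i + 1) * k // n
        merged = [row for part in partitions[start:end] for row in part]
        new_parts.append(merged)
    return new_parts


# Four skewed input partitions holding 6, 1, 1 and 2 rows.
skewed = [list(range(0, 6)), [6], [7], [8, 9]]

repartition_sizes = [len(p) for p in toy_repartition(skewed, 2)]
coalesce_sizes = [len(p) for p in toy_coalesce(skewed, 2)]

print("repartition(2) sizes:", repartition_sizes)  # [5, 5]
print("coalesce(2) sizes:   ", coalesce_sizes)     # [7, 3]
```

In this sketch the shuffle-based version ends up balanced because every row is free to move, while the merge-based version inherits the skew of its inputs. On a real DataFrame, the resulting partition count can be checked with `df.rdd.getNumPartitions()` after calling `df.repartition(n)` or `df.coalesce(n)`.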

Discussion
thanab Option: A

The correct answer is A. DataFrame.repartition(n) will split a DataFrame into n number of new partitions with data distributed evenly. DataFrame.coalesce(n) will more quickly combine the existing partitions of a DataFrame but might result in an uneven distribution of data across the new partitions. The `repartition()` method can increase or decrease the number of partitions in a DataFrame, while the `coalesce()` method can only decrease it, and does so efficiently. The `repartition()` method does a full shuffle and creates new partitions with evenly distributed data. The `coalesce()` method, on the other hand, avoids a full shuffle by merging existing partitions, which is why it can only reduce their number.

cookiemonster42 Option: A

IMO it's A:
B - repartition is less efficient because it involves shuffling --> false
C - same reason as B --> false
D - it's because of shuffling, not because of some column --> false
E - coalesce is faster --> false

gwq1968

A is correct

SaiPavan10 Option: A

A is the right choice

siva1280 Option: A

A is correct

saryu Option: A

It's A