Which of the following code blocks will always return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle?
coalesce() reduces the number of partitions in an existing DataFrame without inducing a shuffle. It is a narrow transformation: storesDF.coalesce(4) builds each of the 4 new partitions by combining whole existing partitions, rather than redistributing individual rows across the cluster. That takes the DataFrame from 8 partitions to 4 with no shuffle, making it the correct choice.
C is the right one here. Unlike repartition(), which always performs a full shuffle, coalesce() reduces the number of partitions without shuffling the data. Because the requested count (4) is lower than the current count (8), storesDF.coalesce(4) returns a DataFrame with exactly 4 partitions.
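To make the difference concrete, here is a toy model in plain Python. It is not Spark code and not how Spark is implemented; it is only a sketch of the idea. The function names coalesce_partitions, repartition_round_robin and split_sources are hypothetical, and merging adjacent partitions is a simplifying assumption (Spark picks which partitions to group on its own).

```python
def coalesce_partitions(partitions, n):
    """Toy coalesce: merge whole source partitions into n targets."""
    if n >= len(partitions):
        # Like Spark's coalesce, never increase the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each source partition moves intact into exactly one target.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition_round_robin(partitions, n):
    """Toy repartition: deal rows out one by one, scattering each source."""
    out = [[] for _ in range(n)]
    k = 0
    for part in partitions:
        for row in part:
            out[k % n].append(row)
            k += 1
    return out


def split_sources(partitions, result):
    """Count source partitions whose rows landed in more than one target."""
    where = {row: t for t, target in enumerate(result) for row in target}
    return sum(1 for part in partitions if len({where[r] for r in part}) > 1)


# 8 source partitions with 3 uniquely named rows each (made-up data).
stores = [[f"p{i}r{j}" for j in range(3)] for i in range(8)]

coalesced = coalesce_partitions(stores, 4)
repartitioned = repartition_round_robin(stores, 4)

print(len(coalesced), split_sources(stores, coalesced))
print(len(repartitioned), split_sources(stores, repartitioned))
```

Both results have 4 partitions, but in the coalesce case no source partition is broken up, while the round-robin redistribution splits every source partition across several targets. That row-level movement is what a shuffle costs in a real cluster.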