Certified Associate Developer for Apache Spark Exam QuestionsBrowse all questions from this exam

Certified Associate Developer for Apache Spark Exam - Question 39


Which of the following code blocks applies the function assessPerformance() to each row of DataFrame storesDF?

Show Answer
Correct Answer: D

To apply the function assessPerformance() to each row of the DataFrame storesDF, we should use a collect() method to retrieve all rows and then apply the function to each row using a list comprehension. This is correctly done with [assessPerformance(row) for row in storesDF.collect()].

Discussion

2 comments
Sign in to comment
4be8126Option: D
Apr 26, 2023

The correct answer is D. Explanation: Option A uses the take() method to extract three rows from the DataFrame, but it applies the assessPerformance() function to each row outside of the DataFrame context. Option B attempts to apply the assessPerformance() function to each row, but it doesn't reference the row object in any way. Option C tries to apply the assessPerformance() function to the entire DataFrame but does so using an incorrect syntax. Option D correctly applies the assessPerformance() function to each row of the DataFrame using a list comprehension over the result of the collect() method. Option E is similar to D, but it will iterate over rows individually instead of using the collect() method to retrieve all rows at once. While this is still a valid approach, it may be less efficient.

ZSunOption: D
Jun 6, 2023

There are many way to apply a function to dataframe. 1. apply, as shown in option D. but it should be apply(assessPerformance) 2. list comprehension: for row in df.collect() 3. foreach 4. map, but for RDD majorly