
Certified Associate Developer for Apache Spark Exam - Question 20


Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000?

Correct Answer: E

PySpark's filter method accepts its condition either as a Column expression or as a SQL expression string. To keep only the rows where the sqft column is less than or equal to 25,000, both storesDF.filter(col("sqft") <= 25000) and storesDF.filter("sqft <= 25000") work. Option E uses the Column form.

Discussion

2 comments
4be8126 (Option: E)
Apr 26, 2023

The correct code block to return a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 is storesDF.filter(col("sqft") <= 25000), which is equivalent to the SQL-string form storesDF.filter("sqft <= 25000"). Option A uses the wrong syntax for the filter condition: it writes "sqft" <= 25000, which compares a plain Python string with an integer, when the whole condition should be a single string, "sqft <= 25000". Option B uses the wrong operator (greater than instead of less than or equal to) and also needs to quote the column name as a string. Option C uses square brackets instead of quotes to reference the column name, and also uses the wrong operator. Option D uses the correct operator but needs to quote the column name as a string. Option E uses the correct syntax: col("sqft") returns a Column, and comparing it with 25000 produces the filter condition. Therefore, the correct answer is E: storesDF.filter(col("sqft") <= 25000).
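The problem with a condition written as "sqft" <= 25000 can be reproduced without Spark at all, because Python evaluates the argument before filter is ever called. The snippet below is a small illustration in plain Python 3; the variable names are only for demonstration.

```python
# The quotes end after the column name, so this compares a str with an int.
# Python 3 does not define an ordering between those two types.
try:
    condition = "sqft" <= 25000
    outcome = "no error"
except TypeError as exc:
    # The comparison fails before any DataFrame method could receive it.
    outcome = type(exc).__name__

# Moving the closing quote makes the whole condition one SQL expression string,
# which is a valid argument for DataFrame.filter.
sql_condition = "sqft <= 25000"
```

Here outcome ends up as "TypeError", while sql_condition is an ordinary string that can be passed to storesDF.filter.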

TmData (Option: E)
Jun 17, 2023

Option E, storesDF.filter(col("sqft") <= 25000), is the correct option. It uses the filter() operation with the condition col("sqft") <= 25000 to filter the rows where the value in the column sqft is less than or equal to 25,000.