Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000?
In PySpark, the filter() operation (alias where()) accepts either a Column expression or a SQL condition string, so both storesDF.filter(col("sqft") <= 25000) and storesDF.filter("sqft <= 25000") return only the rows where sqft is at most 25,000. Among the answer options, the correct one is E: storesDF.filter(col("sqft") <= 25000). The other options fail for reasons such as: Option A compares the bare string literal "sqft" to an integer instead of building a column expression. Option B uses the wrong operator (greater than instead of less than or equal to). Option C uses square brackets rather than a valid column reference and also uses the wrong operator. Option D uses the correct operator but does not reference the column correctly. Therefore, the correct answer is E: storesDF.filter(col("sqft") <= 25000).
Option E, storesDF.filter(col("sqft") <= 25000), is the correct option. It uses the filter() operation with the condition col("sqft") <= 25000 to filter the rows where the value in the column sqft is less than or equal to 25,000.