Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000?
In PySpark, the filter() operation (alias where()) accepts either a Column expression or a SQL condition string, so both storesDF.filter(col("sqft") <= 25000) and storesDF.filter("sqft <= 25000") return only the rows where sqft is at most 25,000. Among the answer options, the correct one is E: storesDF.filter(col("sqft") <= 25000). The other options fail for reasons such as: Option A compares the bare string literal "sqft" to an integer instead of building a column expression. Option B uses the wrong operator (greater than instead of less than or equal to). Option C uses square brackets rather than a valid column reference and also uses the wrong operator. Option D uses the correct operator but does not reference the column correctly. Therefore, the correct answer is E: storesDF.filter(col("sqft") <= 25000).
Option E, storesDF.filter(col("sqft") <= 25000), is the correct option. It uses the filter() operation with the condition col("sqft") <= 25000 to filter the rows where the value in the column sqft is less than or equal to 25,000.