Certified Associate Developer for Apache Spark Exam - Question 66


Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30?

Correct Answer: B

To filter a DataFrame on multiple conditions in PySpark, use the `filter` method with column objects and logical operators. The `|` operator expresses 'or', and `col` references a column by name. Because `|` binds more tightly than comparison operators such as `<=` and `>=` in Python, each comparison must be wrapped in parentheses: `storesDF.filter((col('sqft') <= 25000) | (col('customerSatisfaction') >= 30))`. Without the parentheses, Python parses the expression as a chained comparison around `25000 | col('customerSatisfaction')`, which does not express the intended filter.
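The precedence of `|` relative to `<=` and `>=` can be checked without a Spark session. The following is a minimal sketch that uses only Python's standard `ast` module (Python 3.9 or later for `ast.unparse`) to parse the two forms of the filter expression without evaluating them; the helper name `top_level_node` is made up for this illustration and `col` is never actually called.

```python
import ast

# The filter expression without parentheses around each comparison.
unparenthesized = "col('sqft') <= 25000 | col('customerSatisfaction') >= 30"
# The same filter with each comparison wrapped in parentheses.
parenthesized = "(col('sqft') <= 25000) | (col('customerSatisfaction') >= 30)"


def top_level_node(expr: str) -> ast.expr:
    """Parse an expression (without evaluating it) and return its root AST node."""
    return ast.parse(expr, mode="eval").body


bad = top_level_node(unparenthesized)
good = top_level_node(parenthesized)

# Without parentheses the root is a chained comparison:
#   col('sqft') <= (25000 | col('customerSatisfaction')) >= 30
print(type(bad).__name__)               # Compare
print(ast.unparse(bad.comparators[0]))  # the middle operand of the chain

# With parentheses the root is the bitwise OR of two comparisons.
print(type(good).__name__)              # BinOp
print(type(good.op).__name__)           # BitOr
```

In PySpark, evaluating a chained comparison requires converting the intermediate `Column` to a Python boolean, which `Column` does not allow, so the unparenthesized form is expected to fail with an error rather than return a filtered DataFrame.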

Discussion

1 comment
Akash567890978 (Option: B)
Jan 5, 2024

I don't think even B is correct; the conditions should be inside parentheses as well.