Exam: Certified Associate Developer for Apache Spark
Question 114

Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 AND the value in column customerSatisfaction is greater than or equal to 30?

    Correct Answer: D

    The correct code block is 'storesDF.filter((col("sqft") <= 25000) & (col("customerSatisfaction") >= 30))'. In PySpark, '&' is the logical AND operator for Column expressions; Python's 'and' keyword does not work on Columns. Note that each comparison must be wrapped in parentheses, because '&' binds more tightly than the comparison operators.
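The row-filtering logic can be sketched in plain Python (the sample rows below are invented for illustration; in real PySpark you would use '&' on Column expressions rather than the 'and' keyword used here):

```python
# Hypothetical sample data mirroring storesDF's two columns from the question.
stores = [
    {"sqft": 20000, "customerSatisfaction": 35},  # passes both conditions
    {"sqft": 30000, "customerSatisfaction": 40},  # sqft too large
    {"sqft": 24000, "customerSatisfaction": 10},  # satisfaction too low
]

# Keep rows where sqft <= 25,000 AND customerSatisfaction >= 30.
kept = [
    row for row in stores
    if row["sqft"] <= 25000 and row["customerSatisfaction"] >= 30
]
# kept contains only the first row.
```

The same predicate in PySpark would be written with Column objects, where '&' replaces 'and' and each comparison is parenthesized.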

Discussion
deadbeef38 (Option: A)

A is right

Sowwy1 (Option: D)

It's D (see https://sparkbyexamples.com/spark/spark-and-or-not-operators/). PySpark logical operations use the bitwise operators: '&' for AND, '|' for OR, and '~' for NOT.

sionita (Option: E)

The answer should be E. With multiple conditions, Spark requires parentheses around each one, e.g. df.filter((cond1) & (cond2)).

MSH_6 (Option: A)

A is the right answer.

newusername

No, in PySpark you use '&', not 'and'. D is correct.