
Certified Associate Developer for Apache Spark Exam - Question 21


Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30?

Correct Answer: E

The correct code block is storesDF.filter((col('sqft') <= 25000) | (col('customerSatisfaction') >= 30)). It references each column with the col() function and combines the two conditions with the | operator, which is how a logical OR is written on Column expressions in PySpark. Each condition needs its own parentheses because | binds more tightly than <= and >= in Python, so without them the comparisons would not be evaluated first.

Discussion

TmData (Option: E)
Jun 17, 2023

Option E, storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30)), is the correct option. It uses the filter() operation with the conditions (col("sqft") <= 25000) | (col("customerSatisfaction") >= 30) to filter the rows where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30.

4be8126 (Option: E)
Apr 26, 2023

The correct code block to return a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value in column customerSatisfaction is greater than or equal to 30 is: storesDF.filter((col("sqft") <= 25000) | (col("customerSatisfaction") >= 30))

Option A uses the wrong syntax for the OR condition and for column referencing. Option B uses the correct OR operator but the wrong syntax for column referencing. Option C uses the correct operator but does not use the col() function to reference column names. Option D uses the col() function but still references the columns incorrectly. Option E uses the correct syntax for both column referencing and the logical operator, and it parenthesizes each condition so that the comparisons are evaluated before the | operator.

Therefore, the correct answer is E.
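The point about parentheses and order of operations can be checked without Spark at all, since it is a matter of Python's own operator precedence. The sketch below uses the standard-library ast module to show how Python parses the condition with and without parentheses. The helper name top_level_node is made up for this illustration, and the bare names sqft and customerSatisfaction stand in for the col(...) expressions.

```python
import ast

# Bare names stand in for col("sqft") and col("customerSatisfaction").
unparenthesized = "sqft <= 25000 | customerSatisfaction >= 30"
parenthesized = "(sqft <= 25000) | (customerSatisfaction >= 30)"


def top_level_node(expr: str) -> str:
    """Return the class name of the outermost AST node of an expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses, | is applied first, so the whole expression becomes a
# chained comparison: sqft <= (25000 | customerSatisfaction) >= 30.
print(top_level_node(unparenthesized))  # Compare

# With parentheses, the two comparisons are built first and then OR-ed.
print(top_level_node(parenthesized))  # BinOp

# Inspect the unparenthesized form: its middle operand is the | expression.
chained = ast.parse(unparenthesized, mode="eval").body
middle_operand = chained.comparators[0]
middle_is_bitor = isinstance(middle_operand, ast.BinOp) and isinstance(
    middle_operand.op, ast.BitOr
)
print(middle_is_bitor)  # True
```

Because the unparenthesized form is a chained comparison rather than an OR of two comparisons, it does not express the intended condition, and with PySpark Column objects it would be expected to fail rather than silently filter.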

SonicBoom10C9 (Option: E)
May 15, 2023

E has the right syntax, logic, operator and correct number of parentheses. All of the others falter in one of these respects.

pierre_grns (Option: A)
Apr 26, 2023

Should be A. Tested it in Community Edition with 2 filters.

pierre_grns
Apr 26, 2023

Sorry, we do indeed need both sets of parentheses. So E!

sly75
May 3, 2023

Yes I agree, it's E

evertonllins
Aug 26, 2023

Congrats, man. Not everyone goes back to say they were wrong and correct themselves. We need more people like this on this platform.