Exam: Certified Associate Developer for Apache Spark
Question 140

Which of the following code blocks uses SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF?

    Correct Answer: D

    To return a DataFrame containing specific columns using SQL in PySpark, first register the original DataFrame as a temporary view, then run an SQL query against that view. Option D creates a temporary view named 'stores' from the DataFrame 'storesDF' using 'createOrReplaceTempView', then passes the query 'SELECT storeId, managerName FROM stores' to 'spark.sql'. Because 'spark.sql' returns its result as a DataFrame, the outcome is a new DataFrame containing only the 'storeId' and 'managerName' columns.

Discussion
Sowwy1 (Option: D)

D is correct