Certified Associate Developer for Apache Spark Exam - Question 43


The code block shown below contains an error. The code block is intended to use SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF. Identify the error.

Code block:

storesDF.createOrReplaceTempView("stores")

storesDF.sql("SELECT storeId, managerName FROM stores")

Correct Answer: B

sql() is not a method of DataFrame. It is a method of SparkSession, conventionally available as the variable spark. To run a SQL statement against the temporary view, call sql() on the SparkSession: spark.sql("SELECT storeId, managerName FROM stores"). This call returns a new DataFrame.

Discussion

4be8126 (Option: B)
May 1, 2023

Option B is correct because sql() is not a method of a DataFrame object. It is a method of the SparkSession object spark. Therefore, the correct way to execute a SQL statement using Spark SQL is to call sql() on the SparkSession object: spark.sql("SELECT storeId, managerName FROM stores"). In the code block provided in the question, sql() is called on a DataFrame object, which fails: DataFrame has no sql() method, so PySpark raises an AttributeError instead of running the query. Therefore, option B correctly identifies the error in the code block.

juliom6 (Option: B)
Nov 2, 2023

B is correct:
storesDF = spark.createDataFrame([('1', 'juan'), ('2', 'perez')], ['storeId', 'managerName'])
storesDF.createOrReplaceTempView("stores")
spark.sql("SELECT storeId, managerName FROM stores").show()