Which of the following code blocks uses SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF?
To return a DataFrame containing specific columns using SQL in PySpark, first register the original DataFrame as a temporary view, then run an SQL query against that view. The code block in option D creates a temporary view named 'stores' from the DataFrame 'storesDF' using 'createOrReplaceTempView'. It then calls 'spark.sql' to execute the query 'SELECT storeId, managerName FROM stores' on that view. 'spark.sql' returns a new DataFrame containing only the 'storeId' and 'managerName' columns; 'storesDF' itself is left unchanged.
D is correct