Certified Associate Developer for Apache Spark Exam - Question 56


The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a position-wise union between DataFrame storesDF and DataFrame acquiredStoresDF. Identify the error.

Code block:

storesDF.unionByName(acquiredStoresDF)

Correct Answer: C

DataFrame.unionByName() does not union DataFrames by column position; it resolves columns by name. The error is therefore the choice of method, not its arguments. For a position-wise union, the correct method is union(), which matches columns by position and ignores column names: storesDF.union(acquiredStoresDF)
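
To make the difference concrete without a Spark session, here is a minimal plain-Python sketch of the two semantics. It is only a model, not Spark's implementation: a "frame" is a hypothetical (columns, rows) pair, the helper names union_by_position and union_by_name are invented for illustration, and the column names storeId and city are assumptions that do not come from the question.

```python
# Plain-Python model of the two union semantics (NOT Spark's implementation).
# A "frame" is a hypothetical (column_names, rows) pair used only for illustration.


def union_by_position(left, right):
    """Models DataFrame.union(): rows are appended as-is, column names of `right` are ignored."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("both frames must have the same number of columns")
    return left_cols, left_rows + right_rows


def union_by_name(left, right):
    """Models DataFrame.unionByName(): values of `right` are reordered to match `left`'s column names."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("both frames must have the same column names")
    # For each column of `left`, find where that name sits in `right`.
    order = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in order) for row in right_rows]
    return left_cols, left_rows + reordered


# Hypothetical data: same columns, but in a different order.
stores = (["storeId", "city"], [(1, "Boston")])
acquired = (["city", "storeId"], [("Denver", 2)])

by_position = union_by_position(stores, acquired)
by_name = union_by_name(stores, acquired)

print(by_position)  # second row keeps its original order: ("Denver", 2)
print(by_name)      # second row is realigned by name: (2, "Denver")
```

When both frames list their columns in the same order, the two functions return the same rows; they differ only when the order differs, which is the case the question is testing.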

Discussion

3 comments
4be8126 Option: C
May 3, 2023

The error in the code block is C: the DataFrame.unionByName() operation does not union DataFrames based on column position; it uses column names instead. The intended operation is union(), which performs a position-wise union regardless of column names. The correct code block to perform a position-wise union between DataFrame storesDF and DataFrame acquiredStoresDF would be: storesDF.union(acquiredStoresDF)

newusername Option: C
Nov 7, 2023

C is correct - https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.unionByName.html

juliom6 Option: C
Nov 14, 2023

C is correct according to the documentation: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.unionByName.html "The difference between this function and union() is that this function resolves columns by name (not by position)"