Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 69


The code block shown below contains an error. The code block is intended to return a new DataFrame where column managerName from DataFrame storesDF is split at the space character into column managerFirstName and column managerLastName. Identify the error.

A sample of DataFrame storesDF is displayed below:

Code block:

storesDF.withColumn("managerFirstName", col("managerName").split(" ").getItem(0))
    .withColumn("managerLastName", col("managerName").split(" ").getItem(1))

Correct Answer: D

The error is the misuse of split(). split() is a function in pyspark.sql.functions, not a method of a Column object. It takes a Column object and the split pattern as arguments. The correct code therefore calls split(col("managerName"), " ") to split the managerName column at the space character, then calls getItem(0) and getItem(1) on the result to extract the first and last names.

Discussion

4 comments
zozoshanky
Jul 30, 2023

Can be C as an answer too.

cookiemonster42
Jul 31, 2023

But you have to pass the column as an object, not a string. You have to use a col() expression. So D is the right one.

65bd33e
May 6, 2024

Yes, I agree with you. D is correct: we have to pass the column as an object.

newusername (Option: C)
Nov 7, 2023

I think it is C.

data = [("John Smith",), ("Jane Doe",), ("Mike Johnson",)]
df = spark.createDataFrame(data, ["managerName"])
df.show()
df = df.withColumn("managerFirstName", split(col("managerName"), " ").getItem(0)) \
    .withColumn("managerLastName", split(col("managerName"), " ").getItem(1))
df.show()

cd6a625
Jul 8, 2024

In your example, you are using split(col("managerName"), ...) and not split("managerName", ...), which means the answer is D.

Ahlo (Option: C)
Feb 26, 2024

Answer C. pyspark.sql.functions provides a split() function to split a DataFrame string column into multiple columns. https://sparkbyexamples.com/pyspark/pyspark-split-dataframe-column-into-multiple-columns/

Sowwy1 (Option: D)
Apr 10, 2024

D. The split() operation comes from the imported functions object. It accepts a Column object and split character as arguments. It is not a method of a Column object.