Exam Certified Associate Developer for Apache Spark
Question 69

The code block shown below contains an error. The code block is intended to return a new DataFrame where column managerName from DataFrame storesDF is split at the space character into column managerFirstName and column managerLastName. Identify the error.

A sample of DataFrame storesDF is displayed below:

Code block:

storesDF.withColumn("managerFirstName", col("managerName").split(" ").getItem(0))
    .withColumn("managerLastName", col("managerName").split(" ").getItem(1))

    Correct Answer: D

    The error is the misuse of split(). In PySpark, split() is a function imported from pyspark.sql.functions that takes a Column object and a delimiter (here the space character) as arguments; it is not a method of a Column object. The corrected code therefore calls split(col("managerName"), " ") and then uses getItem(0) and getItem(1) on the result to extract the first and last names.

Discussion
Sowwy1 (Option: D)

D. The split() operation comes from the imported functions object. It accepts a Column object and split character as arguments. It is not a method of a Column object.

Ahlo (Option: C)

Answer C. pyspark.sql.functions provides a split() function to split a DataFrame string column into multiple columns. https://sparkbyexamples.com/pyspark/pyspark-split-dataframe-column-into-multiple-columns/

newusername (Option: C)

I think it is C.

from pyspark.sql.functions import col, split

data = [("John Smith",), ("Jane Doe",), ("Mike Johnson",)]
df = spark.createDataFrame(data, ["managerName"])
df.show()
df = df.withColumn("managerFirstName", split(col("managerName"), " ").getItem(0)) \
    .withColumn("managerLastName", split(col("managerName"), " ").getItem(1))
df.show()

cd6a625

In your example, you are using split(col("managerName"), ...) and not split("managerName", ...), which means the answer is D.

zozoshanky

The answer could be C too.

cookiemonster42

But you have to pass the column as an object, not a string. You have to use the col() expression, so D is the right one.

65bd33e

Yes, I agree with you. D is correct: we have to pass the column as an object.