Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 49


The code block shown below contains an error. The code block is intended to return a DataFrame containing a column openDateString, a string representation of column openDate formatted according to Java's SimpleDateFormat. Identify the error.

Note that column openDate is of type integer and represents a date in the UNIX epoch format – the number of seconds since midnight on January 1st, 1970.

An example of a date formatted with Java’s SimpleDateFormat is "Sunday, Dec 4, 2008 1:05 PM".

A sample of storesDF is displayed below:

Code block:

storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a", TimestampType()))

Correct Answer: A

The from_unixtime() function in PySpark accepts only two parameters: the column to convert and an optional format string. Passing a third argument (TimestampType()) in the provided code block raises an error. Removing this third argument resolves the issue and correctly converts the UNIX epoch date into the desired string format.

Discussion

4 comments
zozoshanky - Option: A
Jul 30, 2023

A is also right.

juliom6 - Option: A
Nov 14, 2023

A is correct:

from pyspark.sql.functions import from_unixtime, col
storesDF = spark.createDataFrame([(0, 1100746394), (1, 1474410343)], ['storeId', 'openDate'])
storesDF = storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a"))
display(storesDF)

Jtic - Option: B
May 28, 2023

B. The from_unixtime() operation only works if column openDate is of type long rather than integer, so column openDate must first be converted. This option is correct: the from_unixtime() function expects the column to be of type long, not integer, and the column should be cast to long before applying the function.

ZSun
Jun 6, 2023

This is complete nonsense about long and integer. long (or bigint) is a 64-bit signed integer type ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807; integer (or int) is a 32-bit signed integer type ranging from -2,147,483,648 to 2,147,483,647.

juliom6
Nov 14, 2023

That does not make sense; the code below works perfectly:

from pyspark.sql.functions import from_unixtime, col
storesDF = spark.createDataFrame([(0, 1100746394), (1, 1474410343)], ['storeId', 'openDate'])
storesDF = storesDF.withColumn('openDate', col('openDate').cast('integer'))
storesDF = storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a"))
display(storesDF)

nicklasbekkevold - Option: A
Aug 23, 2023

A is the right answer. Function signature from the docs:

pyspark.sql.functions.from_unixtime(timestamp, format='uuuu-MM-dd HH:mm:ss')