Exam: Certified Associate Developer for Apache Spark
Question 49

The code block shown below contains an error. The code block is intended to return a DataFrame containing a column openDateString, a string representation of column openDate formatted according to a Java SimpleDateFormat pattern. Identify the error.

Note that column openDate is of type integer and represents a date in the UNIX epoch format: the number of seconds since midnight (UTC) on January 1st, 1970.

An example of Java’s SimpleDateFormat is "Sunday, Dec 4, 2008 1:05 PM".

A sample of storesDF is displayed below:

Code block:

storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a", TimestampType()))

    Correct Answer: A

    The from_unixtime() function in PySpark accepts only two parameters: the column to convert and the format string. The additional third parameter (TimestampType()) in the provided code block is unnecessary and causes the error. Removing this third parameter will resolve the issue and correctly convert the UNIX epoch format date into the desired string format.
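As a Spark-independent sanity check on the epoch arithmetic, plain Python's datetime module can perform the same conversion. This is only an illustrative sketch: the epoch value 1100746394 is taken from the sample data in the discussion below, and Python's strftime codes are a rough analogue of the Spark pattern (%d and %I zero-pad, while Spark's d and h do not).

```python
from datetime import datetime, timezone

# 1100746394 seconds after the UNIX epoch (midnight, Jan 1st 1970, UTC);
# sample value taken from the discussion below
epoch_seconds = 1100746394
dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Rough strftime equivalent of Spark's "EEE, MMM d, yyyy h:mm a" pattern
# (note: %d and %I zero-pad, while Spark's d and h do not)
print(dt.strftime("%a, %b %d, %Y %I:%M %p"))  # Thu, Nov 18, 2004 02:53 AM
```

The same two-argument call shape (value, format string) is all that from_unixtime() needs; no type argument is involved in either API.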

Discussion
juliom6 (Option: A)

A is correct:

from pyspark.sql.functions import from_unixtime, col
storesDF = spark.createDataFrame([(0, 1100746394), (1, 1474410343)], ['storeId', 'openDate'])
storesDF = storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a"))
display(storesDF)

zozoshanky (Option: A)

A is also right.

nicklasbekkevold (Option: A)

A is the right answer. Function signature from the docs: pyspark.sql.functions.from_unixtime(timestamp, format='yyyy-MM-dd HH:mm:ss')

Jtic (Option: B)

B. The from_unixtime() operation only works if column openDate is of type long rather than integer - column openDate must first be converted. This option is correct. The code block has an error because the from_unixtime() function expects the column openDate to be of type long, not integer. The column should be cast to long before applying the function.

ZSun

This is complete nonsense about long and integer. long (or bigint) is a 64-bit signed integer type, ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. integer (or int) is a 32-bit signed integer type, ranging from -2,147,483,648 to 2,147,483,647.

juliom6

That does not make sense; the code below works perfectly:

from pyspark.sql.functions import from_unixtime, col
storesDF = spark.createDataFrame([(0, 1100746394), (1, 1474410343)], ['storeId', 'openDate'])
storesDF = storesDF.withColumn('openDate', col('openDate').cast('integer'))
storesDF = storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a"))
display(storesDF)