Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 49


The code block shown below contains an error. The code block is intended to return a DataFrame containing a column openDateString, a string representation of column openDate formatted according to Java's SimpleDateFormat. Identify the error.

Note that column openDate is of type integer and represents a date in the UNIX epoch format – the number of seconds since midnight on January 1st, 1970.

An example of a date formatted with Java’s SimpleDateFormat is "Sunday, Dec 4, 2008 1:05 PM".

A sample of storesDF is displayed below:

Code block:

storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a", TimestampType()))

Correct Answer: A

The from_unixtime() function in PySpark accepts only two parameters: the column to convert and an optional format string. Passing a third argument (TimestampType()) in the provided code block raises an error. Removing this third argument resolves the issue and correctly converts the UNIX epoch date into the desired string format.

Discussion

4 comments
zozoshanky - Option: A
Jul 30, 2023

A is also right.

juliom6 - Option: A
Nov 14, 2023

A is correct:

from pyspark.sql.functions import from_unixtime, col
storesDF = spark.createDataFrame([(0, 1100746394), (1, 1474410343)], ['storeId', 'openDate'])
storesDF = storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a"))
display(storesDF)

Jtic - Option: B
May 28, 2023

B. The from_unixtime() operation only works if column openDate is of type long rather than integer, so column openDate must first be converted. This option is correct: the from_unixtime() function expects the column to be of type long, not integer, and the column should be cast to long before applying the function.

ZSun
Jun 6, 2023

This is complete nonsense about long and integer. long (or bigint) is a 64-bit signed integer type ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807; integer (or int) is a 32-bit signed integer type ranging from -2,147,483,648 to 2,147,483,647.

juliom6
Nov 14, 2023

That does not make sense; the code below works perfectly:

from pyspark.sql.functions import from_unixtime, col
storesDF = spark.createDataFrame([(0, 1100746394), (1, 1474410343)], ['storeId', 'openDate'])
storesDF = storesDF.withColumn('openDate', col('openDate').cast('integer'))
storesDF = storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a"))
display(storesDF)

nicklasbekkevold - Option: A
Aug 23, 2023

A is the right answer. Function signature from the docs:

pyspark.sql.functions.from_unixtime(timestamp, format='uuuu-MM-dd HH:mm:ss')