Certified Associate Developer for Apache Spark Exam - Question 42

Question

The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.

Code block:

assessPerformanceUDF – udf(assessPerformance)

storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

Examice · Accepted Answer

The return type of the assessPerformanceUDF() is not specified in the udf() operation. When you create a user-defined function (UDF) in PySpark with udf(), the return type is an optional second argument that defaults to StringType. To create a UDF that returns an integer, you must set the return type explicitly, for example udf(assessPerformance, IntegerType()). Without it, Spark declares the UDF's result column as a string instead of an integer.

ZSun · Answer

The right answer is D.
pyspark.sql.functions.udf(f=None, returnType=StringType())
The default return type is string, but this question requires an integer return type.
So it should be D: "The return type of the assessPerformanceUDF() is not specified in the udf() operation."

4be8126 · Answer

The error in the code block is A. The function assessPerformance() needs to be passed as a parameter to the udf() operation in order to create a UDF from it. The correct code block should be:

from pyspark.sql.functions import udf, col

assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

Deuterium · Answer

Right answer is D. The return type has to be specified in udf(), or the UDF returns StringType by default. The code should be:
from pyspark.sql.types import IntegerType
function_UDF = udf(function, returnType=IntegerType())

thanab · Answer

The error in the code block is that the return type of the assessPerformanceUDF() is not specified in the udf() operation. In PySpark, when you register a Python function as a UDF, you should also specify the return type. This matters because Spark SQL uses the declared return type as the data type of the result column. If you omit it, the column is typed as StringType rather than as an integer. Therefore, the correct answer is D.

Singh_Sumit · Answer

1. When `f` is a Python function:
   `returnType` defaults to string type and can be optionally specified. The produced object must match the specified type. In this case, this API works as if `register(name, f, returnType=StringType())`.

cookiemonster42 · Answer

If they mean that the - is =, then we need a second parameter, the output type. So, D is the answer.

juliom6 · Answer

It is necessary to specify the return type as IntegerType().

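from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

# customerSatisfaction is stored as a string here; the UDF converts it to int
storesDF = spark.createDataFrame([('1', '123'), ('2', '234')], ['id', 'customerSatisfaction'])
assessPerformance = lambda x: int(x)

# IntegerType() as the second argument makes the result column an integer
assessPerformanceUDF = udf(assessPerformance, IntegerType())
storesDF.withColumn('result', assessPerformanceUDF(col('customerSatisfaction'))).printSchema()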

Raheel_te · Answer

Correct answer is D.
