Exam: Certified Associate Developer for Apache Spark
Question 42

The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.

Code block:

assessPerformanceUDF – udf(assessPerformance)

storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))

    Correct Answer: D

    The return type of assessPerformanceUDF() is not specified in the udf() operation. When a User-Defined Function (UDF) is created in PySpark with udf(), the return type defaults to StringType. To create a UDF that returns an integer, the return type must be set explicitly, for example by passing IntegerType() as the second argument to udf(). Without it, Spark treats the UDF's result column as a string column rather than an integer column.

Discussion
ZSun (Option: D)

The right answer is D. The signature is pyspark.sql.functions.udf(f=None, returnType=StringType()), so the default return type is string, but this question requires an integer return type. So it should be D: "The return type of the assessPerformanceUDF() is not specified in the udf() operation."

4be8126 (Option: A)

The error in the code block is A. The function assessPerformance() needs to be passed as a parameter to the udf() operation in order to create a UDF from it. The correct code block should be: assessPerformanceUDF = udf(assessPerformance) storesDF.withColumn("result", assessPerformanceUDF(col(

ZSun

What is the difference between your code and the question itself? The question has "assessPerformanceUDF – udf(assessPerformance)" and yours has "assessPerformanceUDF = udf(assessPerformance)". Just changing "-" to "="?

Singh_Sumit (Option: D)

1. When `f` is a Python function: `returnType` defaults to string type and can be optionally specified. The produced object must match the specified type. In this case, this API works as if `register(name, f, returnType=StringType())`.

thanab (Option: D)

The error in the code block is that the return type of the assessPerformanceUDF() is not specified in the udf() operation. In PySpark, when you register a Python function as a UDF, you should also specify the return type. This is important because Spark SQL needs to understand the return type to properly handle the UDF. Therefore, the correct answer is:

Deuterium (Option: D)

The right answer is D: the return type has to be specified in udf() or it will default to StringType. The code should be: function_UDF = udf(function, returnType=IntegerType())

Raheel_te (Option: D)

correct answer is D

juliom6 (Option: D)

It is necessary to specify the return type as IntegerType().

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

storesDF = spark.createDataFrame([('1', '123'), ('2', '234')], ['id', 'customerSatisfaction'])
assessPerformance = lambda x: int(x)
assessPerformanceUDF = udf(assessPerformance, IntegerType())
storesDF.withColumn('result', assessPerformanceUDF(col('customerSatisfaction'))).printSchema()

cookiemonster42 (Option: D)

If they mean that - is =, then we need a second parameter, the output type. So D is the answer.