Certified Associate Developer for Apache Spark Exam - Question 72

Question

Which of the following code blocks returns a new DataFrame where column sqft from DataFrame storesDF has had its missing values replaced with the value 30,000? A sample of DataFrame storesDF is below:

Examice · Accepted Answer

To replace missing values in the `sqft` column of a PySpark DataFrame, use the `na.fill` function. It takes the replacement value first, followed by a list or array of the column names to fill, as in `storesDF.na.fill(30000, ['sqft'])`. Option E follows this form, giving the column by its string name rather than as a Column object, so option E is correct.

learnsh1 · Answer

I think there's a typo. Isn't the first argument the column name?

Samir_91 · Answer

E is the answer; I tested it. The other options fail with:
A) AttributeError: module 'pyspark.sql.functions' has no attribute 'Seq'
B) AttributeError: 'DataFrame' object has no attribute 'nafill'
C) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.
D) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.

Samir_91 · Answer

E is the answer.

azure_bimonster · Answer

To me, C is likely correct, because we need to use col():

C. storesDF.na.fill(30000, col("sqft"))

hosniadel666 · Answer

Check the fill function in the Scala API docs:
https://spark.apache.org/docs/3.0.0/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html#fill(value:Long,cols:Array%5BString%5D):org.apache.spark.sql.DataFrame
