Exam: Certified Associate Developer for Apache Spark
Question 72

Which of the following code blocks returns a new DataFrame where column sqft from DataFrame storesDF has had its missing values replaced with the value 30,000?

A sample of DataFrame storesDF is below:

    Correct Answer: E

    To replace missing values in the 'sqft' column of a DataFrame in PySpark, use the `na.fill` function (an alias of `DataFrame.fillna`). It takes the replacement value first, followed by an optional `subset` argument, which can be either a single column name as a string or a list or tuple of column names. Both `storesDF.na.fill(30000, "sqft")` and `storesDF.na.fill(30000, ['sqft'])` are therefore valid. Option E passes the column name as a string, which is an accepted form, so option E is correct.

Discussion
Samir_91 - Option: E

E is the answer.

Samir_91 - Option: E

E is the answer. I tested it, and the other options fail with these errors:
A) AttributeError: module 'pyspark.sql.functions' has no attribute 'Seq'
B) AttributeError: 'DataFrame' object has no attribute 'nafill'
C) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.
D) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.

learnsh1 - Option: A

I think there is a typo. Isn't the first argument the column name?

hosniadel666 - Option: A

Check the fill function in the Scala API docs: https://spark.apache.org/docs/3.0.0/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html#fill(value:Long,cols:Array%5BString%5D):org.apache.spark.sql.DataFrame

azure_bimonster - Option: C

To me, C is likely correct because we need to use col():
C. storesDF.na.fill(30000, col("sqft"))