Certified Associate Developer for Apache Spark Exam Questions

Certified Associate Developer for Apache Spark Exam - Question 72


Which of the following code blocks returns a new DataFrame where column sqft from DataFrame storesDF has had its missing values replaced with the value 30,000?

A sample of DataFrame storesDF is below:

Correct Answer: E

To replace missing values in the `sqft` column of a PySpark DataFrame, use the `na.fill` function. Its first argument is the replacement value and its second is the subset of columns to fill, given as a single column name string or as a list or tuple of column names, for example `storesDF.na.fill(30000, ['sqft'])`. Option E follows this syntax, passing the column name as a string, so option E is correct.

Discussion

5 comments
learnsh1 (Option: A)
Feb 8, 2024

I think there is a typo. The first argument is the column name, right?

Samir_91 (Option: E)
Jun 4, 2024

E is the answer. I tested it, and the other options fail as follows:
A) AttributeError: module 'pyspark.sql.functions' has no attribute 'Seq'
B) AttributeError: 'DataFrame' object has no attribute 'nafill'
C) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.
D) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.

Samir_91 (Option: E)
Jun 8, 2024

E is the answer.

azure_bimonster (Option: C)
Feb 9, 2024

To me, C is likely correct, because we need to use `col()`. C: `storesDF.na.fill(30000, col("sqft"))`

hosniadel666 (Option: A)
Mar 21, 2024

Check the `fill` function in the Scala API docs: https://spark.apache.org/docs/3.0.0/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html#fill(value:Long,cols:Array%5BString%5D):org.apache.spark.sql.DataFrame