Which of the following code blocks returns a new DataFrame where column sqft from DataFrame storesDF has had its missing values replaced with the value 30,000?
A sample of DataFrame storesDF is below:
To replace missing values in the `sqft` column of a DataFrame in PySpark, use `na.fill` (equivalently, `DataFrame.fillna`). The first argument is the replacement value; the optional second argument, `subset`, names the columns to fill. In PySpark, `subset` accepts a single column name as a string, or a list or tuple of column names, but not a `Column` object. `storesDF.na.fill(30000, ['sqft'])` therefore works, and so does option E, which passes the column name as a string. Hence option E is correct.
E is the answer.
E is the answer; I tested it. The other options fail as follows:
A) AttributeError: module 'pyspark.sql.functions' has no attribute 'Seq'
B) AttributeError: 'DataFrame' object has no attribute 'nafill'
C) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.
D) PySparkTypeError: [NOT_LIST_OR_TUPLE] Argument `subset` should be a list or tuple, got Column.
I think there is a typo. Isn't the first argument the column name?
Check the `fill` function in the Scala API docs: https://spark.apache.org/docs/3.0.0/api/scala/org/apache/spark/sql/DataFrameNaFunctions.html#fill(value:Long,cols:Array%5BString%5D):org.apache.spark.sql.DataFrame
To me, C is likely correct, because we need to use `col()`. C: `storesDF.na.fill(30000, col("sqft"))`