QUESTION NO: 75
Which of the following code blocks returns a DataFrame where column divisionDistinct is the approximate number of distinct values in column division from DataFrame storesDF?
The correct code block is storesDF.agg(approx_count_distinct(col('division')).alias('divisionDistinct')). agg performs an aggregation over the entire DataFrame, so the result has a single row. approx_count_distinct computes the approximate number of distinct values in column division, and alias names the resulting column divisionDistinct.
I think it's C. See https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.approx_count_distinct.html