
Certified Associate Developer for Apache Spark Exam - Question 74


QUESTION NO: 75

Which of the following code blocks returns a DataFrame where column divisionDistinct is the approximate number of distinct values in column division from DataFrame storesDF?

Correct Answer: C

The code block that returns a DataFrame whose column divisionDistinct holds the approximate number of distinct values in column division of storesDF is storesDF.agg(approx_count_distinct(col('division')).alias('divisionDistinct')). Calling agg directly on the DataFrame aggregates over all rows with no grouping, approx_count_distinct computes the approximate distinct count, and alias names the resulting column divisionDistinct.

Discussion

1 comment
Sowwy1 (Option: C)
Apr 2, 2024

I think it's C https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.approx_count_distinct.html