DP-600 Exam Questions

DP-600 Exam - Question 56


Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have a Fabric tenant that contains a new semantic model in OneLake.

You use a Fabric notebook to read the data into a Spark DataFrame.

You need to evaluate the data to calculate the min, max, mean, and standard deviation values for all the string and numeric columns.

Solution: You use the following PySpark expression:

df.summary()

Does this meet the goal?

Correct Answer: A

The PySpark expression df.summary() returns the count, mean, stddev, min, approximate percentiles (25%, 50%, 75%), and max for the DataFrame's columns. String columns are included in the output: min and max are computed lexicographically for them, while mean and stddev come back as null. Because the statistics cover both numeric and string columns in a single call, the expression meets the stated goal.

Discussion

7 comments
SamuComqi (Option: A)
Feb 18, 2024

Also df.describe() is a valid solution. Sources:
* summary: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.summary.html
* describe: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.describe.html

stilferx (Option: A)
May 10, 2024

IMHO, A. Example:

df1 = spark.createDataFrame([(1, 10), (2, 10), (2, 15)], schema=['fruit_id', 'amount'])
df1.summary().show()

summary  fruit_id            amount
count    3                   3
mean     1.6666666666666667  11.666666666666666
stddev   0.5773502691896257  2.886751345948129
min      1                   10
25%      1                   10
50%      2                   10
75%      2                   15
max      2                   15
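As a sanity check on the numbers in the comment above: summary() reports the sample standard deviation (n - 1 denominator), which Python's standard-library statistics module also uses, so the figures can be reproduced without Spark:

```python
import statistics

# The same data as in the comment's example DataFrame.
fruit_id = [1, 2, 2]
amount = [10, 10, 15]

# statistics.stdev uses the sample (n - 1) formula, matching Spark's stddev.
print(statistics.mean(fruit_id), statistics.stdev(fruit_id))
# 1.6666666666666667 0.5773502691896257
print(statistics.mean(amount), statistics.stdev(amount))
# 11.666666666666666 2.886751345948129
```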

282b85d (Option: B)
May 28, 2024

While df.summary() does provide valuable information for numeric columns, it does not fully meet the goal of evaluating both string and numeric columns with the required statistical measures. Use df.summary() or df.agg() to cover the numeric columns, and additional custom aggregations for the string columns.

Momoanwar (Option: A)
Feb 17, 2024

Correct

7d97b62 (Option: A)
Jul 8, 2024

In pandas, use df.describe() for summary statistics of numeric columns. In PySpark, use df.summary() for summary statistics of both numeric and string columns in a distributed computing environment.
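For the pandas half of this comparison, a minimal sketch (with illustrative data): by default pandas describe() covers only the numeric columns, and include="all" is needed to pull in string columns as well.

```python
import pandas as pd

pdf = pd.DataFrame({"fruit": ["apple", "banana", "cherry"], "amount": [10, 20, 30]})

# Numeric columns only by default.
print(pdf.describe())

# include="all" adds string columns (count, unique, top, freq).
print(pdf.describe(include="all"))
```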

XiltroX (Option: A)
Feb 27, 2024

df.summary() is the only option where you can get MIN, MAX and AVG.

6d1de25 (Option: A)
Jul 13, 2024

Correct