Exam: Certified Associate Developer for Apache Spark
Question 35

Which of the following code blocks returns a collection of summary statistics for all columns in DataFrame storesDF?

    Correct Answer: E

    The describe() method of a DataFrame returns a new DataFrame with summary statistics for the numeric and string columns of the input DataFrame. The statistics are count, mean, standard deviation, minimum, and maximum. Called with no arguments, storesDF.describe() covers every such column, so no additional parameters are needed.

Discussion
zozoshanky (Option: E)

E is correct, it's giving the output.

dbdantas (Option: E)

E is the correct one

azure_bimonster (Option: E)

E would be correct here

mahmoud_salah30 (Option: E)

Tested it; E is the right answer.

souha_axa (Option: E)

E is the correct answer

cookiemonster42 (Option: E)

Check the documentation, mates. Both methods receive names of columns as arguments, so E is correct!

zozoshanky (Option: B)

B is correct. Running the last option gives an error: TypeError: describe() got an unexpected keyword argument 'all'

cookiemonster42

Checked it; it gave me the right result, so E is the one.

4be8126 (Option: B)

The answer is B.

Explanation: The describe() method in DataFrame returns a DataFrame with summary statistics for all numeric columns in the input DataFrame. By default, only the count, mean, standard deviation, minimum, and maximum values are returned, but additional statistics can be specified with the percentiles parameter. Setting the all parameter to True will include non-numeric columns in the output as well. Therefore, option B is the correct answer.

Option A is not correct, as the summary() method only returns summary statistics for the specified column(s) and is not a valid option for returning summary statistics for all columns in the DataFrame.

Option C is not correct, as the describe() method does not have an "all" option.

Option D is also not correct, as the summary() method only returns summary statistics for the specified column(s) and does not have an "all" option.

Option E is not incorrect, but it does not specify whether to include non-numeric columns in the output. Therefore, option B is a better answer.

ZSun

Did you actually try this in PySpark, or look it up in the documentation? TypeError: describe() got an unexpected keyword argument 'all'

8605246

describe() is correct

Deuterium

Is your answer from ChatGPT?

cookiemonster42

Even ChatGPT says E is the correct one :)

juadaves

TypeError                                 Traceback (most recent call last)
<ipython-input-34-5077330dead7> in <cell line: 1>()
----> 1 storesDF.describe(all = True)

TypeError: DataFrame.describe() got an unexpected keyword argument 'all'
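
The error reported in this thread follows from the shape of the method signature: PySpark documents DataFrame.describe as taking only positional column names (*cols), so any keyword argument is rejected before Spark does any work. A pure-Python sketch of that behavior follows; the function below is a hypothetical stand-in that only mimics the signature and is not the PySpark implementation.

```python
def describe(*cols):
    """Hypothetical stand-in that mirrors the describe(*cols) signature shape."""
    # Only echoes the requested column names; no statistics are computed here.
    return list(cols)


# Positional column names (or no arguments at all) are accepted.
describe()
describe("sqft", "storeId")

# A keyword argument such as all=True has no matching parameter,
# so Python raises TypeError at call time.
message = None
try:
    describe(all=True)
except TypeError as err:
    message = str(err)
    print(message)
```

This is why calling describe() with no arguments, or with explicit column names, is the usable form.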