Exam: Certified Data Engineer Professional
Question 61

A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

Which command should be removed from the notebook before scheduling it as a job?

    Correct Answer: E

    When scheduling a job, interactive commands designed for exploration and visualization, such as display(), should be removed. The display() function is intended for use in the notebook interface to visualize data during development and is not suitable for automated jobs in production. Removing Cmd 6 avoids unnecessary computation and potential issues in a production environment.
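As an alternative to deleting the preview cell outright, the rendering call can be gated behind a flag so it only runs during interactive development. The sketch below is a plain-Python illustration of that idea, not Databricks code: `run_pipeline_step`, `render`, and `interactive` are hypothetical names, the list of dicts stands in for a DataFrame, and `render` stands in for whatever preview function the notebook uses (such as display()).

```python
from typing import Any, Callable, Dict, List


def run_pipeline_step(
    rows: List[Dict[str, Any]],
    render: Callable[[Any], None],
    interactive: bool = False,
) -> List[Dict[str, Any]]:
    """Keep rows with a positive amount; preview them only when interactive."""
    result = [row for row in rows if row["amount"] > 0]
    if interactive:
        # Preview is for a human at the notebook UI; a scheduled run skips it.
        render(result)
    return result


# Hypothetical sample data standing in for the notebook's DataFrame.
sample_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": -5}]

# Record render calls so we can see whether the preview ran.
rendered: List[Any] = []

# Scheduled-job style run: the preview is skipped.
job_output = run_pipeline_step(sample_rows, render=rendered.append, interactive=False)
calls_in_job_mode = len(rendered)

# Interactive style run: the preview is invoked once.
dev_output = run_pipeline_step(sample_rows, render=rendered.append, interactive=True)
calls_in_dev_mode = len(rendered)
```

Both runs return the same data; only the interactive run invokes the preview. How the flag is supplied (a job parameter, a widget, an environment variable) is left open here.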

Discussion
petrv (Option: E)

When scheduling a Databricks notebook as a job, it's generally recommended to remove or modify commands that involve displaying output, such as using the display() function. Displaying data using display() is an interactive feature designed for exploration and visualization within the notebook interface and may not work well in a production job context. The finalDF.explain() command, which provides the execution plan of the DataFrame transformations and actions, is often useful for debugging and optimizing queries. While it doesn't display interactive visualizations like display(), it can still be informative for understanding how Spark is executing the operations on your DataFrame.

alexvno (Option: E)

No display()

Karen1232123 (Option: D)

Why not D?

hal2401me (Option: E)

Perhaps it's a multiple-choice question in the exam, in which case I'd select both E and D. If it's single choice, then E.

KhoaLe (Option: E)

Looking through all the steps, Cmd 2, 5, and 6 can be removed without affecting the rest of the process. However, in terms of run time, Cmd 2 and 5 do not cost much, as they only show the current logical query plan. In contrast, display() in Cmd 6 actually triggers execution (it behaves like an action, not a transformation), which can take considerable time to run.

60ties (Option: D)

No actions in production scripts. D is best.

ofed

In order to display a DataFrame, you also need to compute it, so display() also acts as an action.
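The point made in the thread, that showing a plan is cheap while displaying rows forces computation, can be illustrated with a toy model of lazy evaluation. This is a minimal sketch in plain Python and deliberately not the Spark API: `LazyFrame` and its methods are hypothetical stand-ins that only mimic the distinction between recording a plan and executing it.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Spark API)."""

    def __init__(self, source, plan=(), stats=None):
        self._source = source
        self._plan = plan
        # Shared counter so derived frames report executions in one place.
        self.stats = stats if stats is not None else {"executions": 0}

    def filter(self, predicate, label):
        # Transformation: only extends the recorded plan, computes nothing.
        return LazyFrame(self._source, self._plan + ((label, predicate),), self.stats)

    def explain(self):
        # Describes the recorded plan without touching the data.
        return " -> ".join(["scan"] + [label for label, _ in self._plan])

    def collect(self):
        # Action: runs every recorded step over the data.
        self.stats["executions"] += 1
        rows = list(self._source)
        for _, predicate in self._plan:
            rows = [row for row in rows if predicate(row)]
        return rows

    def display(self):
        # Rendering needs actual rows, so it has to trigger the computation.
        for row in self.collect():
            print(row)


frame = (
    LazyFrame(range(10))
    .filter(lambda x: x % 2 == 0, "even")
    .filter(lambda x: x > 2, "gt2")
)

plan_text = frame.explain()
runs_after_explain = frame.stats["executions"]

frame.display()
runs_after_display = frame.stats["executions"]
```

In this model, calling `explain()` leaves the execution counter at zero, while `display()` increments it, which mirrors the argument above for why display() is the costlier command to leave in a scheduled notebook.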