Certified Data Engineer Professional Exam - Question 61


A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

Which command should be removed from the notebook before scheduling it as a job?

Correct Answer: E

When scheduling a job, interactive commands designed for exploration and visualization, such as display(), should be removed. The display() function is intended for use in the notebook interface to visualize data during development and is not suitable for automated jobs in production. Removing Cmd 6 avoids unnecessary computation and potential issues in a production environment.
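
One way to check a notebook for such commands before scheduling it is to scan each cell's source for calls to interactive-only functions. The sketch below is a minimal illustration using Python's standard `ast` module. The `INTERACTIVE_CALLS` set, the `find_interactive_calls` helper, and the two cell sources are hypothetical stand-ins, since the notebook's actual commands are not reproduced here.

```python
import ast
from typing import List

# Assumption: display() is the only interactive-only call we care about here.
INTERACTIVE_CALLS = {"display"}


def find_interactive_calls(cell_source: str) -> List[str]:
    """Return the names of interactive-only functions called in a cell."""
    tree = ast.parse(cell_source)
    found = []
    for node in ast.walk(tree):
        # Match bare calls such as display(df), not method calls such as df.explain().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in INTERACTIVE_CALLS:
                found.append(node.func.id)
    return found


# Hypothetical cell contents, labelled the way the discussion refers to them.
cells = {
    "Cmd 5": "finalDF.explain()",
    "Cmd 6": "display(finalDF)",
}

flagged = [name for name, src in cells.items() if find_interactive_calls(src)]
print(flagged)  # ['Cmd 6']
```

A check like this only flags candidates. Whether a flagged cell is safe to delete still depends on what the rest of the notebook does with its result.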

Discussion

6 comments
petrv (Option: E)
Nov 30, 2023

When scheduling a Databricks notebook as a job, it's generally recommended to remove or modify commands that involve displaying output, such as using the display() function. Displaying data using display() is an interactive feature designed for exploration and visualization within the notebook interface and may not work well in a production job context. The finalDF.explain() command, which provides the execution plan of the DataFrame transformations and actions, is often useful for debugging and optimizing queries. While it doesn't display interactive visualizations like display(), it can still be informative for understanding how Spark is executing the operations on your DataFrame.

alexvno (Option: E)
Dec 18, 2023

No display()

Karen1232123 (Option: D)
Nov 3, 2023

Why not D?

60ties (Option: D)
Nov 14, 2023

No actions on production scripts. D is best

ofed
Nov 16, 2023

In order to display a DataFrame you also need to compute it, so display() also acts as an action.
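
To illustrate the point about display() behaving like an action, here is a toy model of lazy evaluation in plain Python. It is not the Spark or Databricks API: the `LazyFrame` class, its `map`, `explain`, and `collect` methods, and the `display` function are all hypothetical, written only to show that describing a plan does no work while showing the rows forces every step to run.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Spark API)."""

    def __init__(self, rows, steps=(), stats=None):
        self._rows = list(rows)
        self._steps = tuple(steps)
        # Counter shared by all frames derived from the same source.
        self.stats = stats if stats is not None else {"rows_processed": 0}

    def map(self, name, fn):
        """Transformation: records the step and computes nothing."""
        return LazyFrame(self._rows, self._steps + ((name, fn),), self.stats)

    def explain(self):
        """Describe the recorded plan without touching the data."""
        return " -> ".join(["source"] + [name for name, _ in self._steps])

    def collect(self):
        """Action: run every recorded step over every row."""
        out = []
        for row in self._rows:
            for _, fn in self._steps:
                row = fn(row)
                self.stats["rows_processed"] += 1
            out.append(row)
        return out


def display(frame):
    """Toy display(): it must materialize the rows before it can show them."""
    rows = frame.collect()
    print(rows)
    return rows


df = LazyFrame([1, 2, 3]).map("double", lambda x: x * 2).map("add_one", lambda x: x + 1)

plan = df.explain()
print(plan)  # source -> double -> add_one
work_after_explain = df.stats["rows_processed"]  # still 0: nothing was computed

shown = display(df)  # prints [3, 5, 7]
work_after_display = df.stats["rows_processed"]  # 6: two steps over three rows
```

In this toy model the cost of `explain` is independent of the data size, while the cost of `display` grows with both the number of rows and the number of recorded steps.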

KhoaLe (Option: E)
Feb 8, 2024

Looking through all the steps, Cmd 2, 5, and 6 can be removed without affecting the overall process. In terms of runtime cost, however, Cmd 2 and 5 have little impact because they only show the logical query plan. In contrast, display() in Cmd 6 triggers an action, which forces the DataFrame to be computed and can take a long time to run.

hal2401me (Option: E)
Feb 26, 2024

Perhaps it's a multiple-choice question in the exam, in which case I'd select E and D. If it's single choice, then E.