Exam: Certified Data Engineer Professional
Question 7

The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".
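
The referenced code block is not reproduced here; a minimal sketch of what it typically looks like on Databricks, assuming a registered model named churn_model and a feature DataFrame features_df (both names are illustrative, not from the exam), is:

    import mlflow
    from pyspark.sql import functions as F

    # Load the registered production model as a Spark UDF
    # (model name and "Production" stage are assumptions)
    predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn_model/Production")

    # Apply the model to the feature columns and build the preds DataFrame
    # with the schema "customer_id LONG, predictions DOUBLE, date DATE"
    feature_cols = [c for c in features_df.columns if c not in ("customer_id", "date")]
    preds = (features_df
             .withColumn("predictions", predict_udf(*feature_cols))
             .select("customer_id", "predictions", F.current_date().alias("date")))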

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.

Which code block accomplishes this task while minimizing potential compute costs?

    Correct Answer: A

    To save predictions to a Delta Lake table with the ability to compare all predictions across time while minimizing compute costs, the code block in option A is correct. It uses the `saveAsTable` method with the mode set to `append`, so predictions are continually added to the table without overwriting previous entries. This maintains the historical record of predictions that the requirement asks for, and a simple batch write fits the context since churn predictions are made at most once per day. Note that on Databricks, tables created this way default to the Delta Lake format, so there is no need to explicitly specify the format when using `saveAsTable`.
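
A hedged sketch of the write pattern this answer describes (the table name churn_preds is an assumption, not the exam's exact option text):

    # Batch write that appends each day's predictions, preserving history for
    # comparison over time; Delta is the default table format on Databricks,
    # so no explicit format() call is needed.
    preds.write.mode("append").saveAsTable("churn_preds")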

Discussion
thxsgod Option: A

You need:
- A batch operation, since predictions are made at most once a day
- Append mode, since you need to keep track of past predictions
A is the correct answer. You don't need to specify "format" when you use saveAsTable.

Eertyy Option: B

The answer is B.

Eertyy

Here's why:
A. saves the data as a managed table, which may not be efficient for large-scale data or frequent updates. It doesn't utilize Delta Lake capabilities.
C. is used for streaming operations, not batch processing. Also, using "overwrite" as the output mode replaces the existing data each time, which is not suitable for keeping historical predictions.
D. is similar to option A but with "overwrite" mode. It replaces the entire table each time, which is not suitable for maintaining a historical record of predictions.
E. is also for streaming operations, not batch processing. Additionally, it uses the "table" method, which is not typically used for writing batch data into Delta Lake tables.
Option B is suitable for batch processing, writes data in Delta Lake format, and lets you efficiently maintain a historical record of predictions while minimizing compute costs.
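
For contrast, a sketch of the "overwrite" pattern this comment warns against (table name assumed); each run replaces the table contents, so prior days' predictions are lost:

    preds.write.mode("overwrite").saveAsTable("churn_preds")  # wipes out prediction history on every run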

pradyumn9999

It's also said they want to compare past values, so the mode needs to be append. The default is error mode.

Starvosxant

First: the default format Databricks saves tables in IS Delta, so there is no reason to say it wouldn't benefit from Lakehouse features. Second: the default write mode is error, meaning that if you try to write to a location that already exists, it will throw an error, and the question specifies that you write once a day. You'd better revisit basic topics before continuing to the professional-level certification, or buy the dump entirely.

coercion Option: A

The default table format is Delta, so there is no need to specify the format. As per the requirement, "append" mode is required to maintain the history. The default mode is "errorIfExists".
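
A small sketch of why the default save mode matters here (table name assumed):

    preds.write.saveAsTable("churn_preds")                  # day 1: creates the table
    preds.write.saveAsTable("churn_preds")                  # day 2: fails, default mode is "errorifexists"
    preds.write.mode("append").saveAsTable("churn_preds")   # correct: appends day 2's predictions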

Jay_98_11 Option: A

A is correct

kz_data Option: A

A is correct

sturcu Option: A

Correct

buggumaster Option: A

The selected answer is wrong: no write format is specified in A.

buggumaster Option: B

The selected answer is wrong: no write mode is specified in A.