Certified Data Engineer Professional Exam Questions

Certified Data Engineer Professional Exam - Question 7


The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.

Which code block accomplishes this task while minimizing potential compute costs?

Correct Answer: A

To save predictions to a Delta Lake table with the ability to compare all predictions across time while minimizing compute costs, the code block in option A is the right choice. It uses the `saveAsTable` method with the mode set to `append`, so each run adds new predictions to the table without overwriting previous entries. This maintains the required historical record and fits a batch-processing context, since churn predictions are made at most once per day. Note that on Databricks, tables created with `saveAsTable` default to the Delta Lake format, so there is no need to specify the format explicitly.
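Since the exam's actual code blocks are not reproduced in this dump, the following is a minimal sketch of the pattern option A describes: applying the registered production model with `mlflow.pyfunc.spark_udf` and appending the resulting `preds` DataFrame to a table. The model name `churn_model`, the table name `churn_preds`, and the `features_df` / `feature_cols` inputs are assumptions used purely for illustration.

```python
import mlflow.pyfunc
from pyspark.sql import functions as F

# Assumed names for illustration: "churn_model" (registered model),
# "churn_preds" (target table), features_df / feature_cols (scoring input).
predict = mlflow.pyfunc.spark_udf(spark, "models:/churn_model/Production")

preds = features_df.select(
    "customer_id",
    predict(*feature_cols).alias("predictions"),
    F.current_date().alias("date"),
)

# Option A pattern: a plain batch write in append mode. No explicit format is
# needed because saveAsTable creates a Delta table by default on Databricks,
# and append keeps every daily run so predictions can be compared over time.
preds.write.mode("append").saveAsTable("churn_preds")
```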

Discussion

9 comments
thxsgod · Option: A
Sep 7, 2023

You need:
- A batch operation, since predictions are made at most once a day
- Append mode, since you need to keep track of past predictions

A is the correct answer. You don't need to specify "format" when you use saveAsTable.

Eertyy · Option: B
Sep 21, 2023

The answer is B.

Eertyy
Sep 21, 2023

Here's why:
- A saves the data as a managed table, which may not be efficient for large-scale data or frequent updates, and it doesn't explicitly use Delta Lake capabilities.
- C is used for streaming operations, not batch processing. Also, using "overwrite" as the output mode replaces the existing data each time, which is not suitable for keeping historical predictions.
- D is similar to option A but with "overwrite" mode; it replaces the entire table each time, which is not suitable for maintaining a historical record of predictions.
- E is also for streaming operations, not batch processing. Additionally, it uses the "table" method, which is not typically used for writing batch data to Delta Lake tables.

Option B is suitable for batch processing, writes data in Delta Lake format, and allows you to efficiently maintain a historical record of predictions while minimizing compute costs.
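To make the append-versus-overwrite point above concrete, here is a minimal contrast using the same assumed table name as earlier; the streaming variants mentioned (C and E) would additionally add checkpointing and trigger overhead that a once-per-day batch job does not need.

```python
# Overwrite replaces the table contents, so only the latest run survives:
preds.write.mode("overwrite").saveAsTable("churn_preds")

# Append adds today's rows alongside earlier ones, preserving the history
# needed to compare predictions across time:
preds.write.mode("append").saveAsTable("churn_preds")
```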

pradyumn9999
Oct 23, 2023

It's also said they want to compare past values, so the mode needs to be append. By default, the mode is error (errorifexists).

Starvosxant
Oct 9, 2023

First: the default format in which Databricks saves tables IS Delta, so there is no reason to say it wouldn't benefit from Lakehouse features. Second: the default write mode is error, meaning that if you try to write to a location where a table already exists, it will raise an error, and the question specifies that you are going to write once a day. You had better revisit basic topics before continuing to the professional-level certification, or buy the dump entirely.
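A quick sketch of the default-mode behaviour described above, again using the assumed table name:

```python
# With no explicit mode, the DataFrameWriter defaults to "errorifexists":
preds.write.saveAsTable("churn_preds")   # day 1: creates the table
preds.write.saveAsTable("churn_preds")   # day 2: raises AnalysisException (table already exists)
```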

buggumaster · Option: B
Aug 28, 2023

Selected answer is wrong, no write mode is specified in A.

buggumaster · Option: A
Aug 28, 2023

Selected answer is wrong, no write format is specified in A.

sturcu
Oct 11, 2023

Correct

sturcu · Option: A
Oct 11, 2023

Correct

kz_data · Option: A
Dec 21, 2023

A is correct

Jay_98_11 · Option: A
Jan 13, 2024

A is correct

coercion · Option: A
May 19, 2024

The default table format is Delta, so there is no need to specify the format. As per the requirement, "append" mode is required to maintain the history. The default mode is "ErrorIfExists".