Databricks Certified Machine Learning Professional Exam Questions

Question 6 of 82

A data scientist has developed a model to predict ice cream sales using the expected temperature and the expected number of hours of sun in the day. However, the expected temperature is now dropping below the range of temperature values on which the model was trained.
Which of the following types of drift is present in the above scenario?

A. Label drift

B. None of these

C. Concept drift

D. Prediction drift

E. Feature drift

Correct Answer: E

Feature drift occurs when the properties or distribution of the input features change. In this scenario, the expected temperature, an input feature for the model, is dropping beneath the range seen during training, indicating a shift in the feature distribution.
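The range check described above can be sketched in a few lines. This is a minimal illustration, not a full drift monitor: the function name `fraction_out_of_range` and the temperature values are invented for the example.

```python
def fraction_out_of_range(train_values, new_values):
    """Return the share of new values that fall outside the
    [min, max] range observed in the training values."""
    lo, hi = min(train_values), max(train_values)
    outside = [v for v in new_values if v < lo or v > hi]
    return len(outside) / len(new_values)


# Hypothetical temperatures in degrees Celsius, invented for illustration.
train_temps = [18.0, 21.5, 25.0, 30.0, 33.5]
new_temps = [12.0, 15.0, 19.0, 22.0]

share = fraction_out_of_range(train_temps, new_temps)
print(f"Share of new temperatures outside the training range: {share:.2f}")
```

A non-zero share means the model is being asked to predict from inputs it never saw during training, which is the situation the question describes.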

Question 7 of 82

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.
Which of the following code blocks accomplishes this task?

A. spark.read.format("delta").load(path).drop("star_rating")

B. spark.read.format("delta").table(path).drop("star_rating")

C. Delta tables cannot be modified

D. spark.read.table(path).drop("star_rating")

E. spark.sql("SELECT * EXCEPT star_rating FROM path")

Correct Answer: A

Because the table is identified by a file location rather than a registered table name, it is read with spark.read.format("delta").load(path). Calling drop("star_rating") on the resulting DataFrame returns a new DataFrame without that column. The table method expects a table name rather than a path, which rules out the options that pass path to it. Note that drop only affects the DataFrame; the Delta table stored at path is unchanged unless the result is written back.
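The load-then-drop pattern can be wrapped in a small helper. This is a sketch that assumes an active SparkSession is passed in as `spark` and that `path` points at an existing Delta table; the helper name `load_without_column` is invented for the example.

```python
def load_without_column(spark, path, column):
    """Read the Delta table stored at `path` and return a DataFrame
    without `column`. The table on disk is not modified."""
    return spark.read.format("delta").load(path).drop(column)


# Usage on a cluster where `spark` and `path` are already defined:
# df = load_without_column(spark, path, "star_rating")
# df.printSchema()
```

Passing the session in as an argument keeps the helper easy to exercise with a stand-in object outside a Spark environment.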

Question 8 of 82

Which of the following operations in Feature Store Client fs can be used to return a Spark DataFrame of a data set associated with a Feature Store table?

A. fs.create_table

B. fs.write_table

C. fs.get_table

D. There is no way to accomplish this task with fs

E. fs.read_table

Correct Answer: E

To return a Spark DataFrame of a data set associated with a Feature Store table, the correct operation is fs.read_table. This function retrieves data from the Feature Store and returns it as a Spark DataFrame.
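A minimal sketch of the call, assuming a Databricks environment where the Feature Store client is available. The helper name `read_feature_table` and the table name in the usage comment are hypothetical.

```python
def read_feature_table(fs, table_name):
    """Return the contents of a Feature Store table as a Spark DataFrame.

    `fs` is expected to be a Feature Store client instance."""
    return fs.read_table(name=table_name)


# Usage on Databricks (the table name is a made-up example):
# from databricks.feature_store import FeatureStoreClient
# fs = FeatureStoreClient()
# features_df = read_feature_table(fs, "recommender.customer_features")
# features_df.show(5)
```

By contrast, fs.get_table returns metadata about the table rather than its data, which is why it is not the answer here.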

Question 9 of 82

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:
1. Deploy a model to production and compute predicted values
2. Obtain the observed (actual) label values
3. _____
4. Run a statistical test to determine if there are changes over time
Which of the following should be completed as Step #3?

A. Obtain the observed (actual) feature values

B. Measure the latency of the prediction time

C. Retrain the model

D. None of these should be completed as Step #3

E. Compute the evaluation metric using the observed and predicted values

Correct Answer: E

Concept drift is a change in the relationship between the input features and the label, so it appears as a change in model performance rather than in the inputs alone. Once the observed (actual) label values are available, the next step is therefore to compute an evaluation metric from the observed and predicted values. Tracking that metric over time and running a statistical test on it (Step 4) indicates whether performance has shifted, which is a signal of potential concept drift.
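The four steps can be sketched end to end in plain Python. This is an illustration under stated assumptions: the sales figures are invented, absolute error stands in for the evaluation metric, and a permutation test stands in for the statistical test, since the question names neither.

```python
import random


def absolute_errors(observed, predicted):
    """Step 3: per-row evaluation metric from observed and predicted values."""
    return [abs(o - p) for o, p in zip(observed, predicted)]


def permutation_p_value(errors_a, errors_b, n_permutations=2000, seed=0):
    """Step 4: two-sided permutation test on the difference in mean error
    between two time windows."""
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed_gap = abs(mean(errors_a) - mean(errors_b))
    pooled = list(errors_a) + list(errors_b)
    n_a = len(errors_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed_gap:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Steps 1 and 2: hypothetical predictions and observed labels for two windows.
early_observed = [100, 120, 90, 110, 105, 95, 115, 102]
early_predicted = [98, 123, 91, 108, 107, 94, 113, 104]
late_observed = [80, 70, 60, 75, 65, 72, 68, 77]
late_predicted = [100, 95, 88, 99, 92, 96, 93, 101]

early_errors = absolute_errors(early_observed, early_predicted)
late_errors = absolute_errors(late_observed, late_predicted)
p_value = permutation_p_value(early_errors, late_errors)
print(f"p-value for a change in mean absolute error: {p_value:.4f}")
```

A small p-value suggests that the error in the later window differs from the earlier one by more than chance would explain, which is the kind of change a concept drift monitor is meant to surface.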

Question 10 of 82

Which of the following is a reason for using Jensen-Shannon (JS) distance over a Kolmogorov-Smirnov (KS) test for numeric feature drift detection?

A. All of these reasons

B. JS is not normalized or smoothed

C. None of these reasons

D. JS is more robust when working with large datasets

E. JS does not require any manual threshold or cutoff determinations

Correct Answer: E

Jensen-Shannon (JS) distance, when computed with base-2 logarithms, produces a value between 0 and 1 that measures how far apart two distributions are: 0 means they are identical and 1 means they do not overlap. Because the scale is fixed, the value can be interpreted directly without first choosing a significance level. In contrast, the Kolmogorov-Smirnov (KS) test requires a critical value or p-value cutoff that depends on the chosen significance level.
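The bounded scale of JS distance can be seen in a small pure-Python sketch. It assumes a numeric feature is first binned on a range shared by both samples; the helper names, the bin count, and the temperature values are invented for the example.

```python
import math


def binned_probabilities(values, lo, hi, n_bins):
    """Histogram `values` into `n_bins` equal-width bins on [lo, hi]
    and return the bin frequencies as probabilities."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two discrete
    distributions of equal length; the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, divergence))


# Hypothetical feature samples, invented for illustration.
train_temps = [18.0, 21.5, 25.0, 30.0, 33.5, 27.0, 24.0, 29.0]
new_temps = [12.0, 15.0, 19.0, 22.0, 14.0, 16.5, 20.0, 13.0]

lo = min(train_temps + new_temps)
hi = max(train_temps + new_temps)
p = binned_probabilities(train_temps, lo, hi, n_bins=5)
q = binned_probabilities(new_temps, lo, hi, n_bins=5)
distance = js_distance(p, q)
print(f"JS distance between training and new temperatures: {distance:.3f}")
```

Identical distributions give a distance of 0 and non-overlapping ones give 1, so the result can be read on the same scale regardless of sample size or feature units.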