Certified Machine Learning Professional

Databricks Certified Machine Learning Professional practice exam questions.

This set contains 60 questions in total.
These questions were last updated on June 29, 2025.
This site is not affiliated with or endorsed by Databricks.

Question 1 of 60

Which of the following describes concept drift?

Concept drift is when there is a change in the distribution of an input variable

Concept drift is when there is a change in the distribution of a target variable

Concept drift is when there is a change in the relationship between input variables and target variables

Concept drift is when there is a change in the distribution of the predicted target given by the model

None of these describe Concept drift

Correct Answer: C

Concept drift refers to a change in the relationship between input variables and target variables over time. This change can cause the model's predictions to become less accurate as the original relationship the model learned is no longer valid. It is not merely a shift in the distribution of input or target variables but specifically in how they relate to each other.
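As a minimal illustration, the sketch below holds the input distribution fixed and changes only the rule that maps inputs to labels. The inputs, the 0.5 and 0.7 thresholds, and the threshold "model" are all hypothetical. A model that learned the original relationship loses accuracy even though the inputs look exactly the same as before.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# The same inputs in both periods, so the input distribution does not change.
xs = [i / 100 for i in range(100)]

# Hypothetical relationships: the label is 1 when x exceeds a threshold,
# and that threshold moves from 0.5 to 0.7 over time (concept drift).
labels_before = [x > 0.5 for x in xs]
labels_after = [x > 0.7 for x in xs]

# A hypothetical "model" that learned the original relationship.
predictions = [x > 0.5 for x in xs]

print(accuracy(predictions, labels_before))  # 1.0
print(accuracy(predictions, labels_after))   # 0.8
```

Monitoring only the inputs here would report no drift. The degradation shows up only when predictions are compared against fresh labels.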

Question 2 of 60

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.

Which of the following tools can the machine learning engineer use to assess their theory?

Kolmogorov-Smirnov (KS) test

One-way Chi-squared Test

Two-way Chi-squared Test

Jensen-Shannon distance

None of these

Correct Answer: C

To assess whether missing values have become more prevalent in recent data for a particular value of a categorical input variable, the appropriate tool is the Two-way Chi-squared Test. This test determines whether there is a statistically significant association between two categorical variables: here, the time period (older vs. recent data) and whether the value is missing or present. The One-way Chi-squared Test, by contrast, is a goodness-of-fit test that compares a single categorical variable against expected frequencies, so it cannot assess an association between two variables.
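A minimal sketch of the two-way test on a 2x2 contingency table follows, using only the standard library. The counts are hypothetical: rows are time periods, and columns are missing vs. present. For a 2x2 table there is one degree of freedom, so the p-value can be computed with `math.erfc` instead of a full chi-squared distribution.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count if time period and missingness were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (2 - 1) = 1; for one degree of freedom the chi-squared
    # survival function reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: columns are (missing, present) for the value of interest.
table = [
    [20, 980],  # older batch
    [80, 920],  # recent batch
]
stat, p_value = chi2_2x2(table)
print(f"chi2={stat:.2f}, p={p_value:.2e}")
```

A very small p-value supports the engineer's theory that missingness depends on the time period. In practice, `scipy.stats.chi2_contingency(table, correction=False)` computes the same statistic and also handles tables larger than 2x2.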

Question 3 of 60

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

[Code block image for Question 3 not available]

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

Indent the child run blocks within the parent run block

Add the nested=True argument to the parent run

Remove the nested=True argument from the child runs

Provide the same name to the run_name parameter for all three run blocks

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs

Correct Answer: A

To nest the child runs under the parent run, the child run blocks must be indented inside the parent run block. The child runs then start while the parent run is still active, so their nested=True argument attaches them to that parent and MLflow records the parent-child relationship correctly.

Question 4 of 60

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

[Image for Question 4 not available]

mlflow.log_data(importance_path, "feature-importance.csv")

mlflow.log_artifact(importance_path, "feature-importance.csv")

None of these code blocks can accomplish the task.

Correct Answer: D

To log feature importance data from a CSV file within an MLflow run, the machine learning engineer should use mlflow.log_artifact. This function logs a local file or directory as an artifact of the active run, so it can be tracked and analyzed alongside the model's other information. The correct code block is therefore mlflow.log_artifact(importance_path, "feature-importance.csv").

Question 5 of 60

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

Jensen-Shannon test

Summary statistics trends

Chi-squared test

None of these can be used to monitor feature drift

Kolmogorov-Smirnov (KS) test

Correct Answer: B

A simple, low-cost method of monitoring numeric feature drift is to track summary statistics trends. This involves monitoring basic statistical metrics such as mean, median, standard deviation, and other relevant summary statistics over time to detect any shifts in the distribution of the feature values. This approach is straightforward and does not require complex statistical tests, making it both simple and cost-effective.
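A minimal sketch of summary-statistics monitoring follows, using only the standard library. The baseline values, the weekly batches, and the alert rule are all hypothetical. The rule flags a window when its mean moves more than `k` baseline standard deviations away from the baseline mean.

```python
import statistics


def summary(values):
    """Cheap per-window summary statistics for one numeric feature."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def flag_mean_drift(baseline, windows, k=2.0):
    """Flag each window whose mean is more than k baseline stdevs from the baseline mean."""
    base = summary(baseline)
    return [abs(summary(w)["mean"] - base["mean"]) > k * base["stdev"] for w in windows]


baseline = [10, 12, 11, 13, 9, 10, 12, 11]  # hypothetical training-time values
weekly = [[11, 10, 12, 11], [12, 13, 11, 12], [15, 16, 14, 17]]  # hypothetical batches

for week, window in enumerate(weekly, start=1):
    print(week, summary(window))
print(flag_mean_drift(baseline, weekly))  # [False, False, True]
```

Each window needs only a single pass over its values, which is what makes this approach cheap. A flagged window can then be followed up with a formal test such as KS if needed.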