Certified Machine Learning Professional

Practice exam questions for the Databricks Certified Machine Learning Professional certification

  • You have 60 total questions to study from
  • These questions were last updated on November 4, 2024

Question 1 of 60

Which of the following describes concept drift?

    Correct Answer: C

    Concept drift refers to a change in the relationship between input variables and target variables over time. This change can cause the model's predictions to become less accurate as the original relationship the model learned is no longer valid. It is not merely a shift in the distribution of input or target variables but specifically in how they relate to each other.
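
As a rough illustration of the explanation above, the sketch below simulates two periods in which the input distribution is identical but the rule linking input to target changes. The labelling rules, the thresholds, and the fixed "model" are all hypothetical and chosen only to make the effect visible.

```python
import random

def label_old(x):
    # Hypothetical original relationship between input and target.
    return x > 0.0

def label_new(x):
    # Hypothetical relationship after the concept has drifted.
    return x > 1.0

def model(x):
    # A fixed rule that matches the ORIGINAL relationship exactly.
    return x > 0.0

def accuracy(xs, label_fn):
    # Fraction of inputs where the model agrees with the true label.
    return sum(model(x) == label_fn(x) for x in xs) / len(xs)

rng = random.Random(0)
# Both periods draw inputs from the same distribution: no input drift.
old_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
new_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]

acc_old = accuracy(old_x, label_old)
acc_new = accuracy(new_x, label_new)
print(f"accuracy before drift: {acc_old:.2f}, after drift: {acc_new:.2f}")
```

The inputs look the same in both periods, yet accuracy falls in the second one because only the input-to-target relationship changed.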

Question 2 of 60

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.

Which of the following tools can the machine learning engineer use to assess their theory?

    Correct Answer: C

    To assess whether missing values are becoming more prevalent over time, the engineer needs to test for a statistically significant association between two categorical variables: the time period (older vs. newer data) and whether the value is missing or present. The two-way chi-squared test is designed for exactly this kind of association. The one-way chi-squared test is generally used for goodness-of-fit and is not suited to assessing an association between two categorical variables.
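
A minimal sketch of the two-way test on a 2x2 table follows, using only the standard library. The counts are made up for illustration, and the function name chi_squared_2x2 is hypothetical. In practice scipy.stats.chi2_contingency performs this test; note that it applies a continuity correction to 2x2 tables by default, which this sketch does not.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if period and missingness were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: older data, newer data. Columns: missing, present.
# These counts are invented for the example.
counts = [[20, 980], [60, 940]]
stat, p_value = chi_squared_2x2(counts)
print(f"chi-squared = {stat:.2f}, p = {p_value:.2g}")
```

A small p-value would suggest that the share of missing values differs between the two periods.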

Question 3 of 60

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

    Correct Answer: A

    In order to nest the child runs under the parent run, the child run blocks need to be indented within the parent run block. This way, the child runs are executed within the context of the parent run, allowing MLflow to correctly nest the runs.

Question 4 of 60

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

    Correct Answer: D

    To log feature importance data from a CSV file within an MLflow run, use mlflow.log_artifact. This function logs a local file or directory as an artifact of the run, so the file is tracked alongside the run's other model information. The correct code block is therefore mlflow.log_artifact(importance_path, 'feature-importance.csv'). Note that the second argument is the artifact_path, the directory within the run's artifact store under which the file is placed.

Question 5 of 60

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

    Correct Answer: B

    A simple, low-cost method of monitoring numeric feature drift is to track trends in summary statistics. This means recording basic metrics such as the mean, median, and standard deviation over time and watching for shifts in the distribution of the feature values. The approach needs no formal statistical tests, which keeps it both simple and inexpensive.
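
One way to sketch this idea is to compare a recent window of a feature against a baseline window. The sample values, the helper names, and the alert threshold of 2.0 baseline standard deviations are all assumptions made for the example, not a recommended setting.

```python
import statistics

def summarize(values):
    # Basic summary statistics for one window of a numeric feature.
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def mean_shift_in_stdevs(baseline, current):
    # How far the current mean sits from the baseline mean,
    # measured in baseline standard deviations.
    base = summarize(baseline)
    return abs(statistics.mean(current) - base["mean"]) / base["stdev"]

# Invented feature values for an older and a more recent window.
baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
current = [13.0, 14.0, 12.5, 13.5, 12.0, 13.0]

shift = mean_shift_in_stdevs(baseline, current)
drift_flag = shift > 2.0  # hypothetical alert threshold
print(summarize(baseline), summarize(current), round(shift, 2), drift_flag)
```

Logging the output of summarize for each new batch and plotting it over time gives the trend view described above without running any statistical test.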