Certified Machine Learning Professional Exam - Question 2


A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.

Which of the following tools can the machine learning engineer use to assess their theory?

Correct Answer: BC

To assess whether there is a statistically significant association between the time period and the presence of missing values for a particular value of a categorical input variable, the appropriate tool is the Two-way Chi-squared Test. This test determines whether two categorical variables are associated. Here, the two variables are the time period (older vs. newer data) and whether values are present or missing within the specific category. The One-way Chi-squared Test is generally used as a goodness-of-fit test and is not suited to assessing the association between two categorical variables.
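A minimal sketch of the two-way test described above, written in plain Python so it runs without SciPy. The table layout (rows are the older and newer windows, columns are missing vs. present for the category of interest) and all counts are hypothetical. A 2x2 table has one degree of freedom, where the chi-squared upper-tail probability reduces to erfc(sqrt(x/2)). That is the only p-value case this sketch handles.

```python
import math


def two_way_chi_squared(table):
    """Pearson chi-squared statistic for an r x c contingency table of counts.

    Returns (statistic, degrees_of_freedom). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows (time window) and columns (missing/present) are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-squared statistic with exactly 1 degree of freedom."""
    # For 1 dof, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts for the category of interest:
# rows = time window, columns = (missing, present).
table = [
    [20, 980],  # older reference window
    [60, 940],  # more recent window
]
stat, dof = two_way_chi_squared(table)
p = p_value_df1(stat)  # valid here because a 2x2 table has dof == 1
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.2e}")
```

If SciPy is available, `scipy.stats.chi2_contingency` computes the same kind of test, but it applies Yates' continuity correction to tables with one degree of freedom by default (`correction=True`), so its statistic will be somewhat smaller than the uncorrected one above.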

Discussion

5 comments
BokNinja (Option: B)
Dec 19, 2023

The correct answer is B. One-way Chi-squared Test. The Chi-squared test is a statistical hypothesis test that is used to determine whether there is a significant association between two categorical variables. In this case, the two variables could be the presence (or absence) of a value and the time period (old data vs. new data).

hugodscarvalho (Option: B)
Jan 27, 2024

Since it's just one categorical input variable over time, the one-way Chi-squared test would be the more appropriate choice.

ldoyle3332 (Option: C)
Jan 28, 2024

Would the answer not be C? A One-Way Chi-Squared Test will pick up a change in the overall count of null values as a shift in distribution, even if the nulls are spread evenly across the category values. The question specifies that they are looking for a shift tied to one particular class value, which is better suited to a Two-Way Chi-Squared Test.
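For contrast, a one-way (goodness-of-fit) test looks at a single categorical variable. It compares the recent window's count per category, with missing treated as its own bucket, against proportions taken from a reference window. The category names, proportions, and counts in this sketch are hypothetical. Its p-value helper uses the closed form exp(-x/2) * sum of (x/2)^i / i! for i below dof/2, which holds only for even degrees of freedom. Because the missing bucket is pooled across all values, a significant result shows that the overall distribution moved. It does not by itself show which value the missingness is tied to, and that is the distinction the comment above draws.

```python
import math


def one_way_chi_squared(observed, expected_proportions):
    """Pearson goodness-of-fit statistic for a single categorical variable.

    observed: counts per category in the recent window.
    expected_proportions: reference proportions for the same categories (sum to 1).
    Returns (statistic, degrees_of_freedom).
    """
    n = sum(observed)
    statistic = 0.0
    for obs, prop in zip(observed, expected_proportions):
        if prop <= 0:
            raise ValueError("expected proportions must be positive")
        expected = n * prop
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


def chi2_sf_even_df(x, dof):
    """Upper-tail chi-squared probability; this closed form is valid only for even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2
    term = 1.0
    total = 1.0
    # Accumulate sum_{i=0}^{dof/2 - 1} (x/2)^i / i!, then scale by exp(-x/2).
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical categories: "red", "blue", and "missing" (tracked as its own bucket).
reference_proportions = [0.49, 0.49, 0.02]  # estimated from an older window
recent_counts = [470, 470, 60]  # observed in the recent window

gof_stat, gof_dof = one_way_chi_squared(recent_counts, reference_proportions)
gof_p = chi2_sf_even_df(gof_stat, gof_dof)
print(f"chi2 = {gof_stat:.3f}, dof = {gof_dof}, p = {gof_p:.2e}")
```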

Alishahab70 (Option: C)
Feb 14, 2024

C. Two-way Chi-squared Test. This tool can help the engineer determine whether there is a statistically significant association between the time period and the presence of missing values in the categorical variable.

ThoBustos (Option: B)
May 7, 2024

It seems like we want to determine whether there's a statistical difference in one direction, for example between last month's data and new data, particularly whether there are more blanks for a specific categorical variable. It seems like a one-way Chi-squared Test is the most appropriate. What do you say?
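On the "one direction" point: the "one-way" in one-way Chi-squared refers to a single categorical variable, not to a one-sided (directional) hypothesis. The Pearson chi-squared statistic measures departure in either direction. To test specifically whether the missing rate went up, one option is a one-sided pooled two-proportion z-test. That test is swapped in here and is not one of the tools discussed in this thread. For a 2x2 table, its z squared equals the uncorrected Pearson chi-squared statistic. This is a minimal sketch with hypothetical counts.

```python
import math


def one_sided_two_proportion_z(missing_old, n_old, missing_new, n_new):
    """Pooled two-proportion z-test of H1: the new missing rate exceeds the old one.

    Returns (z, one_sided_p).
    """
    p_old = missing_old / n_old
    p_new = missing_new / n_new
    # Pooled missing rate under H0 (no change between windows).
    pooled = (missing_old + missing_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Upper-tail standard normal probability: P(Z > z) = erfc(z / sqrt(2)) / 2.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided


# Hypothetical counts: 20 of 1,000 rows missing in the older window, 60 of 1,000 in the newer one.
z, p_dir = one_sided_two_proportion_z(20, 1000, 60, 1000)
# z ** 2 equals the uncorrected Pearson chi-squared statistic for the matching 2x2 table.
print(f"z = {z:.3f}, z^2 = {z ** 2:.3f}, one-sided p = {p_dir:.2e}")
```

A small one-sided p-value supports "more blanks in the newer data" specifically. If the counts were reversed, the same z squared would yield a one-sided p-value near 1.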