Exam: Certified Machine Learning Professional
Question 2

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.

Which of the following tools can the machine learning engineer use to assess their theory?

    Correct Answer: C

    To test whether missing values have become more prevalent in recent data, the engineer needs to check for a statistically significant association between the time period and whether the variable's value is missing. The two-way Chi-squared test fits this: it tests for an association between two categorical variables, here the time period (old vs. new data) and the presence or absence of a value. The one-way Chi-squared test is generally used as a goodness-of-fit test, comparing a single categorical variable against an expected distribution, so it is not suited to assessing an association between two categorical variables.
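As a rough illustration of the two-way test described above, the sketch below computes a Pearson Chi-squared statistic for a 2x2 table of time period against missing/present. The counts and the helper name `chi2_two_way_2x2` are hypothetical, chosen only for illustration; the p-value uses the closed form for one degree of freedom and no continuity correction is applied.

```python
import math

def chi2_two_way_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts for the monitored categorical variable.
# Rows: time period (old, new); columns: (missing, present).
observed = [
    [20, 980],
    [60, 940],
]
stat, p_value = chi2_two_way_2x2(observed)
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

In practice a library routine such as `scipy.stats.chi2_contingency` would normally be used instead; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic would differ somewhat from this sketch.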

Discussion
ThoBustos (Option: B)

It seems like we want to determine if there's a statistical difference in one direction. Ex: between last month's data and new data, particularly observing more blanks for a specific categorical variable. It seems like a one-way Chi-squared Test is the most appropriate, what do you say?

Alishahab70 (Option: C)

C. Two-way Chi-squared Test. This tool can help the engineer determine if there is a statistically significant association between the time period and the presence of missing values in the categorical variable.

ldoyle3332 (Option: C)

Would the answer not be C? One-way Chi-squared tests will pick up changes in the overall count of null values as a shift in distribution, even if the nulls are distributed evenly among the category values. The question specifies that they are looking for a shift in the distribution for a particular class value, which would be better suited to a two-way Chi-squared test.
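For contrast, here is a minimal sketch of the one-way (goodness-of-fit) test mentioned in this comment, treating "missing" as one more category and comparing new counts against reference proportions from older data. The proportions, counts, and the helper name `chi2_goodness_of_fit_3` are hypothetical; the helper is limited to exactly three categories so that the p-value can use the closed form for two degrees of freedom.

```python
import math

def chi2_goodness_of_fit_3(observed, expected_props):
    """One-way chi-squared goodness-of-fit test for exactly 3 categories.

    Hypothetical helper for illustration only.
    """
    assert len(observed) == 3 and len(expected_props) == 3
    total = sum(observed)
    stat = 0.0
    for count, prop in zip(observed, expected_props):
        expected = total * prop
        stat += (count - expected) ** 2 / expected
    # With 2 degrees of freedom, the chi-squared survival function
    # reduces to exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value

# Made-up reference proportions from older data: (value A, value B, missing).
reference_props = [0.50, 0.48, 0.02]
# Made-up counts observed in newer data, in the same category order.
new_counts = [480, 460, 60]

gof_stat, gof_p = chi2_goodness_of_fit_3(new_counts, reference_props)
print(f"chi2 = {gof_stat:.2f}, p = {gof_p:.2g}")
```

A small p-value here only says the overall distribution has shifted; it does not by itself tie the increase in missing values to the time period the way the two-way table does.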

hugodscarvalho (Option: B)

Since it's just one categorical input variable over time, the one-way Chi-squared test would be the more appropriate choice.

BokNinja (Option: B)

The correct answer is B. One-way Chi-squared Test. The Chi-squared test is a statistical hypothesis test that is used to determine whether there is a significant association between two categorical variables. In this case, the two variables could be the presence (or absence) of a value and the time period (old data vs. new data).