Professional Machine Learning Engineer Exam - Question 167


You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?

Correct Answer: A

The fastest route to deployment in BigQuery ML is to apply the ML.ONE_HOT_ENCODER function to the categorical features inside the TRANSFORM clause at model creation, and to list the non-categorical features in the same clause. Because the preprocessing is defined as part of the model, BigQuery ML applies the same encoding automatically when the model is used for prediction, so there is no separate preprocessing step to build or maintain.

Discussion

4 comments
BlehMaks (Option: B)
Jan 12, 2024

"When the TRANSFORM clause is present, only output columns from the TRANSFORM clause are used in training. Any results from query_statement that don't appear in the TRANSFORM clause are ignored." https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#transform So if you use TRANSFORM, it has to list both the categorical and the non-categorical features.
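The rule quoted above can be pictured with a small toy model in plain Python. This is only a sketch of the column-selection behavior, not BigQuery code: the helper `transform_outputs`, the sample row, and the `amount` column are all hypothetical names made up for illustration.

```python
def transform_outputs(row, transform_exprs):
    """Toy stand-in for a TRANSFORM clause: keep only the listed output columns."""
    return {name: fn(row) for name, fn in transform_exprs.items()}


# Hypothetical input row; 'amount' stands in for any non-categorical feature.
row = {"product_category": "books", "payment_method": "card", "amount": 42.0}

# Only the categorical columns are listed, so 'amount' never reaches training.
partial = transform_outputs(row, {
    "product_category": lambda r: r["product_category"],
    "payment_method": lambda r: r["payment_method"],
})

# Listing every feature keeps 'amount' available for training.
full = transform_outputs(row, {
    "product_category": lambda r: r["product_category"],
    "payment_method": lambda r: r["payment_method"],
    "amount": lambda r: r["amount"],
})

print(sorted(partial))
print(sorted(full))
```

In this toy version, `partial` has no `amount` key at all, which mirrors how columns missing from the clause are ignored rather than raising an error.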

b1a8fae (Option: B)
Jan 8, 2024

Only B and D make sense. Between the two, after reading the use case for multi-hot encoding (https://cloud.google.com/bigquery/docs/auto-preprocessing#feature-transform), I would lean towards B: one-hot encoding is preferred for non-numerical, non-array features (product category and payment method are usually represented that way), while multi-hot encoding is preferred for non-numerical array features, which is not the case here.
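The distinction drawn here can be sketched in a few lines of plain Python. This is an illustration of the two encodings in general, not of BigQuery ML's output format; the helper names and the array-valued `tags` example are hypothetical.

```python
def build_vocab(values):
    """Map each distinct category to an index, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value, vocab):
    """Encode a single (non-array) category: exactly one position is set."""
    vec = [0] * len(vocab)
    vec[vocab[value]] = 1
    return vec


def multi_hot(values, vocab):
    """Encode an array of categories: one position is set per element."""
    vec = [0] * len(vocab)
    for v in values:
        vec[vocab[v]] = 1
    return vec


# A scalar feature such as payment method takes one value per row.
payment_vocab = build_vocab(["card", "cash", "wallet"])
print(one_hot("cash", payment_vocab))  # [0, 1, 0]

# A hypothetical array feature (several tags per row) needs multi-hot.
tag_vocab = build_vocab(["gift", "sale", "new"])
print(multi_hot(["sale", "gift"], tag_vocab))  # [1, 0, 1]
```

Since product category and payment method each hold a single value per transaction, the one-hot form is the one that applies to the question.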

b1a8fae
Jan 8, 2024

Also, as I understand it, it cannot be A, because A says "take the categorical features" as opposed to the more specific "take the encoded categorical features" in B.

pikachu007 (Option: B)
Jan 10, 2024

Given the goal of quickly deploying the model for predicting customer purchase behavior while handling categorical features, option B - "Use the ML.ONE_HOT_ENCODER function on the categorical features and select the encoded categorical features and non-categorical features as inputs to create your model" seems to be the most appropriate. This approach directly handles the encoding of categorical features using one-hot encoding and selects the necessary features for model creation, ensuring efficient utilization of categorical data in the BigQuery ML model.

bobjr (Option: A)
Jun 6, 2024

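-- TRANSFORM goes before OPTIONS, and ML.ONE_HOT_ENCODER needs an OVER() clause.
-- With no input_label_cols set, the table needs a column named label.
CREATE OR REPLACE MODEL `project.dataset.model_name`
TRANSFORM(
  ML.ONE_HOT_ENCODER(product_category) OVER() AS encoded_product_category,
  ML.ONE_HOT_ENCODER(payment_method) OVER() AS encoded_payment_method,
  -- pass the remaining columns through unchanged
  * EXCEPT(product_category, payment_method)
)
OPTIONS(model_type='logistic_reg')
AS SELECT * FROM `project.dataset.table_name`;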