Professional Data Engineer Exam - Question 171

Question

You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?

Examice · Accepted Answer

To prevent skew at prediction time, the preprocessing steps defined during model creation must also be applied at prediction time. Using BigQuery ML's TRANSFORM clause to define preprocessing steps ensures that the same transformations are applied consistently during both training and prediction. This keeps the data consistent and prevents skew, because the specified preprocessing is applied automatically at prediction time without any additional transformations on the raw input data.
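A minimal pure-Python sketch of that idea, assuming a single numeric feature, standard scaling as the example preprocessing step, and a hypothetical `TransformModel` class (not a BigQuery API). The scaling statistics are captured once at training time and re-applied automatically whenever the model predicts or evaluates on raw input, which is roughly what the TRANSFORM clause does for you inside BigQuery ML.

```python
from statistics import mean, pstdev


class TransformModel:
    """Hypothetical stand-in for a BigQuery ML model created with TRANSFORM.

    Preprocessing (standard scaling) is fitted on the training data and
    stored with the model, so callers always pass raw, untransformed values.
    """

    def fit(self, raw_x, y):
        # Capture the scaling statistics from the training data only.
        self.mu = mean(raw_x)
        self.sigma = pstdev(raw_x) or 1.0
        scaled = [self._transform(x) for x in raw_x]
        # Simple least-squares fit of y = a * scaled + b.
        sx, sy = mean(scaled), mean(y)
        cov = sum((s - sx) * (t - sy) for s, t in zip(scaled, y))
        var = sum((s - sx) ** 2 for s in scaled)
        self.a = cov / var
        self.b = sy - self.a * sx
        return self

    def _transform(self, x):
        # The same stored transformation is used at training and prediction time.
        return (x - self.mu) / self.sigma

    def predict(self, raw_x):
        # Raw input goes in; the stored transform is applied automatically.
        return [self.a * self._transform(x) + self.b for x in raw_x]

    def evaluate(self, raw_x, y):
        # Mean squared error on raw inputs; preprocessing happens inside predict.
        preds = self.predict(raw_x)
        return mean((p - t) ** 2 for p, t in zip(preds, y))
```

For example, `TransformModel().fit([1000, 1500, 2000, 2500], [150000, 200000, 250000, 300000]).predict([3000])` is fed raw square footage at both steps; the caller never scales anything by hand.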

AWSandeep · Answer

A. When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any transformations on the raw input data.

Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is automatically applied during the prediction and evaluation phases of machine learning.

Reference: https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform

zellck · Answer

A is the answer.

https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform
Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is automatically applied during the prediction and evaluation phases of machine learning.

TNT87 · Answer

https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform
Ans A

jkhong · Answer

Problem: Skew

One thing I overlooked when answering previously is that B and C do not address skew. When we preprocess our training data, we need to save the scaling factors somewhere, and when making predictions on new data, we need to apply the scaling factors from our training data.

ML.EVALUATE (like ML.PREDICT) already applies the preprocessing steps to our test data using the saved scaling factors.
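To make the point about saved scaling factors concrete, here is a small Python sketch with made-up square-footage numbers (hypothetical data, not from the question). Standardizing a prediction batch with its own mean and standard deviation, instead of the training statistics, maps much larger homes onto the same scaled values as the training homes, which is exactly the kind of skew the question warns about.

```python
from statistics import mean, pstdev


def standardize(values, mu, sigma):
    """Scale raw values with the given mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


# Training data: the scaling factors are computed here and must be kept.
train_sqft = [1000, 1500, 2000, 2500]
train_mu, train_sigma = mean(train_sqft), pstdev(train_sqft)

# A prediction batch of much larger homes.
batch_sqft = [3000, 3500, 4000, 4500]

# Consistent: reuse the training-time factors (what TRANSFORM does automatically).
consistent = standardize(batch_sqft, train_mu, train_sigma)

# Skewed: recompute the factors on the prediction batch itself.
skewed = standardize(batch_sqft, mean(batch_sqft), pstdev(batch_sqft))

print([round(v, 3) for v in consistent])
print([round(v, 3) for v in skewed])
```

With the skewed version, the model would see a 4,500 sq ft home exactly as it saw a 2,500 sq ft home during training.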

Prudvi3266 · Answer

A is the correct answer: if we use the TRANSFORM clause in BigQuery ML, there is no need to apply any transformations while evaluating and predicting. https://cloud.google.com/bigquery/docs/bigqueryml-transform

ducc · Answer

This query's nested SELECT statement and FROM clause are the same as those in the CREATE MODEL query. Because the TRANSFORM clause is used in training, you don't need to specify the specific columns and transformations. They are automatically restored.

Reference: https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform
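A sketch of the statements ducc describes, held in Python strings so they could be passed to a client such as `google-cloud-bigquery`. The dataset, table, model, and column names (`realestate.home_sales`, `realestate.new_listings`, `realestate.price_model`, `sqft`, `zip_code`, `price`) are hypothetical, and `ML.STANDARD_SCALER` is only one example of a preprocessing function that can appear inside TRANSFORM.

```python
# Hypothetical dataset, table, and column names; adjust to your project.
RAW_SELECT = """
SELECT sqft, zip_code, price
FROM `realestate.home_sales`
"""

# Preprocessing is declared once, inside TRANSFORM, and stored with the model.
CREATE_MODEL_SQL = f"""
CREATE OR REPLACE MODEL `realestate.price_model`
TRANSFORM(
  ML.STANDARD_SCALER(sqft) OVER() AS sqft_scaled,
  zip_code,
  price
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['price'])
AS {RAW_SELECT}
"""

# Evaluation reuses the same raw SELECT; no transformations are repeated here.
EVALUATE_SQL = f"""
SELECT * FROM ML.EVALUATE(MODEL `realestate.price_model`, ({RAW_SELECT}))
"""

# Prediction on new, untransformed rows: only the raw input columns are passed.
PREDICT_SQL = """
SELECT * FROM ML.PREDICT(
  MODEL `realestate.price_model`,
  (SELECT sqft, zip_code FROM `realestate.new_listings`)
)
"""
```

With the `google-cloud-bigquery` library, each statement could be run as `bigquery.Client().query(PREDICT_SQL).result()`; the point of the design is that the scaling logic lives in exactly one place.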

Kvk117 · Answer

A is the correct answer

GCPSharon · Answer

Avoid skew at prediction time by removing the separate preprocessing step!

Matt_108 · Answer

Option A

Lenifia · Answer

The key to preventing skew in machine learning models is to ensure that the same data preprocessing steps are applied consistently to both the training data and the prediction data. In option B, the TRANSFORM clause in BigQuery ML is used to define preprocessing steps during model creation, and a saved query is used to apply the same transformations to the raw input data before making predictions. This ensures consistency and prevents skew. The ML.EVALUATE function is then used to evaluate the model's performance on the transformed prediction data. This is the recommended workflow.