Professional Data Engineer Exam - Question 171


You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to create a machine learning model. You plan to use the model for predictions against a raw dataset that has not been transformed. How should you set up your workflow in order to prevent skew at prediction time?

Correct Answer: A

To prevent skew during prediction time, the preprocessing steps defined during model creation must also be applied at prediction time. Using BigQuery's TRANSFORM clause to define preprocessing steps ensures that the same transformations are consistently applied during both training and prediction phases. This approach maintains data consistency and prevents skew by automatically applying the specified preprocessing during predictions without requiring additional transformations on the raw input data.
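As a rough illustration of this workflow, the sketch below holds two BigQuery ML statements as Python strings. The dataset, table, and column names (`mydataset.home_sales`, `raw_listings`, `square_feet`, and so on) are hypothetical and not part of the question. The point is only that the scaling and bucketizing live in the TRANSFORM clause, so the prediction query selects raw columns. `ML.PREDICT` is used here for the prediction step; per the explanation above, the same automatic preprocessing applies at evaluation time.

```python
# Hypothetical names throughout: mydataset, home_sales, raw_listings,
# square_feet, year_built, and sale_price are illustrative assumptions.

# Training statement: preprocessing is declared once, inside TRANSFORM.
CREATE_MODEL_SQL = """
CREATE OR REPLACE MODEL `mydataset.home_price_model`
TRANSFORM(
  ML.STANDARD_SCALER(square_feet) OVER () AS square_feet_scaled,
  ML.QUANTILE_BUCKETIZE(year_built, 5) OVER () AS year_built_bucket,
  sale_price
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['sale_price'])
AS
SELECT square_feet, year_built, sale_price
FROM `mydataset.home_sales`
"""

# Prediction statement: only raw columns are selected; no scaling is repeated.
PREDICT_SQL = """
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.home_price_model`,
  (SELECT square_feet, year_built FROM `mydataset.raw_listings`)
)
"""


def preprocessing_calls(sql: str) -> list[str]:
    """Return the preprocessing functions that appear in a statement."""
    functions = ("ML.STANDARD_SCALER", "ML.QUANTILE_BUCKETIZE")
    return [name for name in functions if name in sql]


if __name__ == "__main__":
    print(preprocessing_calls(CREATE_MODEL_SQL))  # functions used in training
    print(preprocessing_calls(PREDICT_SQL))       # none at prediction time
```

The helper only inspects the strings; actually running the statements would require a BigQuery client and a real dataset, which are outside this sketch.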

Discussion

10 comments
AWSandeep (Option: A)
Sep 2, 2022

A. When creating your model, use BigQuery's TRANSFORM clause to define preprocessing steps. At prediction time, use BigQuery's ML.EVALUATE clause without specifying any transformations on the raw input data. Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is automatically applied during the prediction and evaluation phases of machine learning. Reference: https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform

zellck (Option: A)
Nov 29, 2022

A is the answer. https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is automatically applied during the prediction and evaluation phases of machine learning.

TNT87 (Option: A)
Sep 9, 2022

Answer: A. Reference: https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform

jkhong (Option: A)
Dec 16, 2022

Problem: skew. One thing I overlooked when answering previously is that B and C do not address skew. When we preprocess our training data, we need to save the scaling factors somewhere, and when performing predictions on our test data, we need to apply those same training-data scaling factors before predicting. ML.EVALUATE already applies the preprocessing steps to our test data using the saved scaling factors.
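The point about saved scaling factors can be sketched in plain Python. This is a minimal illustration, not BigQuery ML's implementation: the helper names `fit_scaler` and `scale` and the square-footage numbers are made up for the example. It compares scaling new data with the factors learned from training data against refitting the factors on the new data.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the scaling factors (mean and standard deviation) from data."""
    return mean(values), pstdev(values)


def scale(values, factors):
    """Standardize values using previously learned scaling factors."""
    mu, sigma = factors
    return [(v - mu) / sigma for v in values]


# Hypothetical square-footage values, chosen only for illustration.
train_sqft = [1000.0, 1500.0, 2000.0]
serve_sqft = [3000.0, 3500.0, 4000.0]

# Factors are learned once from the training data and saved.
saved_factors = fit_scaler(train_sqft)

# Consistent: raw prediction data is scaled with the saved training factors.
consistent = scale(serve_sqft, saved_factors)

# Skewed: factors are refit on the prediction data, so much larger homes
# end up with the same scaled values as the training homes.
skewed = scale(serve_sqft, fit_scaler(serve_sqft))

if __name__ == "__main__":
    print("training, scaled:  ", scale(train_sqft, saved_factors))
    print("serving, saved:    ", consistent)
    print("serving, refit:    ", skewed)
```

With refit factors the model cannot tell the larger homes apart from the training homes, which is the skew the comment describes.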

Prudvi3266 (Option: A)
Apr 21, 2023

A is the correct answer. If we use the TRANSFORM clause in BigQuery, there is no need to apply any transformations when evaluating or predicting. https://cloud.google.com/bigquery/docs/bigqueryml-transform

ducc (Option: A)
Sep 3, 2022

This query's nested SELECT statement and FROM clause are the same as those in the CREATE MODEL query. Because the TRANSFORM clause is used in training, you don't need to specify the specific columns and transformations. They are automatically restored. Reference: https://cloud.google.com/bigquery-ml/docs/bigqueryml-transform

Kvk117 (Option: A)
Jan 20, 2023

A is the correct answer.

GCPSharon (Option: C)
Oct 27, 2022

Skew prediction time by removing the preprocessing!

Matt_108 (Option: A)
Jan 13, 2024

Option A

Lenifia (Option: B)
Jul 5, 2024

The key to preventing skew in machine learning models is to ensure that the same data preprocessing steps are applied consistently to both the training data and the prediction data. In option B, the TRANSFORM clause in BigQuery ML is used to define preprocessing steps during model creation, and a saved query is used to apply the same transformations to the raw input data before making predictions. This ensures consistency and prevents skew. The ML.EVALUATE function is then used to evaluate the model’s performance on the transformed prediction data. This is the recommended workflow.