Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 255


You have recently used TensorFlow to train a classification model on tabular data. You have created a Dataflow pipeline that can transform several terabytes of data into training or prediction datasets consisting of TFRecords. You now need to productionize the model, and you want the predictions to be automatically uploaded to a BigQuery table on a weekly schedule. What should you do?

Correct Answer: AC

To productionize the model and ensure that predictions are automatically uploaded to a BigQuery table on a weekly schedule, you should import the model into Vertex AI and deploy it to a Vertex AI endpoint. Additionally, create a Dataflow pipeline that can reuse the data processing logic to send requests to the endpoint and then upload the predictions to a BigQuery table. This approach leverages the existing Dataflow pipeline and ensures efficient processing and integration with BigQuery.
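As a rough illustration of that approach, the sketch below shows what such a Beam/Dataflow step might look like, assuming a model already deployed to a Vertex AI endpoint. The project, bucket, table, endpoint ID, and the transform_row() preprocessing helper are hypothetical placeholders, not details from the question.

```python
# Minimal sketch: reuse a Beam/Dataflow pipeline to call a Vertex AI endpoint
# and write the predictions to BigQuery. All names below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

PROJECT = "my-project"                                  # placeholder
ENDPOINT_ID = "1234567890"                              # placeholder endpoint ID
BQ_TABLE = "my-project:ml_serving.weekly_predictions"   # placeholder table


def transform_row(line):
    """Placeholder standing in for the existing pipeline's preprocessing logic."""
    values = line.split(",")
    return {"feature_1": float(values[0]), "feature_2": float(values[1])}


class PredictViaEndpoint(beam.DoFn):
    """Sends each preprocessed instance to the deployed Vertex AI endpoint."""

    def setup(self):
        from google.cloud import aiplatform  # imported on the Dataflow worker
        aiplatform.init(project=PROJECT, location="us-central1")
        self.endpoint = aiplatform.Endpoint(ENDPOINT_ID)

    def process(self, instance):
        response = self.endpoint.predict(instances=[instance])
        yield {"instance": str(instance), "prediction": str(response.predictions[0])}


def run():
    options = PipelineOptions(
        runner="DataflowRunner",
        project=PROJECT,
        region="us-central1",
        temp_location="gs://my-bucket/tmp",   # placeholder bucket
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadSource" >> beam.io.ReadFromText("gs://my-bucket/raw/*.csv")
            | "Preprocess" >> beam.Map(transform_row)
            | "Predict" >> beam.ParDo(PredictViaEndpoint())
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                BQ_TABLE,
                schema="instance:STRING,prediction:STRING",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```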

Discussion

14 comments
BlehMaks | Option: C
Jan 27, 2024

The DataflowPythonJobOp operator lets you create a Vertex AI Pipelines component that prepares data by submitting a Python-based Apache Beam job to Dataflow for execution (https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component#dataflowpythonjobop). Using ModelBatchPredictOp, we can specify an output location for Vertex AI to store the prediction results (https://cloud.google.com/vertex-ai/docs/pipelines/batchprediction-component). A is incorrect since we don't need an endpoint for batch predictions. B is incorrect because creating a new Dataflow pipeline is redundant.
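For illustration, here is a minimal sketch of the pipeline this comment describes, assuming the google_cloud_pipeline_components package (DataflowPythonJobOp, WaitGcpResourcesOp, ModelBatchPredictOp) and a KFP importer for a model already uploaded to Vertex AI. The project, bucket paths, model resource name, and BigQuery destination are hypothetical placeholders.

```python
# Minimal sketch: Vertex AI pipeline that reuses the Beam preprocessing code via
# Dataflow and runs batch prediction straight into BigQuery. Names are placeholders.
from kfp import dsl
from google_cloud_pipeline_components.types import artifact_types
from google_cloud_pipeline_components.v1.batch_predict_job import ModelBatchPredictOp
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp

PROJECT = "my-project"                                             # placeholder
REGION = "us-central1"                                             # placeholder
MODEL = "projects/my-project/locations/us-central1/models/123"     # placeholder model resource name
BQ_DESTINATION = "bq://my-project.ml_serving.weekly_predictions"   # placeholder output table


@dsl.pipeline(name="weekly-batch-prediction")
def weekly_batch_prediction():
    # Reuse the existing Beam preprocessing code by submitting it as a Dataflow job.
    prep = DataflowPythonJobOp(
        project=PROJECT,
        location=REGION,
        python_module_path="gs://my-bucket/src/preprocess.py",   # placeholder: existing Beam script
        temp_location="gs://my-bucket/tmp",
        args=["--output", "gs://my-bucket/prediction_input/"],
    )
    # DataflowPythonJobOp returns once the job is submitted, so block until it finishes.
    wait = WaitGcpResourcesOp(gcp_resources=prep.outputs["gcp_resources"])

    # Reference the model that was already imported into Vertex AI (no endpoint needed).
    model = dsl.importer(
        artifact_uri=f"https://{REGION}-aiplatform.googleapis.com/v1/{MODEL}",
        artifact_class=artifact_types.VertexModel,
        metadata={"resourceName": MODEL},
    )

    # Run batch prediction on the TFRecords and write the results straight to BigQuery.
    batch_predict = ModelBatchPredictOp(
        project=PROJECT,
        location=REGION,
        job_display_name="weekly-batch-predict",
        model=model.output,
        gcs_source_uris=["gs://my-bucket/prediction_input/*"],
        instances_format="tf-record",
        predictions_format="bigquery",
        bigquery_destination_output_uri=BQ_DESTINATION,
    )
    batch_predict.after(wait)
```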

pertoise | Option: C
Feb 29, 2024

The answer is C. No need for an endpoint here: simply specify the BigQuery table URI in the ModelBatchPredictOp parameter, and the predictions are automatically uploaded to BigQuery.

gscharly | Option: C
Apr 19, 2024

No need to deploy to an endpoint, since we need batch predictions. ModelBatchPredictOp can upload data to BigQuery, and the Dataflow pipeline logic can be implemented with DataflowPythonJobOp.

guilhermebutzke | Option: B
Feb 16, 2024

My answer: B. It is the most complete answer and reuses the pipeline that has already been created. It doesn't make sense to use DataflowPythonJobOp when you have already created a Dataflow pipeline that does the same thing.

pinimichele01 | Option: C
Apr 14, 2024

ModelBatchPredictOp -> uploads automatically to BigQuery. No need for an endpoint -> C.

fitri001 | Option: B
Apr 17, 2024

Option A: Vertex AI Pipelines' ModelBatchPredictOp is designed for batch prediction within pipelines, not for serving models through an endpoint.
Option C: Importing the model directly into BigQuery is not feasible for TensorFlow models.
Option D: Vertex AI Pipelines' BigqueryPredictModelJobOp assumes the model is already trained and hosted in BigQuery ML, which isn't the case here.

pinimichele01
Apr 17, 2024

"Importing the model directly into BigQuery is not feasible for TensorFlow models" -> not true.

rcapj | Option: B
Jun 21, 2024

B.
Vertex AI deployment: Vertex AI provides a managed environment for deploying machine learning models. It simplifies the process and ensures scalability.
Dataflow pipeline reuse: Reusing the existing Dataflow pipeline for data processing leverages your existing code and avoids redundant logic.
Model endpoint predictions: Sending requests to the deployed model endpoint allows for efficient prediction generation.
BigQuery upload: Uploading predictions directly to BigQuery from the Dataflow pipeline integrates seamlessly with your data storage.

pikachu007 | Option: B
Jan 13, 2024

Option A: Vertex AI Pipelines are excellent for orchestrating ML workflows but might not be as efficient as Dataflow for large-scale data processing, especially with existing Dataflow logic.
Option C: While Vertex AI Pipelines can handle model loading and prediction, Dataflow is better suited for large-scale data processing and BigQuery integration.
Option D: BigQuery ML is primarily for in-database model training and prediction, not ideal for external models or large-scale data processing.

daidai75 | Option: B
Jan 23, 2024

The answer is B; options A and C don't mention how to import the prediction results into BigQuery.

tavva_prudhvi | Option: B
Feb 10, 2024

Not A or C, as they do not explicitly mention how the predictions will be uploaded to BigQuery.

pinimichele01 | Option: C
Apr 8, 2024

agree with BlehMaks

fitri001 | Option: B
Apr 17, 2024

TFRecord is a file format designed by TensorFlow for storing data in a way that is efficient for the machine learning framework to read.

Prakzz | Option: B
Jul 4, 2024

Only option B talks about loading the data into BigQuery.

AzureDP900 | Option: B
Jul 5, 2024

B is right because: 1) You've already trained a classification model using TensorFlow, so you need to productionize it by deploying it to a Vertex AI endpoint. 2) To automate the prediction process on a weekly schedule, you can create a Dataflow pipeline that reuses your existing data processing logic. This pipeline will send requests to the deployed model for inference and then upload the predicted results to BigQuery.
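Whichever option is picked, the weekly trigger still has to come from somewhere. For the pipeline-based approach, one possibility is Vertex AI's built-in pipeline schedules; below is a minimal sketch assuming the Vertex AI SDK's PipelineJob.create_schedule method and the weekly_batch_prediction pipeline sketched earlier, with placeholder project and bucket names. For the Dataflow-only approach, a Cloud Scheduler job that launches a Dataflow template on the same cron would play the analogous role.

```python
# Minimal sketch: compile the pipeline from the earlier sketch and run it weekly.
# Assumes the weekly_batch_prediction function defined above; names are placeholders.
from google.cloud import aiplatform
from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=weekly_batch_prediction,
    package_path="weekly_batch_prediction.json",
)

aiplatform.init(project="my-project", location="us-central1")  # placeholders

job = aiplatform.PipelineJob(
    display_name="weekly-batch-prediction",
    template_path="weekly_batch_prediction.json",
    pipeline_root="gs://my-bucket/pipeline_root",  # placeholder bucket
)

# Run every Monday at 06:00 UTC.
job.create_schedule(
    display_name="weekly-batch-prediction-schedule",
    cron="0 6 * * 1",
)
```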