Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 261


You are developing a recommendation engine for an online clothing store. The historical customer transaction data is stored in BigQuery and Cloud Storage. You need to perform exploratory data analysis (EDA), preprocessing and model training. You plan to rerun these EDA, preprocessing, and training steps as you experiment with different types of algorithms. You want to minimize the cost and development effort of running these steps as you experiment. How should you configure the environment?

Correct Answer: B

To minimize cost and development effort while performing EDA, preprocessing, and model training tasks, using a Vertex AI Workbench managed notebook is the optimal choice. Managed notebooks provide a user-friendly JupyterLab interface that integrates seamlessly with BigQuery and Cloud Storage, allowing for direct browsing and querying of tables without additional overhead. This setup reduces the need for managing VM instances or additional connectors, streamlining the workflow and saving both time and resources.
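To make the integration concrete, here is a hedged sketch of how the transaction data could be queried from a Vertex AI Workbench notebook. The table name `shop.transactions` and the column names are hypothetical; in a managed notebook the `%%bigquery` cell magic (shipped with the `google-cloud-bigquery` library) is the shortest path, and the explicit Python client is the equivalent for scripted reruns:

```python
# In a notebook cell, the %%bigquery magic loads results straight
# into a pandas DataFrame (table and columns are illustrative):
#
#   %%bigquery df
#   SELECT customer_id, item_id, amount
#   FROM `shop.transactions`
#   LIMIT 1000
#
# The same query via the explicit BigQuery Python client:

def build_query(table: str, limit: int = 1000) -> str:
    """Assemble a bounded SELECT so repeated EDA runs stay cheap."""
    return f"SELECT customer_id, item_id, amount FROM `{table}` LIMIT {limit}"


def load_transactions(project: str, table: str = "shop.transactions"):
    # Imported lazily so build_query works without GCP libraries installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return client.query(build_query(table)).to_dataframe()
```

Because both paths run inside the notebook's JupyterLab interface, rerunning the EDA and preprocessing steps against a new algorithm only requires re-executing the cells, with no extra connectors or cluster setup.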

Discussion

8 comments
b1a8fae (Option: B)
Jan 22, 2024

"Managed notebooks are usually a good choice if you want to use a notebook for data exploration, analysis, modeling, or as part of an end-to-end data science workflow. Managed notebooks instances let you perform workflow-oriented tasks without leaving the JupyterLab interface. They also have many integrations and features for implementing your data science workflow." vs. "User-managed notebooks can be a good choice for users who require extensive customization or who need a lot of control over their environment." Seems more like the former -> B

AzureDP900 (Option: B)
Jul 5, 2024

B is right because this option allows you to minimize cost and development effort by using a managed notebook in Vertex AI Workbench, which integrates well with BigQuery and Cloud Storage. You can browse and query your data directly within the JupyterLab interface without having to create a separate BigQuery client or use the bq command-line tool.

pikachu007 (Option: B)
Jan 13, 2024

Option A: User-managed notebooks require VM instance management, adding cost and complexity. %%bigquery magic commands are still needed. Option C: Dataproc Hub adds unnecessary cost and complexity for simple BigQuery interactions. Option D: Spark-bigquery-connector adds complexity and overhead compared to the native BigQuery integration in managed notebooks.

shadz10 (Option: B)
Jan 18, 2024

https://cloud.google.com/vertex-ai/docs/workbench/notebook-solution#:~:text=For%20users%20who%20have%20specific,user%2Dmanaged%20notebooks%20instance's%20VM.

daidai75 (Option: B)
Jan 23, 2024

https://cloud.google.com/bigquery/docs/visualize-jupyter

guilhermebutzke (Option: A)
Feb 16, 2024

My Answer: A. Option A: a default VM instance is the cheapest choice, and the %%bigquery magic command is the easiest way to pull data from BigQuery. Option B: the JupyterLab integrations are not necessary to run this code; the %%bigquery magic command is sufficient to fetch data and run queries easily. Option C: Dataproc Hub seems like overkill and is more expensive than a default VM instance. Option D: the spark-bigquery-connector is unnecessary for reading tables into a notebook; %%bigquery is the better tool.

gscharly (Option: A)
Apr 20, 2024

Agree with guilhermebutzke. Also, this option is easier to reuse across multiple experiments.

pinimichele01 (Option: B)
Apr 21, 2024

See b1a8fae's comment.