Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 261


You are developing a recommendation engine for an online clothing store. The historical customer transaction data is stored in BigQuery and Cloud Storage. You need to perform exploratory data analysis (EDA), preprocessing and model training. You plan to rerun these EDA, preprocessing, and training steps as you experiment with different types of algorithms. You want to minimize the cost and development effort of running these steps as you experiment. How should you configure the environment?

Correct Answer: B

To minimize cost and development effort while performing EDA, preprocessing, and model training tasks, using a Vertex AI Workbench managed notebook is the optimal choice. Managed notebooks provide a user-friendly JupyterLab interface that integrates seamlessly with BigQuery and Cloud Storage, allowing for direct browsing and querying of tables without additional overhead. This setup reduces the need for managing VM instances or additional connectors, streamlining the workflow and saving both time and resources.
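To make the integration concrete, here is a hedged sketch of how the transaction data could be queried from a Vertex AI Workbench notebook. The table name `shop.transactions` and the column names are hypothetical; in a managed notebook the `%%bigquery` cell magic (shipped with the `google-cloud-bigquery` library) is the shortest path, and the explicit Python client is the equivalent for scripted reruns:

```python
# In a notebook cell, the %%bigquery magic loads results straight
# into a pandas DataFrame (table and columns are illustrative):
#
#   %%bigquery df
#   SELECT customer_id, item_id, amount
#   FROM `shop.transactions`
#   LIMIT 1000
#
# The same query via the explicit BigQuery Python client:

def build_query(table: str, limit: int = 1000) -> str:
    """Assemble a bounded SELECT so repeated EDA runs stay cheap."""
    return f"SELECT customer_id, item_id, amount FROM `{table}` LIMIT {limit}"


def load_transactions(project: str, table: str = "shop.transactions"):
    # Imported lazily so build_query works without GCP libraries installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return client.query(build_query(table)).to_dataframe()
```

Because both paths run inside the notebook's JupyterLab interface, rerunning the EDA and preprocessing steps against a new algorithm only requires re-executing the cells, with no extra connectors or cluster setup.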

Discussion

8 comments
b1a8fae (Option: B)
Jan 22, 2024

"Managed notebooks are usually a good choice if you want to use a notebook for data exploration, analysis, modeling, or as part of an end-to-end data science workflow. Managed notebooks instances let you perform workflow-oriented tasks without leaving the JupyterLab interface. They also have many integrations and features for implementing your data science workflow." vs. "User-managed notebooks can be a good choice for users who require extensive customization or who need a lot of control over their environment." Seems more like the former -> B

AzureDP900 (Option: B)
Jul 5, 2024

B is right because this option allows you to minimize cost and development effort by using a managed notebook in Vertex AI Workbench, which integrates well with BigQuery and Cloud Storage. You can browse and query your data directly within the JupyterLab interface without having to create a separate BigQuery client or use the bq command-line tool.

pikachu007 (Option: B)
Jan 13, 2024

Option A: User-managed notebooks require VM instance management, adding cost and complexity. %%bigquery magic commands are still needed. Option C: Dataproc Hub adds unnecessary cost and complexity for simple BigQuery interactions. Option D: Spark-bigquery-connector adds complexity and overhead compared to the native BigQuery integration in managed notebooks.

shadz10 (Option: B)
Jan 18, 2024

https://cloud.google.com/vertex-ai/docs/workbench/notebook-solution#:~:text=For%20users%20who%20have%20specific,user%2Dmanaged%20notebooks%20instance's%20VM.

daidai75 (Option: B)
Jan 23, 2024

https://cloud.google.com/bigquery/docs/visualize-jupyter

guilhermebutzke (Option: A)
Feb 16, 2024

My Answer: A. Option A: a default VM instance is the cheapest choice, and the %%bigquery magic command is the easiest way to pull data from BigQuery. Option B: the JupyterLab integrations are not necessary to run this code; the %%bigquery magic command is sufficient to fetch data and run queries easily. Option C: Dataproc Hub seems like overkill and is more expensive than a default VM instance. Option D: the spark-bigquery-connector is unnecessary for reading tables into a notebook; %%bigquery is the better tool.

gscharly (Option: A)
Apr 20, 2024

Agree with guilhermebutzke. Also, this option is easier to reuse across multiple experiments.

pinimichele01 (Option: B)
Apr 21, 2024

See b1a8fae's comment.