Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 202


You work for a startup that has multiple data science workloads. Your compute infrastructure is currently on-premises, and the data science workloads are native to PySpark. Your team plans to migrate their data science workloads to Google Cloud. You need to build a proof of concept to migrate one data science job to Google Cloud. You want to propose a migration process that requires minimal cost and effort. What should you do first?

A. Create a n2-standard-4 VM instance and install Java, Scala, and Apache Spark dependencies on it.
B. Create a Google Kubernetes Engine cluster with a basic node pool configuration, and install Java, Scala, and Apache Spark dependencies on it.
C. Create a Standard (1 master, 3 workers) Dataproc cluster, and run a Vertex AI Workbench notebook instance on it.
D. Create a Vertex AI Workbench notebook with instance type n2-standard-4.

Correct Answer: D

For migrating a data science job specifically native to PySpark to Google Cloud with minimal cost and effort, creating a Vertex AI Workbench notebook with an instance type like n2-standard-4 is a suitable approach. Vertex AI Workbench provides pre-configured environments including libraries like PySpark, reducing the need for manual setup and maintenance. This approach allows you to focus on running your data science job without managing additional infrastructure, making it ideal for a proof of concept.
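For context, here is a minimal sketch of the kind of PySpark job such a proof of concept might migrate. Because Workbench notebooks ship with PySpark, the same file can run unchanged in a notebook (local mode) or later on a Dataproc cluster. The bucket path, app name, and `tokenize` helper are illustrative assumptions, not details from the question.

```python
# Minimal word-count sketch of a PySpark proof-of-concept job.
# The pure-Python tokenize() keeps the business logic testable without Spark;
# the Spark driver code lives in main(), which you would call from a Vertex AI
# Workbench notebook or pass to spark-submit. gs://my-poc-bucket/input.txt is
# a placeholder path.

def tokenize(line: str):
    """Split a line into lowercase words (framework-independent logic)."""
    return [word.lower() for word in line.split()]


def main(input_path: str = "gs://my-poc-bucket/input.txt"):
    # Imported lazily so this module can be inspected without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("poc-wordcount").getOrCreate()
    counts = (
        spark.sparkContext.textFile(input_path)
        .flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    # Print the ten most frequent words as a quick sanity check.
    for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
        print(word, n)
    spark.stop()
```

Call `main()` from the notebook; no cluster management is needed for the proof of concept itself.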

Discussion

12 comments
guilhermebutzke (Option: C)
Feb 14, 2024

My answer: C.

C: This option leverages Google Cloud's Dataproc service, which is designed for running Apache Spark and other big data processing frameworks. By creating a Standard Dataproc cluster, you can easily scale resources as needed for your workload.
A: A n2-standard-4 VM requires manual setup and ongoing maintenance, increasing cost and effort.
B: While a GKE cluster offers containerization benefits, it necessitates managing containers and Spark configurations, adding complexity.
D: With Vertex AI Workbench, your team can develop, train, and deploy machine learning models using popular frameworks like TensorFlow, PyTorch, and scikit-learn. However, while Vertex AI Workbench supports PySpark, it may not be the optimal choice for migrating existing PySpark workloads, as it's primarily focused on machine learning tasks.

Carlose2108
Feb 27, 2024

You're right, but I have a doubt about one part with respect to Option D: "You need to build a proof of concept to migrate one data science job to Google Cloud."

Carlose2108 (Option: C)
Mar 2, 2024

I went with C: for a proof of concept, it requires minimal cost and effort. Furthermore, Vertex AI Workbench notebooks come pre-configured with PySpark.

Yan_X (Option: D)
Mar 11, 2024

D. You can use the notebook's pre-installed libraries and tools, including PySpark.

gscharly (Option: D)
Apr 17, 2024

Went with D: https://cloud.google.com/vertex-ai/docs/workbench/instances/create-dataproc-enabled

pinimichele01
Apr 20, 2024

https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview

pikachu007 (Option: D)
Jan 13, 2024

Minimal setup: Vertex AI Workbench notebooks come pre-configured with PySpark and other data science tools, eliminating the need for manual installation and setup.
Cost-effectiveness: Vertex AI Workbench offers managed notebooks with pay-as-you-go pricing, making it a cost-efficient option for proof-of-concept testing.
Ease of use: Data scientists can directly run PySpark code in the notebook without managing infrastructure, streamlining the migration process.
Scalability: Vertex AI Workbench can easily scale to handle larger workloads or multiple users if the proof of concept is successful.

BlehMaks (Option: C)
Jan 15, 2024

https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview

shadz10 (Option: D)
Jan 17, 2024

https://cloud.google.com/vertex-ai-notebooks?hl=en: "Data lake and Spark in one place. Whether you use TensorFlow, PyTorch, or Spark, you can run any engine from Vertex AI Workbench." D is correct.

ddogg (Option: C)
Feb 2, 2024

Agree with BlehMaks: https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview. A Dataproc cluster seems more suitable.

Carlose2108 (Option: D)
Mar 2, 2024

My bad, I meant Option D.

pinimichele01 (Option: C)
Apr 8, 2024

When you want to move your Apache Spark workloads from an on-premises environment to Google Cloud, we recommend using Dataproc to run Apache Spark/Apache Hadoop clusters. https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview

fitri001 (Option: D)
Apr 19, 2024

Vertex AI Workbench notebook: this option provides a pre-configured environment with popular data science libraries like PySpark already installed. It allows you to focus on migrating your PySpark code with minimal changes.
n2-standard-4 instance type: a general-purpose machine type suitable for various data science tasks. It offers a good balance between cost and performance for initial exploration.

fitri001
Apr 19, 2024

A. Create a n2-standard-4 VM instance: this requires manually installing Java, Scala, and Spark dependencies, which is time-consuming and prone to errors. It also involves managing the VM instance lifecycle, increasing complexity.
B. Create a Google Kubernetes Engine cluster: setting up and managing a Kubernetes cluster for a single job is overkill for a proof of concept. It adds unnecessary complexity and cost.
C. Create a Standard Dataproc cluster: while Dataproc is a managed Spark environment on GCP, setting up a full cluster (master and workers) might be more resource-intensive than needed for a single job, especially for a proof of concept.

pinimichele01
Apr 20, 2024

Why not C? https://cloud.google.com/architecture/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc#overview

TanTran04 (Option: C)
Jul 8, 2024

I'm following option C. Please take a look at the Dataproc documentation (ref: https://cloud.google.com/dataproc/docs). Option D doesn't provide a solution for managing and scaling the Spark environment, which is necessary for running PySpark workloads.
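For readers weighing the Dataproc route that several commenters above recommend, here is a hedged sketch of submitting a PySpark job to an existing Dataproc cluster with the google-cloud-dataproc client library. The project ID, region, cluster name, and GCS path are placeholder assumptions; the request body is built by a plain function so its shape is easy to inspect.

```python
# Sketch: submit a PySpark job to a Dataproc cluster via the
# google-cloud-dataproc client library. All resource names below
# (poc-cluster, gs://my-poc-bucket/...) are illustrative placeholders.

def build_pyspark_job(cluster_name: str, main_python_file_uri: str) -> dict:
    """Build the job request body expected by JobControllerClient.submit_job."""
    return {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {"main_python_file_uri": main_python_file_uri},
    }


def submit(project_id: str, region: str) -> None:
    # Imported lazily so the request-building logic above has no dependencies.
    from google.cloud import dataproc_v1

    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    job = build_pyspark_job("poc-cluster", "gs://my-poc-bucket/wordcount.py")
    operation = client.submit_job_as_operation(
        request={"project_id": project_id, "region": region, "job": job}
    )
    # Block until the job finishes, then show where the driver output landed.
    finished = operation.result()
    print(finished.driver_output_resource_uri)
```

Compared with option D, this requires a running cluster (created beforehand, e.g. with `gcloud dataproc clusters create`), which is the extra cost and effort the suggested answer is trying to avoid for a first proof of concept.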