Professional Cloud Architect Exam Questions

Professional Cloud Architect Exam - Question 119


You need to migrate Hadoop jobs for your company's Data Science team without modifying the underlying infrastructure. You want to minimize costs and infrastructure management effort. What should you do?

A. Create a Dataproc cluster using standard worker instances.
B. Create a Dataproc cluster using preemptible worker instances.
C. Manually deploy a Hadoop cluster on Compute Engine using standard instances.
D. Manually deploy a Hadoop cluster on Compute Engine using preemptible instances.

Correct Answer: B

To migrate Hadoop jobs while minimizing costs and infrastructure management effort, the best approach is to create a Dataproc cluster using preemptible worker instances. Dataproc is a managed service that simplifies the deployment and management of Hadoop clusters, reducing administrative overhead. Preemptible instances cost significantly less than standard instances; although they are short-lived and can be reclaimed by Google Cloud at any time, this trade-off is acceptable for non-critical, fault-tolerant data processing tasks where cost minimization is a priority.
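For illustration, a minimal sketch of such a cluster using the google-cloud-dataproc Python client follows; the project ID, region, cluster name, and machine types are placeholder assumptions, not values given in the question.

```python
# Minimal sketch with the google-cloud-dataproc client library. All names
# below (project, region, cluster, machine types) are hypothetical.
from google.cloud import dataproc_v1

project_id = "my-project"   # hypothetical
region = "us-central1"      # hypothetical

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "hadoop-migration",  # hypothetical
    "config": {
        # Primary workers are always standard (non-preemptible) VMs.
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # Secondary workers carry the scale-out load at preemptible prices.
        "secondary_worker_config": {
            "num_instances": 4,
            "preemptibility": dataproc_v1.InstanceGroupConfig.Preemptibility.PREEMPTIBLE,
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print("Cluster ready:", operation.result().cluster_name)
```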

Discussion

17 comments
TotoroChina (Option: B)
Jul 1, 2021

Should be B, you want to minimize costs. https://cloud.google.com/dataproc/docs/concepts/compute/secondary-vms#preemptible_and_non-preemptible_secondary_workers

XDevX
Jul 1, 2021

Hi TotoroChina, I had the same thought when I first read the question. The problem I see is that in real business you would try to mix preemptible instances and on-demand instances... Here you have to choose between only preemptible instances and on-demand instances... Preemptible instances have some downsides, so we would need more details and ideally a mixed approach. That's why both answers might be correct, A and B... Do you see that differently? Thanks! Cheers, D.

kopper2019
Jul 8, 2021

But you need to reduce management overhead, so B. Creating a cluster manually and maintaining GCE instances is not the way to go.

HenkH
Nov 10, 2022

B requires creating new instances at least every 24 hours.

J19G
Oct 14, 2021

Agree, the migration guide also recommends to think about preemptible worker nodes: https://cloud.google.com/architecture/hadoop/hadoop-gcp-migration-jobs#using_preemptible_worker_nodes

ale_brd_
Nov 24, 2022

I think it's A. The question does not mention anything about minimizing costs; all the questions in GCP exams that have cost minimization as a requirement literally mention it in the question. Also, in order to minimize costs you need to build jobs that are fault tolerant, as worker instances are preemptible. This also requires some kind of dev investment. So if fault tolerance and cost minimization are not mentioned in the question, they are not required/needed. The doc states: Only use preemptible nodes for jobs that are fault-tolerant or that are low enough priority that occasional job failure won't disrupt your business.

grejao
Apr 2, 2023

OMG, you again? zetalexg says: It's disappointing that you waste your time writing on this topic instead of paying attention to the questions.

Sukon_Desknot
Oct 30, 2022

"without modifying the underlying infrastructure" is the watch word. Most likely did not utilize preemptible on-premises

Yogi42
Jan 19, 2023

A cost-savings consideration: using preemptible VMs does not always save costs, since preemptions can cause longer job execution and resulting higher job costs. This is mentioned in the link above, so I think the answer should be A.

firecloud (Option: A)
Jul 28, 2021

It's A; the primary workers can only be standard, whereas secondary workers can be preemptible. In addition to using standard Compute Engine VMs as Dataproc workers (called "primary" workers), Dataproc clusters can use "secondary" workers. There are two types of secondary workers: preemptible and non-preemptible. All secondary workers in your cluster must be of the same type, either preemptible or non-preemptible. The default is preemptible.

Manh
Sep 6, 2021

agreed
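As a concrete sketch of that primary/secondary distinction: recent versions of the google-cloud-dataproc Python client expose the secondary-worker modes as an enum (names per the v1 API; the snippet is purely illustrative).

```python
# The secondary-worker modes exposed by the google-cloud-dataproc client;
# all secondary workers in a cluster must use the same mode.
from google.cloud import dataproc_v1

Preemptibility = dataproc_v1.InstanceGroupConfig.Preemptibility
print(Preemptibility.PREEMPTIBLE)      # default for secondary workers
print(Preemptibility.NON_PREEMPTIBLE)  # standard-priced secondary workers
print(Preemptibility.SPOT)             # Spot VMs: preemptible, no fixed 24h lifetime
```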

CyanideX (Option: B)
Oct 23, 2023

The answer should be Spot VMs (this has been changed in the actual exam). Spot VMs have no expiration.

d0094d6 (Option: B)
Feb 2, 2024

"You want to minimize costs and infrastructure management effort" -> B

Pime13 (Option: B)
Feb 2, 2024

minimize costs -> preemptible

odacir (Option: B)
Nov 18, 2023

B. Migrate Hadoop jobs -> Dataproc; saving money -> preemptible (Spot).

discuss24 (Option: A)
Jan 6, 2024

A is the correct response. Per the documentation: "You can gain low-cost processing power for your jobs by adding preemptible worker nodes to your cluster. These nodes use preemptible virtual machines." The focus of the question is to reduce cost, hence preemptible VMs work best.

discuss24
Jan 6, 2024

B not A
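"Adding preemptible worker nodes to your cluster" corresponds to resizing the secondary worker group. A minimal sketch with the same client library, reusing the hypothetical project, region, and cluster name from the earlier example:

```python
# Sketch: grow the preemptible secondary-worker group of an existing cluster.
# Project, region, and cluster name are the same hypothetical values as above.
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"  # hypothetical
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

operation = client.update_cluster(
    request={
        "project_id": project_id,
        "region": region,
        "cluster_name": "hadoop-migration",  # hypothetical
        # Only the secondary-worker count changes; the mask scopes the update.
        "cluster": {"config": {"secondary_worker_config": {"num_instances": 8}}},
        "update_mask": {"paths": ["config.secondary_worker_config.num_instances"]},
    }
)
operation.result()  # blocks until the resize completes
```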

Romio2023 (Option: B)
Jan 10, 2024

Answer should be B, because minimizing costs is the goal.

didek1986 (Option: A)
Jan 18, 2024

It is A

d0094d6 (Option: B)
Feb 2, 2024

"You want to minimize costs and infrastructure management effort" -> B

Amrita2012 (Option: A)
Feb 18, 2024

Standard Compute Engine VMs are used as Dataproc workers (called "primary" workers); preemptible VMs can only be used for secondary workers, hence A is the valid answer.

VidhyaBupesh (Option: A)
Feb 27, 2024

Using preemptible VMs does not always save costs since preemptions can cause longer job execution with resulting higher job costs

shashii82 (Option: B)
Mar 10, 2024

Dataproc: a fully managed Apache Spark and Hadoop service on Google Cloud Platform. It allows you to run clusters without the need to manually deploy and manage Hadoop clusters on Compute Engine.

Preemptible worker instances: short-lived, cost-effective virtual machine instances that are suitable for fault-tolerant and batch processing workloads. Since Hadoop jobs can often tolerate interruptions, using preemptible instances can significantly reduce costs.

Option B leverages the benefits of Dataproc for managing Hadoop clusters without the need for manual deployment and takes advantage of preemptible instances to minimize costs. This aligns well with the goal of minimizing both costs and infrastructure management effort.
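To make the job-migration half concrete, here is a minimal sketch of submitting an existing Hadoop JAR to such a cluster, assuming the same hypothetical project and cluster as in the earlier examples and placeholder Cloud Storage paths:

```python
# Sketch: submit an existing Hadoop jar to the Dataproc cluster. The GCS
# paths and cluster name are hypothetical placeholders.
from google.cloud import dataproc_v1

project_id, region = "my-project", "us-central1"  # hypothetical
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "hadoop-migration"},  # hypothetical
    "hadoop_job": {
        "main_jar_file_uri": "gs://my-bucket/jobs/wordcount.jar",  # hypothetical
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
result = operation.result()  # waits for job completion
print("Job finished:", result.reference.job_id)
```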

Diwz (Option: B)
Mar 27, 2024

Answer is B. The default secondary worker type for a Dataproc cluster is preemptible VMs. https://cloud.google.com/dataproc/docs/concepts/compute/secondary-vms

dija123 (Option: B)
Apr 18, 2024

Agree with B

Gino17m (Option: A)
Apr 24, 2024

A - only secondary workers can be preemptible and "Using preemptible VMs does not always save costs since preemptions can cause longer job execution with resulting higher job costs" according to: https://cloud.google.com/dataproc/docs/concepts/compute/secondary-vms#preemptible_and_non-preemptible_secondary_workers

afsarkhan (Option: A)
Jul 13, 2024

I will go with A. Reason: preemptible instances are unpredictable, and there is no mention of workload criticality. So my answer is A over B.