
Professional Machine Learning Engineer Exam - Question 4


You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

Correct Answer: D

To optimize the pipeline on Google Cloud using a serverless tool with SQL syntax, you should ingest the data into BigQuery. BigQuery is serverless and supports SQL queries, which allows you to transform the data efficiently and at scale. After performing the transformations using BigQuery SQL queries, you can write the results to a new table. This approach meets both the speed and processing requirements.
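The ingest-then-transform flow described above can be sketched as two SQL statements submitted from Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the table names, bucket URI, and the columns `customer_id` and `amount` are hypothetical, and the raw files are assumed to be CSV with a header row. The `client` argument is anything exposing a `query(sql)` method whose return value has `result()`, which is the shape of `google.cloud.bigquery.Client`.

```python
from typing import Any, List


def build_load_sql(table: str, uri: str) -> str:
    """Build a LOAD DATA statement that ingests CSV files from Cloud Storage."""
    return (
        f"LOAD DATA OVERWRITE `{table}`\n"
        f"FROM FILES (format = 'CSV', skip_leading_rows = 1, uris = ['{uri}'])"
    )


def build_transform_sql(source: str, target: str) -> str:
    """Build a CREATE TABLE AS SELECT statement that writes results to a new table.

    The columns customer_id and amount are hypothetical placeholders.
    """
    return (
        f"CREATE OR REPLACE TABLE `{target}` AS\n"
        f"SELECT customer_id, SUM(amount) AS total_amount\n"
        f"FROM `{source}`\n"
        f"WHERE amount IS NOT NULL\n"
        f"GROUP BY customer_id"
    )


def run_pipeline(client: Any, raw_table: str, clean_table: str, uri: str) -> List[str]:
    """Run ingestion, then the transformation, and return the SQL that was submitted."""
    statements = [
        build_load_sql(raw_table, uri),
        build_transform_sql(raw_table, clean_table),
    ]
    for sql in statements:
        # result() blocks until the query job has finished.
        client.query(sql).result()
    return statements
```

With the `google-cloud-bigquery` package installed and credentials configured, this could be called as `run_pipeline(bigquery.Client(), "my_project.my_dataset.raw", "my_project.my_dataset.clean", "gs://my-bucket/raw/*.csv")`, where every name is a placeholder.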

Discussion

17 comments
nunzio144 (Option: D)
Jul 22, 2021

It should be D. Data Fusion does not use SQL syntax.

q4exam
Sep 8, 2021

Agreed, BQ is the only serverless option that supports SQL.

A4M
Jan 20, 2022

It needs to be D, the most suitable answer given the requirements in the question. Data Fusion is more of a no-code data transformation tool.

Celia20210714 (Option: A)
Jul 19, 2021

ANS: A
https://cloud.google.com/data-fusion#section-1
- Data Fusion is serverless: a serverless approach leveraging the scalability and reliability of Google services like Dataproc means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership.
- BigQuery is serverless and supports SQL.
- Dataproc is not serverless; you have to manage clusters.
- Cloud SQL is not serverless; you have to manage instances.

q4exam
Sep 8, 2021

Data Fusion is not serverless; it creates a Dataproc cluster to execute the job. I think the answer is C.

mousseUwU
Oct 18, 2021

Data Fusion is serverless: https://cloud.google.com/data-fusion#all-features

tavva_prudhvi
Mar 6, 2023

I think you're only viewing the sentence "A serverless approach leveraging the scalability and reliability of Google services like Dataproc means Data Fusion offers the best of data integration capabilities with a lower total cost of ownership". The sentence implies that Data Fusion leverages a serverless approach, but it does not explicitly state that Data Fusion itself is serverless. It states that Data Fusion offers the best of data integration capabilities by using a serverless approach that leverages the scalability and reliability of Google services like Dataproc. So, while Data Fusion may not be fully serverless, it is designed to take advantage of serverless capabilities through its integration with Google services.

mousseUwU
Oct 18, 2021

Agree, A is correct

alejo_1053 (Option: B)
Aug 2, 2022

I was thinking B, but now I'm kind of confused that nobody voted for it.

EFIGO (Option: D)
Nov 23, 2022

Data Fusion does not use SQL syntax, so not A; Dataproc is not serverless, so not B; passing through Cloud SQL is useless, just go with BigQuery, so not C; D is correct.

M25 (Option: D)
May 9, 2023

Went with D

asava (Option: B)
Mar 14, 2023

BQ is the serverless solution

GCP72 (Option: D)
Aug 15, 2022

Correct answer is "D"

abhi0706 (Option: D)
Oct 31, 2022

Both C and D can be implemented and will work, but D is faster to implement.

ares81 (Option: A)
Jan 6, 2023

It should be A.

ssaporylo (Option: D)
Jan 11, 2023

Vote D

mellowed (Option: D)
Jan 14, 2023

Correct option is D

12112 (Option: D)
Jul 8, 2023

I'll go with D.

Sum_Sum (Option: D)
Nov 14, 2023

D, as BQ is serverless and supports SQL. None of the other options match both criteria.

fragkris (Option: D)
Dec 1, 2023

D - BigQuery is the only serverless and SQL-syntax option.

PhilipKoku (Option: D)
Jun 5, 2024

The best approach is option D: Ingest data into BigQuery and use SQL queries for transformations. This leverages BigQuery’s serverless capabilities, efficient processing, and seamless integration with other Google Cloud services.

Yorko (Option: D)
Jul 8, 2024

There's an updated version of this question in the official Google Cloud certified PMLE study guide. Option D is marked as correct.

tadeupan (Option: D)
Jul 16, 2024

Option D, because the question needs a serverless solution with SQL syntax, and BigQuery offers both. Dataproc is not serverless, so B is incorrect; D is the correct option.