Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 113


You need to analyze user activity data from your company’s mobile applications. Your team will use BigQuery for data analysis, transformation, and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?

Correct Answer: AD

To ensure real-time ingestion of user activity data into BigQuery, the recommended approach is to configure Pub/Sub and a Dataflow streaming job. Pub/Sub handles high-volume, real-time message streaming, while Dataflow processes and transforms the data before it reaches BigQuery. Dataflow is what allows required transformations, such as masking personally identifiable information (PII), to be applied before the data lands in BigQuery. Together, Pub/Sub for messaging and Dataflow for transformation provide a robust and flexible ingestion pipeline for analyzing user activity data and experimenting with ML algorithms.
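The masking step assigned to Dataflow above is essentially a per-message function applied between Pub/Sub and BigQuery. The sketch below is plain Python, not the Apache Beam API; in a real Dataflow job a function like this would typically be wrapped in a `beam.Map` step. The message fields (`user_id`, `email`, `phone`, `event`, `ts`) and the salt are hypothetical assumptions for illustration.

```python
import hashlib
import json

# Hypothetical salt; a real pipeline would load this from a secret store.
SALT = b"example-salt"

# Hypothetical PII fields that are dropped outright before the row reaches BigQuery.
DROP_FIELDS = {"email", "phone"}


def pseudonymize(value: str) -> str:
    """Replace an identifier with a salted SHA-256 hex digest."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


def to_bigquery_row(message_data: bytes) -> dict:
    """Turn one Pub/Sub message payload (JSON bytes) into a masked BigQuery row dict."""
    event = json.loads(message_data.decode("utf-8"))
    # Drop direct identifiers entirely.
    row = {k: v for k, v in event.items() if k not in DROP_FIELDS}
    if "user_id" in row:
        # Keep the column joinable across events without storing the raw ID.
        row["user_id"] = pseudonymize(str(row["user_id"]))
    return row


if __name__ == "__main__":
    raw = json.dumps({
        "user_id": "u-123",
        "email": "someone@example.com",
        "event": "screen_view",
        "ts": "2024-03-01T12:00:00Z",
    }).encode("utf-8")
    print(to_bigquery_row(raw))
```

Salted hashing is pseudonymization rather than full anonymization; it is chosen here only so that events from the same user can still be grouped after the raw ID is removed.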

Discussion

11 comments
pshemol (Option: A)
Dec 20, 2022

Previously, Google's pattern was Pub/Sub -> Dataflow -> BQ, but now there appears to be a new direct Pub/Sub -> BQ path: https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics

TNT87
Mar 7, 2023

New pub sub??? heheheh

TNT87
Mar 7, 2023

https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics You should have said that Pub/Sub has been upgraded to stream directly into BigQuery, not that there is a new Pub/Sub.

hiromi (Option: A)
Dec 21, 2022

I agree with pshemol.

M25 (Option: D)
May 9, 2023

Agree with TNT87. From the same link: “For Pub/Sub messages where advanced preload transformations or data processing before landing data in BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow.” The question is about “analyze user activity data”, not merely streaming IoT data into BigQuery, where privacy concerns would not apply per se. One can also deal with PII after it lands in BigQuery, but apparently that is not what Google recommends.

andresvelasco (Option: A)
Sep 10, 2023

I was torn between A and D, but since the transformation will occur in BigQuery, I think Pub/Sub alone suffices.

mymy9418 (Option: D)
Dec 18, 2022

Need Dataflow.

mil_spyro
Dec 20, 2022

Transformation will be handled in BQ, hence I think A.

mymy9418
Dec 29, 2022

agree.

TNT87 (Option: D)
Mar 7, 2023

D. Configure Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery. This solution involves using Google Cloud Pub/Sub as the messaging service to receive the data from the mobile applications, and then using Google Cloud Dataflow to transform and load the data into BigQuery in real time. Pub/Sub is a scalable and reliable messaging service that can handle high-volume real-time data streaming, while Dataflow provides a unified programming model to develop and run data processing pipelines. This solution is suitable for handling large volumes of user activity data from mobile applications and ingesting it into BigQuery in real time for analysis and ML experimentation.

TNT87
Mar 7, 2023

Starting today, you no longer have to write or run your own pipelines for data ingestion from Pub/Sub into BigQuery. We are introducing a new type of Pub/Sub subscription called a “BigQuery subscription” that writes directly from Cloud Pub/Sub to BigQuery. This new extract, load, and transform (ELT) path will be able to simplify your event-driven architecture. For Pub/Sub messages where advanced preload transformations or data processing before landing data in BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow.
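With a BigQuery subscription, by contrast, the message payload lands in the table as published, which is why the quote routes PII masking through Dataflow. The sketch below approximates, in plain Python, the row such a subscription writes when no topic schema is used and the write-metadata option is enabled: the payload goes into a `data` column alongside `subscription_name`, `message_id`, `publish_time`, and `attributes`. Treat this column layout as an assumption to check against the current Pub/Sub documentation; the function name, subscription name, and payload are hypothetical.

```python
import json
from datetime import datetime, timezone


def bigquery_subscription_row(subscription_name: str, message_id: str,
                              publish_time: datetime, data: bytes,
                              attributes: dict) -> dict:
    """Approximate the row a BigQuery subscription writes (assumed: no topic schema, metadata on)."""
    return {
        "subscription_name": subscription_name,
        "message_id": message_id,
        "publish_time": publish_time.isoformat(),
        # The payload is stored untransformed: any PII inside it lands in BigQuery as-is.
        "data": data.decode("utf-8"),
        "attributes": json.dumps(attributes),
    }


if __name__ == "__main__":
    payload = json.dumps({"user_id": "u-123", "email": "someone@example.com"}).encode("utf-8")
    row = bigquery_subscription_row(
        "activity-to-bq",  # hypothetical subscription name
        "1234567890",
        datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc),
        payload,
        {"app": "android"},
    )
    print(row["data"])  # the raw email address is still present
```

On this ELT path, masking would have to happen after landing, inside BigQuery, which is the trade-off the A and D voters are debating.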

PHD_CHENG (Option: D)
Mar 30, 2023

Pub/Sub -> Dataflow -> BigQuery

Werner123 (Option: D)
Feb 29, 2024

User data would most likely include PII; in that case, Dataflow is still recommended, since you need to remove or anonymize sensitive data before it lands in BigQuery.

pico (Option: D)
Nov 16, 2023

I would have added "with / without data transformation" to the question, so that one could choose between A and D.

ludovikush (Option: D)
Mar 1, 2024

I agree with Werner123.

Prakzz (Option: D)
Jun 29, 2024

Need both Pub/Sub and Dataflow for this.