
Professional Data Engineer Exam - Question 135


You are building a new application from which you need to collect data in a scalable way. Data arrives continuously from the application throughout the day, and you expect it to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:

- Decoupling producer from consumer
- Space- and cost-efficient storage of the raw ingested data, which is to be stored indefinitely
- Near real-time SQL queries
- Maintain at least 2 years of historical data, which will be queried with SQL

Which pipeline should you use to meet these requirements?

Correct Answer: D

Option D meets all of the requirements. Publishing events to Cloud Pub/Sub decouples the producer (the application) from its consumers. A Cloud Dataflow pipeline then transforms the JSON event payloads to Avro and writes the data to both Cloud Storage and BigQuery. Avro is a compact binary format, so it is more space-efficient than raw JSON, and Cloud Storage provides cost-effective long-term storage for the raw data that must be kept indefinitely. Streaming the same events into BigQuery enables near real-time SQL queries as well as efficient, scalable SQL access to at least two years of historical data.
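As an illustration only (the exam question contains no code), here is a minimal Apache Beam (Python SDK) sketch of the option D pipeline. The project, topic, bucket, and table names and the Avro schema are hypothetical placeholders, and the sketch assumes the BigQuery table already exists:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Hypothetical Avro schema for the event records.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "payload", "type": "string"},
    ],
}

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    events = (
        p
        # Hypothetical topic; Pub/Sub decouples this pipeline from the producer.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/app-events")
        | "ParseJSON" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
    )

    # Branch 1: archive the raw events to Cloud Storage as compact Avro files.
    (
        events
        | "FixedWindows" >> beam.WindowInto(FixedWindows(5 * 60))
        | "WriteAvro" >> beam.io.WriteToAvro(
            "gs://my-raw-bucket/events/event",  # hypothetical bucket
            schema=EVENT_SCHEMA,
            file_name_suffix=".avro",
            num_shards=1,
        )
    )

    # Branch 2: stream the same events into BigQuery for near real-time SQL.
    # Assumes the table already exists with a matching schema.
    (
        events
        | "WriteBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",  # hypothetical table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

Launched with `--runner=DataflowRunner`, this runs as a streaming Dataflow job; the two branches consume the same parsed PCollection, which is what lets one pipeline serve both the archival and the query requirements.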

Discussion

17 comments
[Removed] (Option: D)
Mar 22, 2020

Correct - D

[Removed] (Option: D)
Mar 28, 2020

Answer: D. All the requirements are met by D.

MaxNRG (Option: D)
Jan 9, 2022

D: Cloud Pub/Sub, Cloud Dataflow, Cloud Storage, BigQuery https://cloud.google.com/solutions/stream-analytics/

barnac1es (Option: D)
Sep 24, 2023

Here's how this option aligns with the requirements:
- Decoupling producer from consumer: Cloud Pub/Sub provides a decoupled messaging system where the producer publishes events and consumers (like Dataflow) subscribe to them. This decoupling ensures flexibility and scalability.
- Space- and cost-efficient storage: Storing data in Avro format is more space-efficient than JSON, and Cloud Storage is a cost-effective storage solution. Additionally, Cloud Pub/Sub and Dataflow allow you to process and transform data efficiently, reducing storage costs.
- Near real-time SQL queries: By using Dataflow to transform and load data into BigQuery, you can achieve near real-time data availability for SQL queries. BigQuery is well suited to ad-hoc SQL queries and provides excellent query performance.
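To make the near real-time query point concrete, here is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical and match the pipeline sketch above:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Ad-hoc SQL over the table the Dataflow pipeline streams into
# (hypothetical table name).
query = """
    SELECT user_id, COUNT(*) AS event_count
    FROM `my-project.analytics.events`
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT 10
"""

# Rows streamed via the pipeline become queryable within seconds.
for row in client.query(query).result():
    print(row.user_id, row.event_count)
```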

Prasanna_kumar (Option: D)
Feb 21, 2022

Answer is D

juliorevk (Option: D)
Sep 24, 2023

D, because Pub/Sub decouples the producer from the consumer while Dataflow processes the stream; Cloud Storage can store the raw ingested data indefinitely, and BigQuery can serve the SQL queries.

medeis_jar (Option: D)
Jan 8, 2022

OMG only D

clouditis (Option: D)
Sep 22, 2022

D it is!

mbacelar (Option: D)
Nov 13, 2022

For sure D

zellck (Option: D)
Dec 2, 2022

D is the answer.

AzureDP900 (Option: D)
Dec 31, 2022

D. Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.
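For the producer side of option D ("an application that publishes events to Cloud Pub/Sub"), here is a minimal hypothetical sketch using the google-cloud-pubsub Python client; the project and topic names are placeholders matching the pipeline sketch above:

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic; must match what the Dataflow pipeline reads.
topic_path = publisher.topic_path("my-project", "app-events")

# Publish one JSON event; Pub/Sub only sees opaque bytes, which is what
# decouples the producer from any downstream consumer.
event = {"user_id": "u-123", "payload": "page_view"}
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print(future.result())  # prints the server-assigned message ID
```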

desertlotus1211
Jan 24, 2023

I believe this was on the GCP PCA exam as well! ;)

OberstK (Option: D)
Feb 3, 2023

Correct - D

forepick (Option: D)
Jun 1, 2023

D is the most suitable; however, the requirement is to store the raw ingested data, which is JSON, and Avro isn't JSON...

vaga1 (Option: D)
Jun 8, 2023

For sure D

FP77 (Option: D)
Aug 16, 2023

Should be D

edre (Option: D)
Jul 22, 2024

This is the Google-recommended approach.