Exam: Professional Data Engineer
Question 135

You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:

✑ Decoupling producer from consumer

✑ Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely

✑ Near real-time SQL query

✑ Maintain at least 2 years of historical data, which will be queried with SQL

Which pipeline should you use to meet these requirements?

    Correct Answer: D

To meet the requirements of decoupling the producer from the consumer, space- and cost-efficient storage, near real-time SQL queries, and at least 2 years of queryable history, option D is the best fit. Publishing events to Cloud Pub/Sub decouples the application (producer) from whatever consumes the stream. A Cloud Dataflow pipeline then transforms the JSON event payloads to Avro and writes the data to both Cloud Storage and BigQuery: Avro is more space-efficient than JSON, Cloud Storage offers cost-effective indefinite storage for the raw data, and BigQuery provides scalable SQL querying over both near real-time and historical data.
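For concreteness, here is a minimal Apache Beam (Python) sketch of the pipeline option D describes. All names (project, subscription, bucket, table) and the event schema are hypothetical placeholders, and streaming Avro file writes may require a fileio-based sink depending on your Beam version and runner; treat this as a sketch, not a reference implementation.

```python
import json

import apache_beam as beam
from apache_beam.io import ReadFromPubSub, WriteToBigQuery
from apache_beam.io.avroio import WriteToAvro
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Hypothetical Avro schema; the real fields depend on the application's events.
AVRO_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "action", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}

def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        events = (
            p
            | "ReadFromPubSub" >> ReadFromPubSub(
                subscription="projects/my-project/subscriptions/app-events-sub")
            | "ParseJson" >> beam.Map(json.loads)
        )

        # Branch 1: stream rows into an existing BigQuery table for
        # near real-time SQL queries.
        events | "WriteToBigQuery" >> WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )

        # Branch 2: window the unbounded stream so the file sink can emit
        # finite panes, then write compact Avro files to Cloud Storage for
        # indefinite, space-efficient raw storage.
        (
            events
            | "FiveMinuteWindows" >> beam.WindowInto(window.FixedWindows(5 * 60))
            | "WriteToAvro" >> WriteToAvro(
                "gs://my-bucket/raw/events",
                schema=AVRO_SCHEMA,
                file_name_suffix=".avro",
            )
        )

if __name__ == "__main__":
    run()
```

Fanning the same parsed events out to both sinks in a single pipeline is what lets one architecture satisfy the indefinite raw-storage requirement and the near real-time SQL requirement at the same time.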

Discussion
[Removed] (Option: D)

Correct - D

[Removed] (Option: D)

Answer: D. All the requirements are met by D.

MaxNRG (Option: D)

D: Cloud Pub/Sub, Cloud Dataflow, Cloud Storage, BigQuery. See https://cloud.google.com/solutions/stream-analytics/

barnac1es (Option: D)

Here's how this option aligns with your requirements:

Decoupling producer from consumer: Cloud Pub/Sub provides a decoupled messaging system where the producer publishes events and consumers (like Dataflow) subscribe to them. This decoupling ensures flexibility and scalability.

Space- and cost-efficient storage: Storing data in Avro format is more space-efficient than JSON, and Cloud Storage is a cost-effective storage solution. Additionally, Cloud Pub/Sub and Dataflow allow you to process and transform data efficiently, reducing storage costs.

Near real-time SQL query: By using Dataflow to transform and load data into BigQuery, you can achieve near real-time data availability for SQL queries. BigQuery is well suited for ad hoc SQL queries and provides excellent query performance.
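On the decoupling point, the producer side can be as small as the sketch below, using the google-cloud-pubsub client (project and topic names are hypothetical). Note that the producer knows nothing about Dataflow, Cloud Storage, or BigQuery.

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic; consumers subscribe independently.
topic_path = publisher.topic_path("my-project", "app-events")

event = {"user_id": "u123", "action": "click", "ts": 1700000000}
future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))
print(future.result())  # message ID, once Pub/Sub has accepted the event
```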

juliorevk (Option: D)

D because Pub/Sub decouples while Dataflow processes; Cloud Storage can store the raw ingested data indefinitely, and BQ can be used to query it.
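As a sketch of that querying side (dataset, table, and column names are hypothetical), a single query in BigQuery covers both freshly streamed rows and the 2+ years of history:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and columns; 730 days covers the 2-year requirement.
sql = """
    SELECT DATE(event_timestamp) AS day, COUNT(*) AS events
    FROM `my-project.analytics.events`
    WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 730 DAY)
    GROUP BY day
    ORDER BY day
"""
for row in client.query(sql).result():
    print(row.day, row.events)
```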

Prasanna_kumar (Option: D)

Answer is D

edre (Option: D)

Google-recommended approach

FP77 (Option: D)

Should be D

vaga1 (Option: D)

For sure D

forepick (Option: D)

D is the most suitable; however, the stored format should be JSON, and Avro isn't JSON...
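Fair point that Avro isn't JSON: it's a binary encoding. But it round-trips the same records losslessly, and over many records the binary encoding plus compression is typically much smaller than the JSON text. A small fastavro sketch (schema and records are hypothetical):

```python
import io
import json

from fastavro import reader, writer

# Hypothetical schema matching the JSON events.
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "action", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}

records = [
    {"user_id": f"u{i}", "action": "click", "ts": 1_700_000_000 + i}
    for i in range(10_000)
]

buf = io.BytesIO()
writer(buf, schema, records, codec="deflate")  # binary Avro + compression
avro_size = buf.tell()
json_size = len(json.dumps(records).encode("utf-8"))

buf.seek(0)
assert list(reader(buf)) == records  # identical records after round-trip
print(f"JSON: {json_size} bytes, Avro: {avro_size} bytes")
```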

OberstK (Option: D)

Correct - D

desertlotus1211

I believe this was on the GCP PCA exam as well! ;)

AzureDP900 (Option: D)

D. Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.

zellck (Option: D)

D is the answer.

mbacelar (Option: D)

For sure D

clouditis (Option: D)

D it is!

medeis_jar (Option: D)

OMG only D