Professional Data Engineer Exam - Question 282


You are using a Dataflow streaming job to read messages from a message bus that does not support exactly-once delivery. Your job then applies some transformations, and loads the result into BigQuery. You want to ensure that your data is being streamed into BigQuery with exactly-once delivery semantics. You expect your ingestion throughput into BigQuery to be about 1.5 GB per second. What should you do?

Correct Answer: A

To achieve exactly-once delivery semantics when streaming data into BigQuery, use the BigQuery Storage Write API. The API is designed for high-throughput, low-latency ingestion and provides mechanisms (stream offsets) to prevent duplicate writes, which is what exactly-once delivery requires. A regional target BigQuery table is recommended because it can provide better performance and lower latency when the Dataflow job runs in the same region, which suits the expected ingestion throughput of about 1.5 GB per second.
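For illustration, here is a minimal Apache Beam (Python SDK) sketch of such a pipeline. It assumes the message bus is exposed through a Pub/Sub subscription and that messages are JSON; the project, subscription, table, and schema names are placeholders. In Beam, selecting the STORAGE_WRITE_API write method uses the Storage Write API with exactly-once streaming semantics.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Streaming pipeline options (runner, region, and project flags omitted).
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            # Assumption: the message bus is consumed via a Pub/Sub subscription.
            | "ReadFromBus" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/my-sub")
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # ... apply the job's transformations here ...
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="my-project:my_dataset.my_regional_table",       # placeholder
                schema="id:STRING,event_ts:TIMESTAMP,payload:STRING",  # placeholder
                # Storage Write API: Beam's exactly-once streaming write path.
                method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```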

Discussion

19 comments
AlizCert (Option: B)
Jun 5, 2024

It should be B. The Storage Write API has "3 GB per second throughput in multi-regions; 300 MB per second in regions".

rajshiv
Apr 13, 2025

B is incorrect. Multiregional tables are not supported by the Storage Write API for exactly-once delivery. This option is invalid.

raaad (Option: A)
Jan 10, 2024

- BigQuery Storage Write API: This API is designed for high-throughput, low-latency writing of data into BigQuery. It also provides tools to prevent data duplication, which is essential for exactly-once delivery semantics.
- Regional table: Choosing a regional location for the BigQuery table could potentially provide better performance and lower latency, as it would be closer to the Dataflow job if they are in the same region.

AllenChen123
Jan 25, 2024

Agree. https://cloud.google.com/bigquery/docs/write-api#advantages

SamuelTsch (Option: B)
Nov 1, 2024

Looking at this documentation: https://cloud.google.com/bigquery/quotas#write-api-limits. 3 GB/s in multi-regions; 300 MB/s in regions.

Siahara (Option: A)
Feb 6, 2025

A. Implement the BigQuery Storage Write API and guarantee that the target BigQuery table is regional. Here's the breakdown of why Option A is superior:

Exactly-once delivery: The BigQuery Storage Write API intrinsically supports exactly-once delivery using stream offsets. This guarantees that each message is written to BigQuery exactly one time, even in the case of retries, despite the lack of native exactly-once support in your message bus.

High throughput: The Storage Write API is optimized for high-throughput scenarios. It can handle the expected ingestion throughput of 1.5 GB per second.

Regional tables: Using a regional BigQuery table aligns with best practices when utilizing the Storage Write API, as it helps to minimize latency and reduce potential cross-region communication costs.
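To make the stream-offset point concrete, here is a small conceptual Python sketch (deliberately not the real google-cloud-bigquery-storage client) showing why offset-based appends make retries idempotent: re-sending a batch at an already-committed offset is acknowledged without writing duplicate rows.

```python
class ConceptualWriteStream:
    """Toy model of a write stream that deduplicates by offset.

    This only mimics the idea behind the Storage Write API's exactly-once
    semantics; it is not the real client library.
    """

    def __init__(self):
        self.rows = []  # committed rows, in offset order

    def append_rows(self, offset, batch):
        # The writer states where the batch should land in the stream.
        if offset < len(self.rows):
            # Offset already committed: this is a retry of a batch that
            # actually succeeded, so acknowledge it without re-applying.
            return "ALREADY_EXISTS"
        if offset > len(self.rows):
            # A gap means an earlier batch is missing and must be retried first.
            return "OUT_OF_RANGE"
        self.rows.extend(batch)
        return "OK"


stream = ConceptualWriteStream()
print(stream.append_rows(0, [{"id": 1}, {"id": 2}]))  # OK
print(stream.append_rows(0, [{"id": 1}, {"id": 2}]))  # ALREADY_EXISTS (retry, no duplicate)
print(stream.append_rows(2, [{"id": 3}]))             # OK
print(len(stream.rows))                               # 3 rows total, no duplicates
```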

gord_nat
Mar 27, 2025

Has to be multi-regional (B). Max throughput for regional is currently only 300 MB/s: https://cloud.google.com/bigquery/quotas

Ed_Kim (Option: A)
Jan 3, 2024

Voting on A

Smakyel79
Jan 7, 2024

This option leverages the BigQuery Storage Write API's capability for exactly-once delivery semantics and a regional table setting that can meet compliance and data locality needs without impacting the delivery semantics. The BigQuery Storage Write API is more suitable for your high-throughput requirements compared to the BigQuery Streaming API.

HermanTan
Sep 30, 2024

To ensure that analysts do not see customer data older than 30 days while minimizing cost and overhead, the best option is: B. Use a timestamp range filter in the query to fetch the customer’s data for a specific range. This approach directly addresses the issue by filtering out data older than 30 days at query time, ensuring that only the relevant data is retrieved. It avoids the overhead and potential delays associated with garbage collection and manual deletion processes

CloudAdrMX (Option: B)
Nov 28, 2024

According to this documentation, it's B: https://cloud.google.com/bigquery/quotas#write-api-limits

NatyNogas (Option: A)
Dec 1, 2024

- Choosing a regional target BigQuery table ensures that data is stored redundantly in a single region, providing high availability and durability.

m_a_p_s (Option: B)
Dec 12, 2024

Streamed into BigQuery with exactly-once delivery semantics >>> Storage Write API. Ingestion throughput into BigQuery of about 1.5 GB per second >>> multiregional (check the throughput rate here: https://cloud.google.com/bigquery/quotas#write-api-limits).

himadri1983 (Option: B)
Dec 14, 2024

3 GB per second throughput in multi-regions; 300 MB per second in regions: https://cloud.google.com/bigquery/quotas#write-api-limits

Pime13 (Option: A)
Jan 6, 2025

https://cloud.google.com/bigquery/docs/streaming-data-into-bigquery: "For new projects, we recommend using the BigQuery Storage Write API instead of the tabledata.insertAll method. The Storage Write API has lower pricing and more robust features, including exactly-once delivery semantics." See also https://cloud.google.com/bigquery/docs/write-api#advantages

Matt_108 (Option: A)
Jan 13, 2024

Option A

hanoverquay (Option: D)
Mar 15, 2024

option D

BennyXu
Apr 7, 2024

you are wrong!!!!!!!!!!!!

imazy (Option: A)
Nov 10, 2024

The Write API supports 2.5 GB/sec throughput and supports exactly-once delivery semantics (https://cloud.google.com/bigquery/docs/write-api#connections), whereas with the streaming API duplicates can occur and need to be removed manually (https://cloud.google.com/bigquery/docs/streaming-data-into-bigquery#dataavailability).
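As a side note on the legacy streaming path: when duplicates do land in a table, the documented cleanup is a query that keeps one row per key. A hedged sketch using the google-cloud-bigquery Python client, with the table name and key column as placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Keep exactly one row per `id`; table and column names are placeholders.
dedup_sql = """
SELECT * EXCEPT (row_number)
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id) AS row_number
  FROM `my-project.my_dataset.my_table`
)
WHERE row_number = 1
"""

for row in client.query(dedup_sql).result():
    print(dict(row))
```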

hussain.sain (Option: B)
Dec 27, 2024

B is correct. When aiming for exactly-once delivery in a Dataflow streaming job, the key is to use the BigQuery Storage Write API, as it provides the capability to handle large-scale data ingestion with the correct semantics, including exactly-once delivery.

juliorevk (Option: B)
Jan 31, 2025

- BigQuery Storage Write API: This API is designed for high-throughput, low-latency writing of data into BigQuery. It also provides tools to prevent data duplication, which is essential for exactly-once delivery semantics.
- Multiregional table: The multiregional table ensures that your data is highly available and can be streamed into BigQuery across multiple regions. It is better suited for high-throughput and low-latency workloads, as it provides distributed write capabilities that can handle large data volumes, such as the 1.5 GB per second you expect to stream.

gabbferreira (Option: A)
Apr 23, 2025

It’s A

Aungshuman (Option: B)
May 1, 2025

As per the GCP documentation, multi-region meets the throughput requirement.

aditya_ali (Option: A)
May 5, 2025

You need write throughput of 1.5 GB per second. Given the high throughput requirement, a regional BigQuery table (Option A) is generally preferred over a multi-regional table due to its potentially lower write latency. Simple.