Professional Cloud Architect Exam Questions

Professional Cloud Architect Exam - Question 240


TerramEarth plans to connect all 20 million vehicles in the field to the cloud. This increases the volume to 20 million 600-byte records per second, roughly 40 TB per hour.

How should you design the data ingestion?

A. Vehicles write data directly to GCS.
B. Vehicles write data directly to Google Cloud Pub/Sub.
C. Vehicles stream data directly to Google BigQuery.
D. Vehicles continue to write data using the existing system (FTP).

Correct Answer: B

Vehicles write data directly to Google Cloud Pub/Sub. Google Cloud Pub/Sub is designed for real-time data streaming and can handle high volumes of data efficiently. It provides a reliable and scalable solution for ingesting large volumes of data with the ability to handle bursts in traffic, ensuring data integrity and minimizing loss. By decoupling the ingestion and processing phases, Pub/Sub also allows for better management of the data flow, which is crucial for handling the continuous stream of records from millions of vehicles.
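As an illustration of what "write directly to Pub/Sub" might look like from a vehicle-side gateway, here is a minimal sketch using the google-cloud-pubsub Python client; the project ID, topic name, and record payload are assumptions for illustration, not part of the question.

```python
# Minimal sketch of a vehicle-side publisher, assuming the google-cloud-pubsub
# client library; the project ID and topic name below are made up for illustration.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("terramearth-prod", "vehicle-telemetry")

def publish_record(payload: bytes, vehicle_id: str) -> None:
    """Publish one 600-byte telemetry record; Pub/Sub buffers it for consumers."""
    future = publisher.publish(topic_path, data=payload, vehicle_id=vehicle_id)
    future.result(timeout=30)  # block until the service acknowledges the publish

publish_record(b"\x00" * 600, vehicle_id="veh-0001")
```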

Discussion

17 comments
jcmoranp (Option: B)
Oct 26, 2019

It's Pub/Sub, too much data streaming for BigQuery...

alexspam88
Jun 17, 2021

Too much for Pub/Sub too: https://cloud.google.com/pubsub/quotas

Bill831231
Oct 6, 2021

Thanks for sharing the link, but it seems Pub/Sub can handle more streaming data than BigQuery: Pub/Sub allows 120,000,000 kB per minute (2 GB/s) in large regions, while BigQuery's streaming limit is 1 GB/s.
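Worth making the arithmetic explicit, since it decides this thread: 20 million records/s at 600 bytes each is 12 GB/s, which is above both limits quoted here. A quick sketch (quota figures taken from this comment, not re-checked against current docs):

```python
# Back-of-the-envelope check of the figures in this thread (quota numbers are
# taken from the comment above, not re-verified against current documentation).
records_per_sec = 20_000_000
bytes_per_record = 600

ingest_bytes_per_sec = records_per_sec * bytes_per_record  # 12e9 B/s = 12 GB/s
ingest_tb_per_hour = ingest_bytes_per_sec * 3600 / 1e12    # ~43.2 TB/h (the ~40 TB in the question)

pubsub_limit = 120_000_000 * 1_000 / 60  # 120,000,000 kB/min -> 2e9 B/s (2 GB/s)
bigquery_limit = 1e9                     # 1 GB/s streaming-insert limit cited above

print(f"ingest: {ingest_bytes_per_sec / 1e9:.0f} GB/s ({ingest_tb_per_hour:.1f} TB/h)")
print(f"Pub/Sub covers {pubsub_limit / ingest_bytes_per_sec:.0%} of the load")     # ~17%
print(f"BigQuery covers {bigquery_limit / ingest_bytes_per_sec:.0%} of the load")  # ~8%
```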

JoeShmoe (Option: B)
Nov 15, 2019

It's B; it exceeds the streaming limit for BQ.

omermahgoub (Option: B)
Dec 28, 2022

To handle the volume of data TerramEarth plans to ingest, a scalable and reliable ingestion service such as Google Cloud Pub/Sub is the right fit. Vehicles can stream data directly to the service, which absorbs sudden spikes in traffic and buffers the stream; the data can then be processed and loaded into a data warehouse such as BigQuery for analysis.

Option A (writing data directly to GCS) is a poor fit for high-volume real-time ingestion and risks data loss if the write rate outruns the upload path. Option C (streaming data directly into BigQuery) runs into streaming-insert limits at this volume, risking data loss or ingestion delays. Option D (continuing to write data with the existing system) is unlikely to cope with the increased volume, with the same risks.
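As a sketch of the decoupled pipeline described above: a streaming-pull subscriber that drains the Pub/Sub buffer into BigQuery. The subscription and table names are hypothetical, and at this scale a managed pipeline (e.g., Dataflow) would normally replace a hand-rolled consumer like this.

```python
# Hypothetical consumer: drain Pub/Sub and stream rows into BigQuery.
from google.cloud import bigquery, pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("terramearth-prod", "vehicle-telemetry-sub")
bq = bigquery.Client()

def handle(message):
    row = {"vehicle_id": message.attributes.get("vehicle_id"),
           "payload": message.data.hex()}
    errors = bq.insert_rows_json("terramearth-prod.telemetry.records", [row])
    if not errors:
        message.ack()   # ack only after the row is safely in BigQuery
    else:
        message.nack()  # redeliver later; Pub/Sub retains the message meanwhile

streaming_future = subscriber.subscribe(sub_path, callback=handle)
streaming_future.result()  # block, processing messages as they arrive
```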

sank8
Dec 29, 2022

correct. thanks for the explanation

MahAli (Option: A)
Dec 13, 2023

They are sending files through FTP; why is everyone missing this point? The max message size in Pub/Sub is 10 MB, as I remember. I would keep the file-based solution and roll out updates to direct the uploads to GCS.
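For reference, the option A flow described here would look roughly like this with the google-cloud-storage Python client; the bucket name, object layout, and file name are placeholders:

```python
# Sketch of option A: upload batched telemetry files to GCS instead of FTP.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("terramearth-vehicle-uploads")  # placeholder bucket

def upload_batch(local_path: str, vehicle_id: str) -> None:
    # One object per vehicle per batch sidesteps Pub/Sub's per-message size limit.
    blob = bucket.blob(f"raw/{vehicle_id}/{local_path}")
    blob.upload_from_filename(local_path)

upload_batch("telemetry-20240101T000000.avro", vehicle_id="veh-0001")
```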

nunopires2001 (Option: B)
Jan 27, 2023

I know it's B; however, the sensors are probably legacy systems that cannot talk to a Pub/Sub topic. Ignoring how huge a job it is to change or adapt 20 million devices is a mistake.

amxexam (Option: B)
May 15, 2022

We need to buffer: the default BigQuery limit is 100 API calls per second, and so far this cannot be changed. Hence we should ease the load using Pub/Sub, so B.
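Whatever the exact quota figure, the buffering argument comes down to amortizing many small records over fewer RPCs. The Pub/Sub Python client supports this on the publish side via BatchSettings; a sketch with illustrative values (project and topic names are made up):

```python
# Sketch: client-side batching so many 600-byte records share each publish RPC.
from google.cloud import pubsub_v1

batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=1000,    # flush after 1000 records...
    max_bytes=1_000_000,  # ...or ~1 MB of payload...
    max_latency=0.05,     # ...or 50 ms, whichever comes first (illustrative values)
)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
topic_path = publisher.topic_path("terramearth-prod", "vehicle-telemetry")

for _ in range(10_000):
    publisher.publish(topic_path, data=b"\x00" * 600)  # batched under the hood
```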

cdcollector (Option: A)
Jun 18, 2022

Should be A; see the next question about 80% cellular connectivity and Avro-format files streamed directly to GCS.

AzureDP900 (Option: B)
Jul 4, 2022

B is right!

the1dv (Option: B)
Apr 10, 2024

Wow, it's almost like GCP shouldn't have offloaded their IoT Core product; you can't just "write direct to Pub/Sub". It's the correct answer, but it's overly simplified. Writing directly to GCS would cost a fortune to retrieve in GET requests, etc.

[Removed] (Option: B)
Apr 22, 2022

You can request limit increases to use BQ streaming for this load, but why pay to store data before ETL?

Mahmoud_E (Option: B)
Oct 20, 2022

B is the correct answer; a similar question was in Google's sample questions.

megumin (Option: B)
Nov 5, 2022

ok for B

surajkrishnamurthy (Option: B)
Dec 17, 2022

B is the correct answer

kapara (Option: B)
May 28, 2023

it's B

BiddlyBdoyng (Option: D)
Jun 19, 2023

So many people point out that this breaks the BigQuery quota limit, but very few point out that it also breaks the Pub/Sub quota limit. So either the answer is not bound by the quota limits (in which case, why not BigQuery?), or both are wrong and we stick with FTP.

Vesta1807 (Option: C)
Dec 29, 2023

Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table. Instead of using a job to load data into BigQuery, you can choose to stream your data into BigQuery one record at a time by using the tabledata().insertAll() method. This approach enables querying data without the delay of running a load job. References: https://cloud.google.com/bigquery/streaming-data-into-bigquery
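In the Python client, that tabledata().insertAll() path is exposed as insert_rows_json; a minimal sketch against a hypothetical table, just to show the mechanics of option C:

```python
# Sketch of option C: stream one record straight into BigQuery (hypothetical table).
from google.cloud import bigquery

bq = bigquery.Client()
errors = bq.insert_rows_json(
    "terramearth-prod.telemetry.records",
    [{"vehicle_id": "veh-0001", "recorded_at": "2024-01-01T00:00:00Z"}],
)
if errors:
    print("insert failed:", errors)
# Otherwise the row is queryable within a few seconds, per the docs cited above.
```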

VegasDegenerate (Option: B)
Jul 4, 2024

Has to be Pub/Sub; you have remote vehicles and need to guarantee message delivery.