
Professional Data Engineer Exam - Question 112


You operate a logistics company, and you want to improve event delivery reliability for vehicle-based sensors. You operate small data centers around the world to capture these events, but leased lines that provide connectivity from your event collection infrastructure to your event processing infrastructure are unreliable, with unpredictable latency. You want to address this issue in the most cost-effective way. What should you do?

A. Deploy small Kafka clusters in your data centers to buffer events.
B. Have the data acquisition devices publish data to Cloud Pub/Sub.
C. Establish a Cloud Interconnect between all remote data centers and Google.
D. Write a Cloud Dataflow pipeline that aggregates all data in session windows.

Correct Answer: B

To improve event delivery reliability for vehicle-based sensors and address the issue of unreliable leased lines with unpredictable latency in a cost-effective way, having the data acquisition devices publish data to Cloud Pub/Sub is the most suitable option. Cloud Pub/Sub is a fully managed messaging service that can reliably ingest and process event data while providing automatic retries and fault-tolerance. This allows for improved reliability of event delivery despite the unreliable connections. Additionally, Cloud Pub/Sub scales seamlessly, reducing costs associated with over-provisioning infrastructure.
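The "automatic retries" the explanation mentions are behavior the Pub/Sub client libraries provide for you. As a rough, self-contained sketch of that idea (not the actual `google-cloud-pubsub` API), the following simulates an unreliable link and a publisher that buffers each event and retries with exponential backoff; `FlakyLink` and `publish_with_retries` are hypothetical names used only for illustration:

```python
import random
import time

class FlakyLink:
    """Simulates an unreliable leased line: each send fails with probability failure_rate."""
    def __init__(self, failure_rate=0.5, seed=42):
        self.rng = random.Random(seed)
        self.failure_rate = failure_rate
        self.delivered = []

    def send(self, event):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("link dropped the event")
        self.delivered.append(event)

def publish_with_retries(link, events, max_attempts=8, base_delay=0.01):
    """Retry each event with exponential backoff until delivered or attempts
    are exhausted, roughly what a managed client library does on your behalf."""
    failed = []
    for event in events:
        for attempt in range(max_attempts):
            try:
                link.send(event)
                break
            except ConnectionError:
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        else:
            failed.append(event)  # exhausted retries; caller must re-enqueue

    return failed

link = FlakyLink(failure_rate=0.5)
undelivered = publish_with_retries(link, [f"event-{i}" for i in range(20)])
```

Even with a link that drops half of all sends, eight backoff attempts make end-to-end loss very unlikely, which is the property that makes a managed message bus attractive over an unreliable line.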

Discussion

17 comments
[Removed] - Option: B
Mar 22, 2020

Should be B

Ganshank - Option: C
Apr 13, 2020

C. This is a tricky one. The issue here is the unreliable connection between the data collection and data processing infrastructure, and resolving it in a cost-effective manner. However, it also mentions that the company is using leased lines. I think replacing the leased lines with Cloud Interconnect would solve the problem, and hopefully not be an added expense. https://cloud.google.com/interconnect/docs/concepts/overview

serg3d
Jun 3, 2020

Yea, this would definitely solve the issue, but it's not "the most cost-effective way". I think PubSub is the correct answer.

sh2020
Jun 19, 2020

I agree, C is the only choice that addresses the problem. The problem is caused by the leased line. How can the Pub/Sub service resolve it? Pub/Sub will still use the leased line.

snamburi3
Nov 19, 2020

the question also talks about a cost-effective way...

awssp12345
Jul 6, 2021

DEFINITELY NOT COST-EFFECTIVE. C IS THE WORST CHOICE.

ayush_1995 - Option: B
Jan 29, 2023

B. Have the data acquisition devices publish data to Cloud Pub/Sub. This would provide a reliable messaging service for your event data, allowing you to ingest and process your data in a timely manner, regardless of the reliability of the leased lines. Cloud Pub/Sub also offers automatic retries and fault-tolerance, which would further improve the reliability of your event delivery. Additionally, using Cloud Pub/Sub would allow you to easily scale up or down your event processing infrastructure as needed, which would help to minimize costs.

rr4444 - Option: D
Aug 15, 2022

Feels like everyone is wrong.

A. Deploy small Kafka clusters in your data centers to buffer events. - Silly in a GCP cloud-native context, plus they have messaging infra anyway.

B. Have the data acquisition devices publish data to Cloud Pub/Sub. - They have messaging infra, so why? Unless they want to replace it, but that doesn't change the issue.

C. Establish a Cloud Interconnect between all remote data centers and Google. - Wrong, because Interconnect is basically a leased line. There must be some telecoms issue with it, which we can assume is unresolvable, e.g. long-distance remote locations and sometimes water ingress, and the telco can't justify sorting it yet, or is slow to, or something. Leased lines usually don't come with awful internet connectivity, so it sounds like a physical connectivity issue. Sure, an Interconnect is better, more direct, but a leased line should be bulletproof.

D. Write a Cloud Dataflow pipeline that aggregates all data in session windows. - The only way to address dodgy/delayed data delivery.
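For readers unfamiliar with the session windows that option D refers to: in Dataflow/Beam they come from `beam.WindowInto(beam.window.Sessions(gap))`, but the grouping semantics can be illustrated with a minimal pure-Python sketch (the `session_windows` helper below is hypothetical, not a Beam API): events are grouped into bursts, and a new window starts whenever the gap since the previous event exceeds a threshold.

```python
def session_windows(timestamps, gap):
    """Group event timestamps into sessions: a new session starts
    whenever the gap since the previous event exceeds `gap`."""
    sessions = []
    current = []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# Events arriving in bursts with dead air in between (e.g. link outages):
print(session_windows([0, 1, 2, 30, 31, 90], gap=10))
# [[0, 1, 2], [30, 31], [90]]
```

This shows why session windowing tolerates delayed, bursty delivery: late-arriving clusters of events still land together in one window, regardless of the gaps between bursts.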

zellck - Option: B
Dec 5, 2022

B is the answer.

AzureDP900
Dec 30, 2022

yes it is B. Have the data acquisition devices publish data to Cloud Pub/Sub.

NicolasN - Option: A
Dec 15, 2022

As usual the answer is hidden somewhere in the Google Cloud Blog: "In the case of our automotive company, the data is already stored and processed in local data centers in different regions. This happens by streaming all sensor data from the cars via MQTT to local Kafka Clusters that leverage Confluent’s MQTT Proxy." "This integration from devices to a local Kafka cluster typically is its own standalone project, because you need to handle IoT-specific challenges like constrained devices and unreliable networks." 🔗 https://cloud.google.com/blog/products/ai-machine-learning/enabling-connected-transformation-with-apache-kafka-and-tensorflow-on-google-cloud-platform

desertlotus1211
Jan 19, 2023

The question says the path from the on-premises infrastructure, which already has the data, to the event processing infrastructure, which is in GCP, is unreliable... it's not asking about the path from the sensors to the on-premises infrastructure...

desertlotus1211
Jan 19, 2023

I might have to retract my answer... Are they talking about GCP in this question? Where is the event processing infrastructure?

desertlotus1211 - Option: A
Jan 19, 2023

Are they talking about GCP in this question? Where is the event processing infrastructure? Answer A, might be correct!

musumusu - Option: A
Feb 16, 2023

Best answer is A. By using Kafka, you can buffer the events in the data centers until a reliable connection is established with the event processing infrastructure. But go with B, it's Google asking :P

musumusu
Feb 24, 2023

I read this question again, and now I want to answer C. Buying data acquisition devices and setting them up with the sensors doesn't seem like a practical approach. An Arduino is the cheapest IoT device available on the market, around 15 dollars, but who will open every sensor box and install one? It's a big job. This question depends on whether the IoT devices attached to the sensors need to be reprogrammed, which would be a big headache. Use Cloud Interconnect to deal with the current situation, or reprogram the IoT devices if they are already connected to the sensors.

t11 - Option: B
Aug 21, 2022

It has to be B.

TNT87 - Option: B
Sep 14, 2022

Cloud Pub/Sub supports batch and streaming, with push and pull capabilities. Answer B.

piotrpiskorski - Option: B
Nov 21, 2022

yeah, changing the whole architecture around the world to use Pub/Sub is so much more cost-efficient than Cloud Interconnect (which is like $3k)... It's C.

odacir
Dec 7, 2022

It's not one Cloud Interconnect, it's many Interconnects, one per data center. Pub/Sub addresses all the requirements. It's B.

odacir
Dec 7, 2022

Also, the problem isn't your connection, it's the connectivity between your event collection infrastructure and your event processing infrastructure, so Pub/Sub is perfect for this.

jkhong
Dec 8, 2022

Wouldn't using Cloud Interconnect also require amendments to each of the data centers around the world? I don't see why there would be a huge architecture change when using Pub/Sub; the publishers would just need to push messages directly to Pub/Sub instead of pushing to their own data center. Also, if the script for pushing messages can be standardised, the data centers can share it around.

PrashantGupta1616 - Option: B
Dec 27, 2022

Pub/Sub is a global service. It's important to note that the term "global" in this context refers to the geographical scope of the service.

ZZHZZH - Option: C
Jul 8, 2023

The question is misleading, but it should be C since it addresses the unpredictability and latency directly.

NeoNitin - Option: B
Aug 5, 2023

It says "with unpredictable latency", and with Pub/Sub there is no need to worry about the connection, so B is the right one.

FP77 - Option: C
Aug 13, 2023

I don't know why B is the most voted. The issue here is unreliable connectivity, and C is the perfect use case for that.

Nandababy - Option: B
Dec 16, 2023

Even with Cloud Pub/Sub, unpredictable latency or delays could still occur due to the unreliable leased lines connecting your event collection infrastructure and event processing infrastructure. While Cloud Pub/Sub offers reliable message delivery within its own network, the handoff to your processing infrastructure is still dependent on the leased lines. Replacing the leased lines with Cloud Interconnect could potentially resolve the overall issue of unpredictable latency in the event processing pipeline, but it could be an unnecessary expense given data centers distributed worldwide. Cloud Pub/Sub along with other optimization techniques like Cloud VPN or edge computing might be sufficient.

Anudeep58 - Option: B
Jun 17, 2024

Option B: Have the data acquisition devices publish data to Cloud Pub/Sub.

Rationale:
- Managed service: Cloud Pub/Sub is a fully managed service, reducing the operational overhead compared to managing Kafka clusters.
- Reliability and scalability: Cloud Pub/Sub can handle high volumes of data with low latency and provides built-in mechanisms for reliable message delivery, even in the face of intermittent connectivity.
- Cost-effective: Cloud Pub/Sub offers a pay-as-you-go pricing model, which can be more cost-effective than setting up and maintaining dedicated network infrastructure like Cloud Interconnect.
- Global availability: Cloud Pub/Sub is available globally and can handle data from multiple regions efficiently.