Exam Professional Data Engineer All QuestionsBrowse all questions from this exam
Question 292

You have terabytes of customer behavioral data streaming from Google Analytics into BigQuery daily. Your customers’ information, such as their preferences, is hosted on a Cloud SQL for MySQL database. Your CRM database is hosted on a Cloud SQL for PostgreSQL instance. The marketing team wants to use your customers’ information from the two databases and the customer behavioral data to create marketing campaigns for yearly active customers. You need to ensure that the marketing team can run the campaigns over 100 times a day on typical days and up to 300 during sales. At the same time, you want to keep the load on the Cloud SQL databases to a minimum. What should you do?

    Correct Answer: C

    To ensure the marketing team can efficiently run their campaigns while minimizing the load on the Cloud SQL databases, using Datastream to replicate the required tables from both Cloud SQL databases to BigQuery is the best approach. BigQuery is designed for high-frequency, large-scale analytics and can handle the high volume of queries demanded. By replicating the data into BigQuery, the load on the operational databases in Cloud SQL is minimized, avoiding performance degradation of those databases.

Discussion
raaadOption: C

- Datastream: It's a fully managed, serverless service for real-time data replication. It allows to stream data from various sources, including Cloud SQL, into BigQuery. - Reduced Load on Cloud SQL: By replicating the required tables from both Cloud SQL databases into BigQuery, you minimize the load on the Cloud SQL instances. The marketing team's queries will be run against BigQuery, which is designed to handle high-volume analytics workloads. - Frequency of Queries: BigQuery can easily handle the high frequency of queries (100 times daily, up to 300 during sales events) due to its powerful data processing capabilities. - Combining Data Sources: Once the data is in BigQuery, you can efficiently combine it with the Google Analytics data for comprehensive analysis and campaign planning.

SanjeevRoy91

Why not A ? Federrated queries will downgrade Cloud SQL perf?

scaenruyOption: C

C. Create streams in Datastream to replicate the required tables from both Cloud SQL databases to BigQuery for these queries.

Smakyel79

Datastream is a serverless, easy-to-use change data capture (CDC) and replication service. By replicating the necessary tables from the Cloud SQL databases to BigQuery, you can offload the query load from the Cloud SQL databases. The marketing team can then run their queries directly on BigQuery, which is designed for large-scale data analytics. This approach seems to balance both efficiency and performance, minimizing load on the Cloud SQL instances.

987af6bOption: C

Initially I said A, but this question was how I learned about Datastream, which I think would be the better solution in this scenario. So my answer is C

AlizCertOption: C

C, noting that federated queries on read replicas would be the ideal solution

joao_01Option: C

Its option C. "Performance. A federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from the external data source to BigQuery. Also, the source database might not be optimized for complex analytical queries." So, it will load the Cloud SQL external sources with the queries, impacting performance on those. Link: https://cloud.google.com/bigquery/docs/federated-queries-intro

datasmgOption: C

C is make sense

JyoGCPOption: C

Option C