
Professional Data Engineer Exam - Question 269


Your organization's data assets are stored in BigQuery, Pub/Sub, and a PostgreSQL instance running on Compute Engine. Because there are multiple domains and diverse teams using the data, teams in your organization are unable to discover existing data assets. You need to design a solution to improve data discoverability while keeping development and configuration efforts to a minimum. What should you do?

Correct Answer: B

To improve data discoverability with minimal development and configuration effort, leverage Data Catalog's ability to automatically catalog BigQuery datasets and Pub/Sub topics. For PostgreSQL tables, use the Data Catalog APIs to catalog them manually. This approach uses native support where it is available and manual cataloging for assets that are not automatically supported, keeping overall effort low.
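As an illustration, the manual step for PostgreSQL might look like the following minimal Python sketch, which follows Google's documented pattern for creating entry groups and custom entries with the google-cloud-datacatalog client library. The project, location, entry group, table, and column names are hypothetical placeholders.

```python
# Minimal sketch: registering a PostgreSQL table as a custom Data Catalog entry.
# Assumes `pip install google-cloud-datacatalog`. The project, location, entry
# group, table, and column names below are hypothetical placeholders.
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()
parent = client.common_location_path("my-project", "us-central1")

# Create an entry group to hold the manually cataloged PostgreSQL entries.
entry_group = client.create_entry_group(
    parent=parent,
    entry_group_id="postgresql_on_gce",
    entry_group=datacatalog_v1.EntryGroup(display_name="PostgreSQL on Compute Engine"),
)

# Describe one table as a custom entry with a user-specified system and type.
entry = datacatalog_v1.Entry(
    display_name="customers",
    user_specified_system="postgresql",
    user_specified_type="table",
    linked_resource="//my-gce-instance/appdb/customers",  # hypothetical locator
)
entry.schema.columns.append(
    datacatalog_v1.ColumnSchema(
        column="customer_id", type_="INT64", description="Primary key"
    )
)

entry = client.create_entry(parent=entry_group.name, entry_id="customers", entry=entry)
print(f"Created entry: {entry.name}")
```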

Discussion

14 comments
raaad (Option: B)
Jan 5, 2024

- It utilizes Data Catalog's native support for both BigQuery datasets and Pub/Sub topics.
- For PostgreSQL tables running on a Compute Engine instance, you'd use the Data Catalog APIs to create custom entries, as Data Catalog does not automatically discover external databases like PostgreSQL.

AllenChen123
Jan 21, 2024

Agree. https://cloud.google.com/data-catalog/docs/concepts/overview#catalog-non-google-cloud-assets

datapassionate (Option: C)
Jan 29, 2024

Data Catalog is the best choice. But for cataloging PostgreSQL, it is better to use a connector when available instead of using the API. https://cloud.google.com/data-catalog/docs/integrate-data-sources#integrate_unsupported_data_sources

tibuenoc
Feb 1, 2024

Agree. If a source doesn't have a connector, integration must be built manually on the Data Catalog API. Since PostgreSQL already has a connector, the best option is C.

ML6 (Option: C)
Feb 17, 2024

Google recommendation: If you can't find a connector for your data source, you can still manually integrate it by creating entry groups and custom entries. To do that, you can:
- Use one of the Data Catalog client libraries in one of the following languages: C#, Go, Java, Node.js, PHP, Python, or Ruby.
- Manually build on the Data Catalog API.
However, there is a connector for PostgreSQL, so option C.
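For anyone who wants to try the connector route, here is a rough sketch of invoking the community google-datacatalog-postgresql-connector package from Python. The flag names follow the project's README at the time of writing and may change between releases; every connection value below is a hypothetical placeholder.

```python
# Sketch: running the community PostgreSQL connector (the option C approach).
# Assumes `pip install google-datacatalog-postgresql-connector`. Flag names
# follow the project's README and may differ between versions; all connection
# values below are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "google-datacatalog-postgresql-connector",
        "--datacatalog-project-id=my-project",
        "--datacatalog-location-id=us-central1",
        "--postgresql-host=10.128.0.5",      # internal IP of the Compute Engine VM
        "--postgresql-user=catalog_reader",
        "--postgresql-pass=example-password",
        "--postgresql-database=appdb",
    ],
    check=True,  # raise CalledProcessError if the sync fails
)
```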

Matt_108 (Option: B)
Jan 13, 2024

Option B - Data Catalog automatically maps out GCP resources, and dev effort is minimized by leveraging the Data Catalog API to do the same for the PostgreSQL database.

joao_01 (Option: B)
Apr 7, 2024

In option C, the expression "Use custom connectors to manually catalog PostgreSQL tables." is referring to Google's use case for "community-contributed connectors to multiple popular on-premises data sources". As you can see, these connectors are for ON-PREMISES data sources ONLY. In this case, the PostgreSQL instance is in a VM in the cloud. Thus, the correct option is B.

joao_01
Apr 7, 2024

Link: https://cloud.google.com/data-catalog/docs/concepts/overview#catalog-non-google-cloud-assets

GCP001 (Option: B)
Jan 8, 2024

B. Looks like the better option, as low development effort is needed. C doesn't look right, as it would need a lot of dev effort for custom connectors.

saschak94 (Option: C)
Feb 9, 2024

If you can't find a connector for your data source, you can still manually integrate it by creating entry groups and custom entries. To do that, you can:
- Manually build on the Data Catalog API.

LaxmanTiwari (Option: C)
Apr 25, 2024

I vote for C, per the "Integrate on-premises data sources" section: "To integrate on-premises data sources, you can use the corresponding Python connectors contributed by the community", under the link https://cloud.google.com/data-catalog/docs/integrate-data-sources

LaxmanTiwari
Apr 25, 2024

The Data Catalog API comes into effect if custom connectors are not available via community repos.

fitri001 (Option: B)
Jun 17, 2024

- BigQuery datasets and Pub/Sub topics: Google Data Catalog can automatically catalog metadata from BigQuery and Pub/Sub, making it easy to discover and manage these data assets without additional development effort.
- PostgreSQL tables: While Data Catalog does not have built-in connectors for PostgreSQL, you can use the Data Catalog APIs to manually catalog the PostgreSQL tables. This requires some custom development but is manageable compared to creating custom connectors for everything.

Harshzh12 (Option: B)
Feb 26, 2024

The Data Catalog API includes a connector for PostgreSQL; by using it, developers don't have to create custom connectors.

Y___ash (Option: B)
Mar 13, 2024

Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.

hanoverquay (Option: B)
Mar 16, 2024

Option B: there's no need to build a custom connector now; PostgreSQL is supported: https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/tree/master/google-datacatalog-postgresql-connector

d11379b
Mar 24, 2024

I think "custom connector" here may just mean that these are not official tools, since the doc mentions "connectors contributed by the community". And it should not be B, because "manually catalog by API" is a way even more basic than using a connector.

Cassim (Option: B)
May 14, 2024

Option B leverages Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics, which streamlines the process and reduces manual effort. Using Data Catalog APIs to manually catalog PostgreSQL tables ensures consistency across all data assets while minimizing development and configuration efforts.

virat_kohli (Option: B)
May 22, 2024

B. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.