
Professional Data Engineer Exam - Question 269


Your organization's data assets are stored in BigQuery, Pub/Sub, and a PostgreSQL instance running on Compute Engine. Because there are multiple domains and diverse teams using the data, teams in your organization are unable to discover existing data assets. You need to design a solution to improve data discoverability while keeping development and configuration efforts to a minimum. What should you do?

Correct Answer: B

To improve data discoverability with minimal development and configuration effort, leverage Data Catalog's ability to automatically catalog BigQuery datasets and Pub/Sub topics. For PostgreSQL tables, use the Data Catalog APIs to catalog them manually. This approach uses native support where it is available and manual cataloging for assets that are not automatically supported, keeping overall effort low.
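As an illustration, the manual step for PostgreSQL might look like the following minimal Python sketch, which follows Google's documented pattern for creating entry groups and custom entries with the google-cloud-datacatalog client library. The project, location, entry group, table, and column names are hypothetical placeholders.

```python
# Minimal sketch: registering a PostgreSQL table as a custom Data Catalog entry.
# Assumes `pip install google-cloud-datacatalog`. The project, location, entry
# group, table, and column names below are hypothetical placeholders.
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()
parent = client.common_location_path("my-project", "us-central1")

# Create an entry group to hold the manually cataloged PostgreSQL entries.
entry_group = client.create_entry_group(
    parent=parent,
    entry_group_id="postgresql_on_gce",
    entry_group=datacatalog_v1.EntryGroup(display_name="PostgreSQL on Compute Engine"),
)

# Describe one table as a custom entry with a user-specified system and type.
entry = datacatalog_v1.Entry(
    display_name="customers",
    user_specified_system="postgresql",
    user_specified_type="table",
    linked_resource="//my-gce-instance/appdb/customers",  # hypothetical locator
)
entry.schema.columns.append(
    datacatalog_v1.ColumnSchema(
        column="customer_id", type_="INT64", description="Primary key"
    )
)

entry = client.create_entry(parent=entry_group.name, entry_id="customers", entry=entry)
print(f"Created entry: {entry.name}")
```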

Discussion

14 comments
raaad (Option: B)
Jan 5, 2024

- It utilizes Data Catalog's native support for both BigQuery datasets and Pub/Sub topics.
- For PostgreSQL tables running on a Compute Engine instance, you'd use the Data Catalog APIs to create custom entries, as Data Catalog does not automatically discover external databases like PostgreSQL.

AllenChen123
Jan 21, 2024

Agree. https://cloud.google.com/data-catalog/docs/concepts/overview#catalog-non-google-cloud-assets

datapassionate (Option: C)
Jan 29, 2024

Data Catalog is the best choice. But for cataloging PostgreSQL, it is better to use a connector when available instead of using the API. https://cloud.google.com/data-catalog/docs/integrate-data-sources#integrate_unsupported_data_sources

tibuenoc
Feb 1, 2024

Agree. If a source doesn't have a connector, integration must be built manually on the Data Catalog API. Since PostgreSQL already has a connector, the best option is C.

ML6 (Option: C)
Feb 17, 2024

Google recommendation: If you can't find a connector for your data source, you can still manually integrate it by creating entry groups and custom entries. To do that, you can:
- Use one of the Data Catalog client libraries in one of the following languages: C#, Go, Java, Node.js, PHP, Python, or Ruby.
- Manually build on the Data Catalog API.
However, there is a connector for PostgreSQL, so option C.
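For anyone who wants to try the connector route, here is a rough sketch of invoking the community google-datacatalog-postgresql-connector package from Python. The flag names follow the project's README at the time of writing and may change between releases; every connection value below is a hypothetical placeholder.

```python
# Sketch: running the community PostgreSQL connector (the option C approach).
# Assumes `pip install google-datacatalog-postgresql-connector`. Flag names
# follow the project's README and may differ between versions; all connection
# values below are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "google-datacatalog-postgresql-connector",
        "--datacatalog-project-id=my-project",
        "--datacatalog-location-id=us-central1",
        "--postgresql-host=10.128.0.5",      # internal IP of the Compute Engine VM
        "--postgresql-user=catalog_reader",
        "--postgresql-pass=example-password",
        "--postgresql-database=appdb",
    ],
    check=True,  # raise CalledProcessError if the sync fails
)
```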

Matt_108 (Option: B)
Jan 13, 2024

Option B - Data Catalog automatically maps out GCP resources, and dev effort is minimized by leveraging the Data Catalog API to do the same for the PostgreSQL database.

joao_01 (Option: B)
Apr 7, 2024

In option C, the expression "Use custom connectors to manually catalog PostgreSQL tables." is referring to Google's use case for "community-contributed connectors to multiple popular on-premises data sources". As you can see, these connectors are for ON-PREMISES data sources ONLY. In this case, the PostgreSQL instance is in a VM in the cloud. Thus, the correct option is B.

joao_01
Apr 7, 2024

Link: https://cloud.google.com/data-catalog/docs/concepts/overview#catalog-non-google-cloud-assets

GCP001 (Option: B)
Jan 8, 2024

B. Looks like the better option, as low development effort is needed. C doesn't look right, as it would need a lot of dev effort for custom connectors.

saschak94 (Option: C)
Feb 9, 2024

If you can't find a connector for your data source, you can still manually integrate it by creating entry groups and custom entries. To do that, you can:
- Manually build on the Data Catalog API.

LaxmanTiwari (Option: C)
Apr 25, 2024

I vote for C, per the "Integrate on-premises data sources" section: "To integrate on-premises data sources, you can use the corresponding Python connectors contributed by the community", under the link https://cloud.google.com/data-catalog/docs/integrate-data-sources

LaxmanTiwari
Apr 25, 2024

The Data Catalog API comes into effect if custom connectors are not available via community repos.

fitri001 (Option: B)
Jun 17, 2024

- BigQuery datasets and Pub/Sub topics: Google Data Catalog can automatically catalog metadata from BigQuery and Pub/Sub, making it easy to discover and manage these data assets without additional development effort.
- PostgreSQL tables: While Data Catalog does not have built-in connectors for PostgreSQL, you can use the Data Catalog APIs to manually catalog the PostgreSQL tables. This requires some custom development but is manageable compared to creating custom connectors for everything.

Harshzh12 (Option: B)
Feb 26, 2024

The Data Catalog API includes a connector for PostgreSQL; by using it, developers don't have to create custom connectors.

Y___ash (Option: B)
Mar 13, 2024

Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.

hanoverquay (Option: B)
Mar 16, 2024

Option B: there's no need to build a custom connector now; PostgreSQL is supported: https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/tree/master/google-datacatalog-postgresql-connector

d11379b
Mar 24, 2024

I think "custom connector" here may just mean that these are not official tools, since the doc mentions "connectors contributed by the community". And it should not be B, because "manually catalog by API" is a way even more basic than using a connector.

Cassim (Option: B)
May 14, 2024

Option B leverages Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics, which streamlines the process and reduces manual effort. Using Data Catalog APIs to manually catalog PostgreSQL tables ensures consistency across all data assets while minimizing development and configuration efforts.

virat_kohli (Option: B)
May 22, 2024

B. Use Data Catalog to automatically catalog BigQuery datasets and Pub/Sub topics. Use Data Catalog APIs to manually catalog PostgreSQL tables.