Exam: Certified Data Engineer Professional
Question 37

A small company based in the United States has recently contracted a consulting firm in India to implement several new data engineering pipelines to power artificial intelligence applications. All the company's data is stored in regional cloud storage in the United States.

The workspace administrator at the company is uncertain about where the Databricks workspace used by the contractors should be deployed.

Assuming that all data governance considerations are accounted for, which statement accurately informs this decision?

    Correct Answer: C

    Cross-region reads and writes can incur significant costs and latency. Therefore, compute should be deployed in the same region where the data is stored to optimize performance and reduce costs.
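The cost side of this argument can be sketched with a rough back-of-the-envelope estimate. This is only an illustration: the function name, the workload figures, and the per-GB rate below are hypothetical placeholders, not actual cloud pricing, and same-region transfer is assumed to be free for simplicity.

```python
def monthly_transfer_cost(gb_per_run: float, runs_per_month: int, rate_per_gb: float) -> float:
    """Estimate the monthly data transfer charge for a pipeline.

    rate_per_gb is the per-GB fee for moving data between the storage
    region and the compute region (assumed 0.0 when they are the same).
    """
    return gb_per_run * runs_per_month * rate_per_gb


# Hypothetical workload: each run reads 500 GB, 30 runs per month.
GB_PER_RUN = 500.0
RUNS_PER_MONTH = 30

# Placeholder rate in USD per GB; check your cloud provider's pricing page.
HYPOTHETICAL_CROSS_REGION_RATE = 0.02

same_region_cost = monthly_transfer_cost(GB_PER_RUN, RUNS_PER_MONTH, 0.0)
cross_region_cost = monthly_transfer_cost(
    GB_PER_RUN, RUNS_PER_MONTH, HYPOTHETICAL_CROSS_REGION_RATE
)

print(f"Same-region compute:  ${same_region_cost:,.2f} per month")
print(f"Cross-region compute: ${cross_region_cost:,.2f} per month")
```

The estimate covers transfer fees only. The added latency of every cross-region read and write is a separate penalty that this sketch does not model.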

Discussion
spaceexplorer (Option: C)

C is the answer.

RafaelCFC (Option: C)

An important part of data governance is usage cost, and, as a general data engineering practice, egress costs from moving data between regions are always an important consideration. Locating the workspace in a different region from the contractors causes them very little inconvenience, while greatly reducing these costs.

chokthewa (Option: C)

C is correct.

imatheushenrique (Option: C)

(C) The decision is about where the Databricks workspace used by the contractors should be deployed. The contractors are based in India, while all the company's data is stored in regional cloud storage in the United States. When choosing a region for deploying a Databricks workspace, one of the important factors to consider is proximity to the data sources and sinks. Cross-region reads and writes can incur significant costs and latency due to network bandwidth limits and data transfer fees. Therefore, whenever possible, compute should be deployed in the same region where the data is stored, to optimize performance and reduce costs.

Patito (Option: B)

Where the data engineering team develops pipelines is independent of where the data objects reside in cloud storage.

coercion

These pipelines will create clusters (machines) that reside in a different region than the data, which will cause latency issues. So C should be the correct option.