DP-200 Exam - Question 190

Question

You implement an enterprise data warehouse in Azure Synapse Analytics.

You have a large fact table that is 10 terabytes (TB) in size.

Incoming queries use the primary key Sale Key column to retrieve data as displayed in the following table:

You need to distribute the large fact table across multiple nodes to optimize performance of the table.

Which technology should you use?

Examice · Accepted Answer

A hash-distributed table with a clustered columnstore index is the best fit for this scenario. Hash distribution assigns each row to a distribution by applying a hash function to a distribution column; a unique column such as the Sale Key spreads the rows evenly across nodes, which is what a large fact table needs. The clustered columnstore index suits analytic and data warehousing workloads because it gives better compression and scan performance than a traditional rowstore index. The comment on this question points out that a clustered index can be faster for single-row lookups, but for a 10 TB fact table the overall advantages of a clustered columnstore index outweigh that benefit.
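
To see why hashing on a unique key tends to spread rows evenly, here is a minimal Python sketch. It is an illustration only: SHA-256 stands in for the service's internal hash function, which is not documented here, and the names distribution_for and rows_per_distribution are hypothetical. The figure of 60 distributions is the number a dedicated SQL pool uses for every table.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool splits every table into 60 distributions.
NUM_DISTRIBUTIONS = 60


def distribution_for(sale_key: int) -> int:
    """Map a key to a distribution number in [0, 60).

    SHA-256 is a stand-in here (an assumption for illustration);
    the real service uses its own deterministic hash function.
    """
    digest = hashlib.sha256(str(sale_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def rows_per_distribution(keys) -> Counter:
    """Count how many rows would land in each distribution."""
    return Counter(distribution_for(k) for k in keys)


# Simulate 600,000 rows with unique, sequential Sale Key values.
counts = rows_per_distribution(range(600_000))

print("distributions used:", len(counts))
print("smallest:", min(counts.values()), "largest:", max(counts.values()))
```

Because every Sale Key value is distinct, no single distribution ends up with a disproportionate share of rows. A low-cardinality or heavily skewed column would not give the same result, which is why the choice of distribution column matters as much as the choice of hash distribution itself.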

Hinzzz · Answer

Clustered indexes may outperform clustered columnstore tables when a single row needs to be retrieved quickly. For queries where a lookup of a single row or very few rows must run with extreme speed, consider a clustered index or a nonclustered secondary index. The answer could be B, since the queries retrieve rows by Sale Key.
