DP-200 Exam QuestionsBrowse all questions from this exam

DP-200 Exam - Question 190


You implement an enterprise data warehouse in Azure Synapse Analytics.

You have a large fact table that is 10 terabytes (TB) in size.

Incoming queries use the primary key Sale Key column to retrieve data as displayed in the following table:

Exam DP-200 Question 190

You need to distribute the large fact table across multiple nodes to optimize performance of the table.

Which technology should you use?

Show Answer
Correct Answer: A

A hash distributed table with a clustered ColumnStore index is the optimal solution for this scenario. Hash distribution distributes data evenly across multiple nodes based on a hash function, which enhances the performance of large fact tables. The clustered ColumnStore index is beneficial for analytic and data warehousing workloads, as it provides significant performance improvements and data compression compared to traditional rowstore indexes. While the comment discussed the potential benefits of a clustered index for single-row lookups, the overall performance advantages of a clustered ColumnStore index for a large fact table (10 TB) outweigh the benefits of a clustered index. Therefore, the best choice for distributing and optimizing the performance of the large fact table in Azure Synapse Analytics is a hash distributed table with a clustered ColumnStore index.

Discussion

1 comment
Sign in to comment
Hinzzz
Jun 23, 2021

Clustered indexes may outperform clustered columnstore tables when a single row needs to be quickly retrieved. For queries where a single or very few row lookup is required to perform with extreme speed, consider a clustered index or nonclustered secondary index. The answer could be B as it is sales key-based rows retrieval.

hello_there_
Jun 25, 2021

I agree under the assumption that the table is only used for row retrieval by primary key. It isn't explicitly said that this is the only use though. If this really is the only use of the table, it would make more sense to put it in a CosmosDB with table API. Still, I think that B is the answer they want to hear. Otherwise, why would they have added that queries will use row retrieval by primary key?