Exam DP-203
Question 20

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.

FactPurchase will have 1 million rows of data added daily and will contain three years of data.
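A rough sizing check helps explain why the distribution choice matters here. The minimal Python sketch below assumes 365-day years and a perfectly even spread over the 60 distributions of a dedicated SQL pool; the constant names are illustrative only.

```python
ROWS_PER_DAY = 1_000_000   # from the question: 1 million rows added daily
YEARS_RETAINED = 3         # from the question: three years of data
DAYS_PER_YEAR = 365        # assumption: leap days are ignored
NUM_DISTRIBUTIONS = 60     # a dedicated SQL pool spreads each table over 60 distributions

total_rows = ROWS_PER_DAY * DAYS_PER_YEAR * YEARS_RETAINED
# Assumes an even, skew-free spread across all distributions.
rows_per_distribution = total_rows / NUM_DISTRIBUTIONS

# The sample query reads one month: January 2021 has 31 days.
rows_in_january = ROWS_PER_DAY * 31

print(f"total rows:              {total_rows:,}")
print(f"rows per distribution:   {rows_per_distribution:,.0f}")
print(f"rows matching the query: {rows_in_january:,} "
      f"({rows_in_january / total_rows:.1%} of the table)")
```

Under those assumptions the table holds about 1.1 billion rows, roughly 18 million per distribution, and the sample query touches about 2.8% of them.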

Transact-SQL queries similar to the following query will be executed daily.

    SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
    FROM FactPurchase
    WHERE DateKey >= 20210101
      AND DateKey <= 20210131
    GROUP BY SupplierKey, StockItemKey, IsOrderFinalized

Which table distribution will minimize query times?

    Correct Answer: B

    In Azure Synapse Analytics, hash-distributed tables suit large fact tables because, when the distribution column has many distinct values, rows are spread evenly across the pool's 60 distributions and each distribution processes a similar share of the work in parallel. PurchaseKey is a good hash column because it likely has many unique values, which balances the data and prevents skew. It is also not a nullable column or a date column, which makes it suitable for hash distribution.
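To see why a column with many unique values avoids skew, here is a minimal Python simulation that hashes a hypothetical 60,000-row sample into 60 distributions, once on a unique PurchaseKey and once on a yes/no IsOrderFinalized flag. MD5 is only a stand-in for Synapse's internal hash function, which this sketch does not reproduce; the 60-distribution count is the only detail taken from the real service.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_for(value) -> int:
    """Map a column value to a distribution id.

    Uses MD5 as a stand-in hash; this is NOT the hash function Synapse uses internally.
    """
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def rows_per_distribution(values) -> Counter:
    """Count how many rows land in each distribution when hashing on the given values."""
    return Counter(distribution_for(v) for v in values)


# Hypothetical sample: 60,000 purchases with unique keys and a yes/no flag.
purchase_keys = list(range(1, 60_001))
is_order_finalized = [key % 2 == 0 for key in purchase_keys]

by_purchase_key = rows_per_distribution(purchase_keys)
by_flag = rows_per_distribution(is_order_finalized)

ideal = len(purchase_keys) / NUM_DISTRIBUTIONS  # perfectly even share per distribution
for name, counts in [("PurchaseKey", by_purchase_key), ("IsOrderFinalized", by_flag)]:
    largest = max(counts.values())
    # Skew ratio: largest distribution vs. the even share (1.0 means perfectly balanced).
    print(f"{name}: {len(counts)} of {NUM_DISTRIBUTIONS} distributions used, "
          f"largest holds {largest} rows (skew ratio {largest / ideal:.1f})")
```

With the unique key every distribution ends up with roughly 1,000 rows, while the two-valued flag can fill at most two distributions and leaves 58 or more idle.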

Discussion
FredNo (Option: B)

Correct

Deepshikha1228

B is correct

GameLift (Option: B)

Is it hash-distributed on PurchaseKey rather than on IsOrderFinalized because IsOrderFinalized has only two values (each row is either yes or no), so its rows could land in at most two distributions, whereas PurchaseKey has many distinct values?

Podavenna

Yes, your logic is correct!

saqib839

Plus, it's better to hash-distribute on a column that is used in GROUP BY clauses or joins.
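The GROUP BY point can be illustrated with a simplified Python model of two-phase aggregation, assuming hypothetical rows and an MD5 stand-in hash. When the table is hashed on PurchaseKey, rows that share (SupplierKey, StockItemKey, IsOrderFinalized) are scattered across distributions, so each distribution can only produce partial counts, and those partials have to be moved and combined before the final result is known. This models the idea only, not Synapse's actual query plan.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # dedicated SQL pool distribution count


def distribution_for(value) -> int:
    """Stand-in hash (MD5), NOT Synapse's internal hash function."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


# Hypothetical rows: (PurchaseKey, SupplierKey, StockItemKey, IsOrderFinalized)
rows = [(pk, pk % 5, pk % 7, pk % 2 == 0) for pk in range(1, 10_001)]

# Place each row by hashing PurchaseKey, as in answer B.
distributions = [[] for _ in range(NUM_DISTRIBUTIONS)]
for row in rows:
    distributions[distribution_for(row[0])].append(row)

# Phase 1: each distribution counts its own rows per GROUP BY key, in parallel.
partials = [Counter((s, i, f) for _, s, i, f in dist_rows) for dist_rows in distributions]

# Phase 2: partial counts for the same group must be brought together and summed,
# because a single group's rows are spread over many distributions.
final_counts = Counter()
for partial in partials:
    final_counts.update(partial)  # Counter.update adds counts rather than replacing them

group = (1, 1, False)
holders = sum(1 for partial in partials if group in partial)
print(f"{len(final_counts)} groups; group {group} has {final_counts[group]} rows "
      f"spread over {holders} distributions")
```

If the table were instead hashed on one of the GROUP BY columns, every group would sit entirely in one distribution and phase 2 would only need to concatenate results, which is the intuition behind the comment above.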

phydev (Option: B)

Was on my exam today (31.10.2023).

DindaS

Ideally there should be an option to partition on DateKey. When the partition key column is used in the WHERE condition, the data in unneeded partitions is eliminated automatically; that's the beauty of partitioning and how it works in conjunction with the query. However, I would like to hear from the experts in the forum.
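Partition elimination, as described in the comment above, can be sketched with a small Python model. It assumes hypothetical monthly partitions on DateKey with RANGE RIGHT boundaries (the boundary list is invented for illustration) and works out which partitions a "DateKey >= low AND DateKey <= high" filter has to read.

```python
import bisect

# Hypothetical monthly boundaries on DateKey (yyyymmdd integers), RANGE RIGHT style:
# each boundary value is the first key of the partition to its right.
BOUNDARIES = [y * 10000 + m * 100 + 1 for y in (2019, 2020, 2021) for m in range(1, 13)]


def partition_for(date_key: int) -> int:
    """Return the partition number holding date_key under RANGE RIGHT semantics."""
    return bisect.bisect_right(BOUNDARIES, date_key)


def partitions_for_range(low: int, high: int) -> range:
    """Partitions that a filter DateKey >= low AND DateKey <= high must read; the rest are skipped."""
    return range(partition_for(low), partition_for(high) + 1)


scanned = partitions_for_range(20210101, 20210131)
total_partitions = len(BOUNDARIES) + 1  # n boundaries create n + 1 partitions
print(f"query reads {len(scanned)} of {total_partitions} partitions")
```

For the January 2021 filter only one of the 37 partitions is read. Partitioning and distribution are independent choices in a dedicated SQL pool, so this would sit alongside answer B rather than replace it.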

SHENOOOO (Option: B)

B is the Correct Answer

MBRSDG (Option: B)

Almost exactly what's shown in an example in the official docs: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choose-a-distribution-column

74gjd_37 (Option: B)

Hash-distributed tables improve query performance on large fact tables. The PurchaseKey has many unique values, does not have NULLs and is not a date column.

jiajiani

Why can't we use a date column?

astone42 (Option: B)

B is correct.

Okkier (Option: C)

Optimal distribution: given that the query performs a GROUP BY on SupplierKey, StockItemKey, and IsOrderFinalized, the most balanced approach is round-robin distribution. While it does not ensure that rows with the same key are stored together, it avoids data skew and ensures an even distribution, which helps achieve better performance for aggregate queries.

sdg2844 (Option: B)

Correct. Column with many unique values. Also, it's USUALLY not a column that is used in WHERE clauses, groupings, and the like, and this one isn't.

pperf (Option: B)

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choosing-a-distribution-column

jiajiani

Why does the answer say it cannot be a date column?
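The distribution-column guidance linked earlier in this thread advises against date columns largely because queries tend to filter by date. A minimal Python sketch, using an MD5 stand-in hash rather than Synapse's own, shows the effect for the sample query: if the table were hashed on DateKey, January 2021's 31 keys could occupy at most 31 of the 60 distributions, so at least 29 distributions would sit idle while the others do all the work.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # dedicated SQL pool distribution count


def distribution_for(value) -> int:
    """Stand-in hash (MD5), NOT Synapse's internal hash function."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


# DateKey values matched by the sample query's WHERE clause (January 2021).
january_keys = [20210101 + day for day in range(31)]

# Hashing on DateKey would put all of a day's rows (1 million per day in the
# question) into a single distribution.
busy = {distribution_for(key) for key in january_keys}
idle = NUM_DISTRIBUTIONS - len(busy)
print(f"{len(busy)} distributions hold January's rows; {idle} have nothing to do")
```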

kkk5566 (Option: B)

B is correct

SolutionA (Option: B)

In this case the WHERE condition is on DateKey, so whether the table is hash-distributed on PurchaseKey or round-robin distributed, the query cost will be the same, because it will be a full table scan.

SolutionA

On second thought, if PurchaseKey is not unique, what is the constraint and how is it created? As the question doesn't give more details, I would go with round-robin rather than hash-distributed.

mamahani (Option: B)

B is correct

henryphchan (Option: B)

B. Hash on PurchaseKey to distribute the data evenly across the 60 distributions.

Dusica (Option: B)

What about B plus (imaginary) partitioning on date? Or is there an error in the question, since PurchaseKey by itself would not be very helpful?
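On partition granularity: Microsoft's documentation on partitioning dedicated SQL pool tables recommends at least about 1 million rows per distribution per partition for clustered columnstore tables. The Python sizing sketch below assumes the table uses a clustered columnstore index (the default) and approximate day counts per partition grain; under those assumptions monthly partitions would hold about 500,000 rows per distribution each, below that guideline, while quarterly or yearly partitions would clear it.

```python
ROWS_PER_DAY = 1_000_000                 # from the question
NUM_DISTRIBUTIONS = 60                   # dedicated SQL pool distribution count
MIN_ROWS_PER_DIST_PARTITION = 1_000_000  # columnstore guideline cited in the lead-in

# Assumed partition grains with approximate day counts per partition.
GRAINS = {"daily": 1, "monthly": 30, "quarterly": 91, "yearly": 365}

rows_per_cell = {}
for grain, days in GRAINS.items():
    # Rows one distribution holds inside one partition, assuming no skew.
    rows_per_cell[grain] = ROWS_PER_DAY * days / NUM_DISTRIBUTIONS
    verdict = "meets" if rows_per_cell[grain] >= MIN_ROWS_PER_DIST_PARTITION else "below"
    print(f"{grain:>9}: ~{rows_per_cell[grain]:,.0f} rows per distribution "
          f"per partition ({verdict} guideline)")
```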