DP-201 Exam - Question 22


Your company is an online retailer that can have more than 100 million orders during a 24-hour period, 95 percent of which are placed between 16:30 and 17:00.

All the orders are in US dollars. The current product line contains the following three item categories:

✑ Games with 15,123 items

✑ Books with 35,312 items

✑ Pens with 6,234 items

You are designing an Azure Cosmos DB data solution for a collection named Orders Collection. The following document is a typical order in Orders Collection.

Exam DP-201 Question 22

Orders Collection is expected to have a balanced read/write-intensive workload.

Which partition key provides the most efficient throughput?

Correct Answer: D

The partition key should have a high cardinality, meaning it should have a wide range of possible values to evenly distribute the workload across multiple partitions. Given the distribution of 100 million orders, using 'Item/Category' would create only three logical partitions, leading to significant hotspots and inefficiencies. 'OrderTime' would also cause a hotspot issue due to the 95% of orders placed within a narrow time frame. 'Item/Currency' is not suitable since all transactions are in USD, resulting in a single partition. 'Item/id' provides a unique identifier for each order, ensuring that each partition is balanced in terms of read and write operations, making it the most efficient choice for throughput.
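
The cardinality argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size is scaled down, orders are assumed to fall into categories in proportion to catalog size, `id` is assumed to be unique per order (as the explanation above states), and the field names are hypothetical stand-ins for the order document, which is not reproduced here.

```python
import random
from collections import Counter

def largest_share(keys):
    """Fraction of all orders that land in the single busiest logical partition."""
    counts = Counter(keys)
    return max(counts.values()) / len(keys)

random.seed(0)
N = 100_000  # scaled-down sample of the 100 million daily orders (assumption)

# Catalog sizes from the question; using them as order weights is an assumption.
catalog = {"Games": 15_123, "Books": 35_312, "Pens": 6_234}
picked = random.choices(list(catalog), weights=list(catalog.values()), k=N)

# Hypothetical order documents: unique id, one of three categories, always USD.
orders = [
    {"id": f"order-{i}", "category": category, "currency": "USD"}
    for i, category in enumerate(picked)
]

by_category = largest_share([o["category"] for o in orders])
by_currency = largest_share([o["currency"] for o in orders])
by_id = largest_share([o["id"] for o in orders])

print(f"busiest partition share, category key: {by_category:.3f}")
print(f"busiest partition share, currency key: {by_currency:.3f}")
print(f"busiest partition share, id key:       {by_id:.6f}")
```

Under these assumptions the currency key sends every order to one logical partition, the category key sends more than half of them to a single partition, and the id key gives each order its own partition key value.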

Discussion

24 comments
kempstonjoystick
Apr 1, 2020

Given there are 100 million orders in a 24-hour period, and there are only three categories, is Item/Id not a better solution? Otherwise the category will cause significant hotspots.

Taddi10
Aug 4, 2020

I think if the id was an integer (incremental, for example) it could be a good partition key, but with this format I think category is the best choice.

MamadouNiang
May 4, 2020

2 paragraphs below the link given in microsoft docs, there is an interesting answer : Using item ID as the partition key If your container has a property that has a wide range of possible values, it is likely a great partition key choice. One possible example of such a property is the item ID. For small read-heavy containers or write-heavy containers of any size, the item ID is naturally a great choice for the partition key. The item ID is a great partition key choice for the following reasons: There are a wide range of possible values (one unique item ID per item). Because there is a unique item ID per item, the item ID does a great job at evenly balancing RU consumption and data storage. You can easily do efficient point reads since you'll always know an item's partition key if you know its item ID.

Treadmill
Aug 9, 2020

D correct: Source as above quoted https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview

ceasarrr
Apr 19, 2020

Item/Id is the correct answer. Item/Category is not balanced (15k,35k,6k)

TaherAli2020
Feb 19, 2021

If you use the Item/Category property as a partition key, then it has a small cardinality. Even if the documents are evenly distributed across the collection, for large collections, any category might outgrow a single partition. If the categories aren't evenly distributed across the documents in the collection, then the problem is even worse. The dominant category restricts the ability of Azure Cosmos DB to scale. Item/Category is not a good choice for the partition key. https://docs.microsoft.com/en-us/learn/modules/monitor-and-scale-cosmos-db/5-partition-lesson

sturcu
Feb 26, 2021

Nice link. It is exactly the case from the ex.

TkSQL
May 5, 2021

this link is the answer to all the confusion here

cadio30
Jun 2, 2021

Perfect! The link provided clearly states the strategy for optimizing partitions.

tejasjoshi
Jul 6, 2021

Superb! It's crystal clear now. The partition should be on Item/id. Requesting everyone to go through the above link.

M0e
Oct 23, 2020

Given the discussion here: https://docs.microsoft.com/en-us/learn/modules/monitor-and-scale-cosmos-db/5-partition-lesson, "Item/id" is the correct answer

MLCL
Apr 14, 2020

The answer is correct. Imagine if, in 24h, every kind of item has been ordered at least once: you will end up with 100k logical partitions. Is that a good choice for partitioning? I think partitioning by category is more efficient, since we will have a 1h intensive read/write period and dispatching items into 3 partitions is less time-consuming than dispatching them across thousands.

Tombarc
Apr 26, 2020

You're probably confusing partition key unique values with logical partitions, high cardinality isn't a bad thing, actually, it's good. The distribution of the partitions is handled behind the curtains, so if you have 1000 distinct values of 10mb each, they may end up into a single partition, considering the max size of a partition is about 10gb. So the correct answer is item/id, otherwise the partitions wouldn't be balanced as requested.
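
The distinction drawn above, between distinct partition key values and the physical partitions that host them, can be sketched as follows. This is a toy model, not how Azure Cosmos DB is implemented: the service's real hashing and range assignment are internal, so `hashlib.md5` with a modulo, the count of 10 physical partitions, and the `order-<n>` key format are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

def physical_partition(key: str, n_physical: int) -> int:
    """Toy placement: hash the partition key value and map it to a physical partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_physical

N_PHYSICAL = 10  # hypothetical number of physical partitions

# 100,000 distinct key values (one per order) spread across all physical partitions.
id_load = Counter(
    physical_partition(f"order-{i}", N_PHYSICAL) for i in range(100_000)
)

# Three distinct key values can occupy at most three physical partitions.
category_load = Counter(
    physical_partition(category, N_PHYSICAL)
    for category in ["Games", "Books", "Pens"]
)

print("physical partitions used by id keys:      ", len(id_load))
print("physical partitions used by category keys:", len(category_load))
```

In this model many key values share each physical partition, so a high number of distinct values does not mean an equally high number of physical partitions, while a key with three values can never use more than three of them.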

Niteen
May 14, 2020

You're right that high cardinality isn't a bad thing. It's not like tablespace/table-level partitioning. The key has to be chosen for reads as well as writes. Here most of the data is divided by category. Hence, "Category" is the correct answer.

Piiri565
Nov 13, 2020

"Choose a partition key that has a wide range of values and access patterns that are evenly spread across logical partitions." This line clearly says "pattern", and there is no pattern in the id, so there is no logical partitioning with the id. So the given answer is correct.

monumentalcrankiness
Oct 22, 2020

I shall go with D, Item/id. Item/Category is out: it will only create 3 logical partitions, and unevenly distributed ones at that. A logical partition has a size cap of 20 GB, and with 100 million orders per day it won't be hard to reach that limit quickly. OrderTime is out: the 16:30 to 17:00 spike will create a hotspot problem. Item/Currency is out: the single value "USD" will result in everything cramming into one logical partition. Only Item/id is left, so this is the answer.
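
The storage point above can be checked with rough arithmetic. This is a back-of-the-envelope sketch: the 20 GB cap and the 100 million orders per day come from the thread, but the 1 KB document size is a hypothetical figure, orders are assumed to split across categories in proportion to catalog size, and the estimate averages over the day rather than modelling the 16:30 to 17:00 spike.

```python
LOGICAL_PARTITION_CAP_BYTES = 20 * 1024**3  # 20 GB cap cited in the comment above
ORDERS_PER_DAY = 100_000_000                # from the question
ASSUMED_DOC_BYTES = 1024                    # hypothetical average order document size

def hours_until_full(share_of_orders: float, doc_bytes: int = ASSUMED_DOC_BYTES) -> float:
    """Hours for one logical partition to hit the cap, given its share of daily orders."""
    bytes_per_day = ORDERS_PER_DAY * share_of_orders * doc_bytes
    return 24 * LOGICAL_PARTITION_CAP_BYTES / bytes_per_day

# Catalog sizes from the question, used here as a proxy for order share (assumption).
catalog = {"Games": 15_123, "Books": 35_312, "Pens": 6_234}
total_items = sum(catalog.values())

for name, items in catalog.items():
    share = items / total_items
    print(f"{name}: {hours_until_full(share):.1f} hours to reach the cap")

print(f"USD (all orders): {hours_until_full(1.0):.1f} hours to reach the cap")
```

With these assumed numbers, the currency partition and the largest category partition both reach the cap in well under a day, which is consistent with ruling out Item/Category and Item/Currency. A key that is unique per order never accumulates more than one document per logical partition.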

LeonLeon
Jun 29, 2020

In this case A is indeed correct. See the reference and be aware of the read/write balancing. The read is as important as the throughput. Partition keys for read-heavy containers: For most containers, the above criteria is all you need to consider when picking a partition key. For large read-heavy containers, however, you might want to choose a partition key that appears frequently as a filter in your queries. Queries can be efficiently routed to only the relevant physical partitions by including the partition key in the filter predicate. If most of your workload's requests are queries and most of your queries have an equality filter on the same property, this property can be a good partition key choice. For example, if you frequently run a query that filters on UserID, then selecting UserID as the partition key would reduce the number of cross-partition queries.

Sudipta3009
Jul 14, 2020

Ur explanation is correct

davita8
Apr 28, 2021

D. Item/id is the answer

zb99
Apr 15, 2020

I would say that "category" is correct only if we infer that it is likely to be commonly used in queries. But the question specifically says to optimize for throughput, so I'd think "item/id" would be more correct.

Yuri1101
Apr 18, 2020

Agree, only Item/Id can distribute data evenly since the difference in the number of items is huge among those three categories.

Leonido
Apr 25, 2020

Disagree. Id here stands for "Order ID", not "Item Id", so you'll end up with 100M partitions, which will lead to high read latency - because of inter partition data movement.

Israel2
Jul 6, 2020

It doesn't stand for 'Order ID'. It says: Item/id

henry_x
May 23, 2020

No doubt it is item/ID. refer to https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#choose-partitionkey. it is clearly addressed Using item ID as the partition key. partition key here is not partition used in DW.

brcdbrcd
Nov 23, 2020

item/id for sure. see the section "Propose partition key values for the collection" at: https://docs.microsoft.com/en-us/learn/modules/monitor-and-scale-cosmos-db/5-partition-lesson

IAMKPR
May 17, 2021

Answer should be "item/id". You can find an almost identical example at the link below. https://docs.microsoft.com/en-us/learn/modules/monitor-and-scale-cosmos-db/5-partition-lesson

MMM777
Jun 5, 2021

This example is definitely VERY similar to the question and explains why several of the proposed values are not good choices, and also shows "Item/Id" to be a decent choice.

Ard
Apr 2, 2020

i agree.

syu31svc
Dec 8, 2020

From https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#choose-partitionkey: "Have a high cardinality. In other words, the property should have a wide range of possible values." D is the answer

samok
Apr 6, 2020

Agree with kempstonjoystick

BHAWS
Jun 12, 2020

Choose a partition key that has a wide range of values,so the data is evenly spread across logical partitioning. Hence I suggest the answer is item/category

Ash666
Aug 12, 2020

https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data#logical-partitions https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview https://www.examtopics.com/exams/microsoft/dp-201/view/6/ D Item/ID Category doesn’t distribute RU evenly across partitions. Low cardinality.

azurearch
May 9, 2020

Here Item/id - id refers to the unique id of each document in cosmos not the item id.

azurearch
May 16, 2020

agree with ceasarrr

krisspark
Jul 24, 2020

These comments cause further confusion for newbies, as it's not possible to work out what the final correct answer is. I would go with Item/Category only. This combo may not give repeated values, as the item would be different within the same category. Item/id might create a huge number of partitions.

Yaswant
Aug 8, 2020

Consider we have provisioned a throughput of 1200 request units and we know that throughput can be provisioned in cosmos db only at a container level or at a database level. In our case we consider our online retailer to be Walkart. Now walkart has an account in cosmosdb and they have a document db with coresql api. Now walkart has created a container named orders in their cosmos account and provisioned 1200ru's. Now consider the case of choosing a partition key. Considering they have 1200 customer id-s and if they use id as partition key they will have their throughput spread across partitions which makes their unused throughput in vain as customers come buy and go and it makes a hotspot. Now if we choose product category as partition we'll be having a balanced throughput and read-write.

Shivam131
Sep 26, 2020

your partition key should: Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value. Have a high cardinality. In other words, the property should have a wide range of possible values. Spread request unit (RU) consumption and data storage evenly across all logical partitions. This ensures even RU consumption and storage distribution across your physical partitions.

lingjun
Nov 12, 2020

Candidates for partition keys might include properties that appear frequently as a filter in your queries. Queries can be efficiently routed by including the partition key in the filter predicate. Item ID will not appear as a filter most likely

lingjun
Nov 12, 2020

For small read-heavy containers or write-heavy containers of any size, Item-ID is naturally good choice. In this case, we have balanced read/write workload

Deepu1987
Feb 22, 2021

The given solution is right where we choose item/category. It's explained in detail in the link below: https://medium.com/walmartglobaltech/deep-dive-azure-cosmos-partitions-and-partitionkey-14e898f371cd This concept is a major focus. The question may not be asked exactly like this in the exam, so we need to know the concepts of physical and logical partitions, their prerequisites, and the partition key as well.

BobFar
May 20, 2021

Item/id is the correct solution. According to the explanation in the link that you posted, all the documents related to the same item/id will be stored in the same partition.