DP-420 Exam - Question 112


You are implementing an Azure Data Factory data flow that will use an Azure Cosmos DB (SQL API) sink to write a dataset. The data flow will use 2,000 Apache Spark partitions.

You need to ensure that the ingestion from each Spark partition is balanced to optimize throughput.

Which sink setting should you configure?

Correct Answer: B

To balance ingestion across Spark partitions when writing to an Azure Cosmos DB sink, configure the 'Write throughput budget' setting. It caps the Request Units (RUs) that the data flow's write operation may consume, and when a data flow has a high number of Spark partitions, setting a budget allows more balance across those partitions. 'Batch size', by contrast, determines how many objects are written to the Cosmos DB collection in each batch and is tuned mainly to keep each request within Cosmos DB's request size limit.

Discussion

4 comments
azuredemo2022three
Jun 21, 2024

C. Batch size determines the number of documents that are sent in each batch to the sink. By configuring an appropriate batch size, you can control the number of documents processed at a time and optimize the ingestion process.
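For sizing a batch, a rough sketch in Python follows. It assumes a 2 MB request size limit and that batch size times average document size should stay below it; the function name `max_batch_size` and the 5,000-byte document size are hypothetical, chosen only for illustration.

```python
# Assumed request size limit of 2 MB; check the current Cosmos DB limits.
MAX_REQUEST_BYTES = 2 * 1024 * 1024


def max_batch_size(avg_doc_bytes: int, limit_bytes: int = MAX_REQUEST_BYTES) -> int:
    """Largest batch size whose total payload stays strictly below the limit."""
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    # Subtract 1 so that batch_size * avg_doc_bytes < limit_bytes.
    return max(1, (limit_bytes - 1) // avg_doc_bytes)


# Hypothetical average document size of 5,000 bytes.
print(max_batch_size(5_000))  # 419
```

This only bounds the payload per request; it does not by itself balance load across Spark partitions.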

imando
Option: C
Jun 3, 2024

C is correct

YellowSky002
Option: B
Jan 21, 2025

The "Write Throughput Budget" setting in the Azure Cosmos DB sink of an Azure Data Factory data flow allows you to control the amount of Request Units (RUs) that your data flow operation can consume when writing data to Cosmos DB.

WimTS
Option: B
Apr 16, 2025

https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-performance-sinks

The Write throughput budget setting allows you to allocate a specific number of Request Units (RUs) for the data flow's write operations to Azure Cosmos DB. By setting this value appropriately, you can distribute the available throughput evenly across all Spark partitions, preventing any single partition from consuming disproportionate resources and ensuring balanced ingestion. This configuration is particularly beneficial when dealing with a high number of partitions, as it helps avoid throttling and maximizes overall throughput.
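To put rough numbers on the even split described above, here is a minimal Python sketch. The perfectly even division is an illustrative assumption, not documented service behavior, and the 40,000 RU budget and the name `per_partition_budget` are hypothetical; only the 2,000 partitions come from the question.

```python
def per_partition_budget(write_budget_ru: int, spark_partitions: int) -> float:
    """RU share per Spark partition, assuming the budget is split evenly."""
    if spark_partitions <= 0:
        raise ValueError("spark_partitions must be positive")
    return write_budget_ru / spark_partitions


# Hypothetical budget of 40,000 RUs across the 2,000 partitions in the question.
print(per_partition_budget(40_000, 2_000))  # 20.0
```

With a fixed budget, adding partitions shrinks each partition's share, so no single partition can exhaust the container's throughput.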