DP-203 Exam QuestionsBrowse all questions from this exam

DP-203 Exam - Question 116


A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream

Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).

You need to optimize performance for the Azure Stream Analytics job.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Show Answer
Correct Answer: CF

To optimize performance for an Azure Stream Analytics job, you should focus on enhancing parallel processing capabilities through partitioning. Implementing query parallelization by partitioning the data input allows the job to process multiple data partitions concurrently, improving throughput and efficient utilization of streaming units. Additionally, implementing query parallelization by partitioning the data output can distribute the processing load evenly across multiple nodes, reducing bottlenecks during data writing and further optimizing performance.

Discussion

17 comments
Sign in to comment
manquakOptions: CF
Sep 1, 2021

Partition input and output. REF: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization

kolakone
Sep 19, 2021

Agree. And partitioning Input and output with same number of partitions gives the best performance optimization..

Lio95Options: BD
Sep 21, 2021

No event consumer was mentioned. Therefore, partitioning output is not relevant. Answer is correct

nicolas1999
Nov 15, 2021

Stream analytics ALWAYS has at least one output. There is no need to mention that. So correct answer is input and output

Boompiee
May 10, 2022

The stream analytics job is the consumer.

ellalaOptions: DF
Oct 8, 2023

I would say DF is correct. Despite C being a correct option to optimize performance, we have no information about the output. If the output is Power BI, it does not support partition. Therefore we cannot state output partition without more information. Therefore best option will be SU

fahfouhi94Options: DF
Sep 30, 2023

the question is about actions should you perform, in case of power bi output , we cannot partition the stream analytics output.SO D & F

MomoanwarOptions: DF
Dec 9, 2023

Chatgpt say DF : The question in the image relates to optimizing the performance of an Azure Stream Analytics job. The correct actions would typically involve scaling the Streaming Units (SUs) appropriately based on the throughput needs and implementing query parallelization. In this context: - Scaling up the SU count (option D) would improve performance if the current SU allocation is insufficient. - Implementing query parallelization by partitioning the data input (option F) could also optimize performance as it would allow the job to process multiple data partitions concurrently.

jongertOptions: CF
Dec 27, 2023

An embarrassingly parallel job allows the highest degree of parallelization. Looking at how the max number of stream units is calculated, it would not be useful to scale them up if you keep a bottleneck at the output. Unsure what a good reference value would be for the number of SUs, but 120 does not seem very low to me.

Khadija10Options: CF
Dec 30, 2023

Partitioning lets you divide data into subsets based on a partition key. If your input (for example Event Hubs) is partitioned by a key, it's highly recommended to specify this partition key when adding input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output. A Stream Analytics job can consume and write different partitions in parallel, which increases throughput. Ref: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization

sdg2844Options: CF
Jan 16, 2024

As there is no indication of any query parallelization currently, we have to choose to parallelize for both input and output as the first/correct answers.

d046bc0Options: CF
Dec 13, 2023

Scale the SU count for the job up - (ChatGPT) This will not necessarily improve the performance of your job, unless your query is CPU-bound or memory-bound. Scaling up the SU count will increase the amount of resources available for your job, but it will also increase the cost. You should first try to optimize your query by using parallelization and repartitioning techniques, and then scale up the SU count only if needed1

prshntdxt7Options: CD
Jan 28, 2024

C. Implement query parallelization by partitioning the data output: "Partitioning lets you divide data into subsets based on a partition key. If your input (for example Event Hubs) is partitioned by a key, it's highly recommended to specify this partition key when adding input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output. A Stream Analytics job can consume and write different partitions in parallel, which increases throughput." D. Scale the SU count for the job up: "The total number of streaming units that can be used by a Stream Analytics job depends on the number of steps in the query defined for the job and the number of partitions for each step... All non-partitioned steps together can scale up to one streaming unit (SU V2s) for a Stream Analytics job. In addition, you can add 1 SU V2 for each partition in a partitioned step."

AlongiOptions: CD
Feb 2, 2024

C and D

ElancheOptions: CD
Mar 6, 2024

D. Scale the SU count for the job up: Increasing the number of Streaming Units (SUs) can improve the performance of the Stream Analytics job by providing more processing power to handle the incoming data stream. C. Implement query parallelization by partitioning the data output: Partitioning the data output can help distribute the processing load across multiple partitions, allowing for parallel execution of queries and enhancing performance.

Bhargava12Options: DF
Apr 2, 2024

Answer is D & F

DusicaOptions: CF
Apr 25, 2024

It says optimize performance, does not say that it is bad so adding SU may be unneccesary cost increase. Parallelization and embarrassingly parallel job is correct

DusicaOptions: CF
May 1, 2024

C and F > same partitions > embarrassingly parallel processing

e56bb91Options: CF
Jul 8, 2024

ChatGPT 4o C. Implement query parallelization by partitioning the data output: Output Partitioning: By partitioning the data output, you can ensure that the processing load is distributed evenly across multiple nodes, which can significantly improve performance by reducing bottlenecks in data writing. F. Implement query parallelization by partitioning the data input: Input Partitioning: Partitioning the data input allows the Stream Analytics job to process different partitions in parallel, leading to better utilization of the available streaming units and improved throughput.

DanweoOptions: DF
Jul 13, 2024

Partitioning on both input and output can help, but we don't know if the output is a service that doesn't support partitioning like Power BI. Scaling up will always assign more resources at least.