Certified Data Engineer Associate Exam - Question 39

Question

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint.

Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?

Examice · Accepted Answer

When multiple team members are running small queries simultaneously, the system is experiencing high concurrency, which can lead to resource contention. To address this, the data engineering team can increase the maximum bound of the SQL endpoint's scaling range. By doing this, they enable the system to scale out, adding more clusters to handle the increased load. This approach helps distribute the simultaneous queries across multiple clusters, improving overall query latency and performance. Increasing the cluster size would be more effective for sequential queries than for concurrent ones.
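To make the scale-out argument concrete, here is a toy queueing model in Python. It is a hypothetical sketch, not how Databricks actually schedules queries. It assumes every query takes the same time, all queries arrive at once, and each cluster runs at most `slots_per_cluster` queries in parallel (the value 10 is an illustrative assumption). Adding clusters raises total capacity, so fewer queries wait in the queue.

```python
def average_latency(num_queries: int, query_secs: float, clusters: int,
                    slots_per_cluster: int = 10) -> float:
    """Average latency when num_queries identical queries are submitted at once.

    Hypothetical toy model: each cluster runs at most `slots_per_cluster`
    queries in parallel; queries beyond total capacity wait for the next
    "wave". Real warehouse scheduling is more sophisticated.
    """
    capacity = clusters * slots_per_cluster
    total = 0.0
    for i in range(num_queries):
        wave = i // capacity  # 0 = runs immediately, 1 = waits one round, ...
        total += (wave + 1) * query_secs
    return total / num_queries


# 40 analysts each submit a 2-second query at the same moment.
print(average_latency(40, 2.0, clusters=1))  # 5.0 (queries queue in 4 waves)
print(average_latency(40, 2.0, clusters=4))  # 2.0 (all run immediately)
```

With a single query (the sequential case), adding clusters changes nothing in this model, which matches the rule of thumb that scaling out helps with concurrency rather than with the speed of an individual query. The model deliberately ignores the fact that a larger cluster size may run each query faster.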

damaldon · Answer

Answer is B.
According to the Databricks documentation:
- Sequential queries -> Increase cluster size
- Concurrent queries -> Scale out cluster
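The two rules above map to two different warehouse settings. The sketch below builds a request body for editing a SQL warehouse, assuming the field names `cluster_size`, `min_num_clusters`, and `max_num_clusters` from the Databricks SQL Warehouses REST API as I understand it; the helper `scaling_payload` is hypothetical, the edit endpoint may require additional fields, and nothing is sent over the network. Check the current API reference before using it.

```python
import json


def scaling_payload(min_clusters: int, max_clusters: int, cluster_size: str) -> dict:
    """Build a (hypothetical) request body for a SQL warehouse's size and scaling range.

    Assumed field names follow the Databricks SQL Warehouses REST API;
    verify them against the current API reference.
    """
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    return {
        "cluster_size": cluster_size,      # scale up: bigger cluster (sequential workloads)
        "min_num_clusters": min_clusters,
        "max_num_clusters": max_clusters,  # scale out: more clusters (concurrent workloads)
    }


# Concurrent small queries: keep the size, raise the upper bound of the scaling range.
body = scaling_payload(min_clusters=1, max_clusters=4, cluster_size="Small")
print(json.dumps(body, indent=2))
```

For the concurrency scenario in this question, only `max_num_clusters` changes; for slow sequential queries, `cluster_size` would be the setting to raise instead.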

mokrani · Answer

Answer B is correct.
For those who selected the same answer as for question 40 in the Databricks exam training, be careful, because it's quite different:
- Here the question is about simultaneous runs -> Scale out clusters (involves adding more clusters)
- In the Databricks exam training, the question is about "sequentially run queries" -> Scale up (increasing the cluster size)

Please refer to this accepted answer:
https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696

AndreFR · Answer

Question 40 in the official Databricks training exam: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

pc1337xd · Answer

Issues occur when too many users are running queries at the same time -> Increase scaling so more clusters handle the queries

Nika12 · Answer

Just got 100% on the exam. B was correct. Also, here is a link to a good explanation:
https://docs.databricks.com/en/compute/cluster-config-best-practices.html

nedlo · Answer

It's B because it's "simultaneously by many users", so you have to scale horizontally by increasing the number of clusters: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696

agAshish · Answer

Answer is A. See Q40: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

saikot · Answer

The correct answer is B.
(You can check this in the tooltip for the scaling option in the Databricks SQL warehouse settings. It clearly states that scaling is used to improve query "latency".)

god_father · Answer

Increasing the cluster size is for vertical scalability of query execution, while scaling out the cluster is for horizontal scalability of query execution.

SerGrey · Answer

B is correct

Ody__ · Answer

Correct answer is A.
Question 40: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

niharam2021 · Answer

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously.

[Removed] · Answer

agree with @AndreFR

vctrhugo · Answer

A. They can increase the cluster size of the SQL endpoint.

To improve the latency of the team's queries when many members are running small queries simultaneously, you can increase the cluster size of the SQL endpoint. Increasing the cluster size allocates more compute resources to handle query execution, which can help reduce query execution times and improve overall performance, especially during periods of high query concurrency.

Option B refers to adjusting scaling settings, which can also be beneficial, but increasing the cluster size (Option A) directly allocates more resources, making it a more direct approach to improving query performance.

Options C, D, and E relate to different features and configurations (Auto Stop, Serverless, and Spot Instance Policy), but they may not directly address the issue of improving query latency during high concurrency, which is the primary concern in this scenario.

Ody__ · Answer

A is correct

sakis213 · Answer

B is correct

benni_ale · Answer

"Simultaneously" probably means "concurrently", so scaling out the cluster is better.
