Exam Certified Data Engineer Associate
Question 39

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint.

Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?

    Correct Answer: B

    When multiple team members are running small queries simultaneously, the system is experiencing high concurrency, which can lead to resource contention. To address this, the data engineering team can increase the maximum bound of the SQL endpoint’s scaling range. By doing this, they enable the system to scale out, adding more clusters to handle the increased load. This approach helps distribute the simultaneous queries across multiple clusters, improving overall query latency and performance. Increasing the cluster size would be more effective for sequential queries, rather than concurrent ones.
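As a rough illustration of what "increasing the maximum bound of the scaling range" could look like in practice, the sketch below only builds the settings that would be sent to an endpoint-edit call; it does not contact any workspace. The field names `min_num_clusters` and `max_num_clusters` are my understanding of the Databricks SQL Warehouses REST API and should be checked against the current API reference. The helper `scaling_update` is hypothetical and exists only for this example.

```python
import json


def scaling_update(max_clusters: int, min_clusters: int = 1) -> dict:
    """Build a request body that widens a SQL endpoint's scaling range.

    Hypothetical helper: the field names are assumed from the SQL
    Warehouses REST API and should be verified before use.
    """
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    return {
        # Lower bound of the scaling range: clusters kept running.
        "min_num_clusters": min_clusters,
        # Upper bound: how far the endpoint may scale out under load.
        "max_num_clusters": max_clusters,
    }


# Raise the upper bound so concurrent queries can spread across clusters.
body = scaling_update(max_clusters=4)
print(json.dumps(body, indent=2))
```

Note that the cluster size is left untouched here: only the number of clusters the endpoint may add changes, which is the distinction the explanation above draws.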

Discussion
damaldon Option: B

Answer is B. According to the Databricks documentation:
- Sequential queries -> increase cluster size
- Concurrent queries -> scale out clusters

mokrani Option: B

Answer B is correct. For those who selected the same answer as question 40 in the Databricks exam training, be careful because the two questions are quite different:
- Here the question is about queries run simultaneously -> scale out (add more clusters).
- In the Databricks exam training, the question is about "sequentially run queries" -> scale up (increase the size of the nodes).
Please refer to this accepted answer: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696

Nika12 Option: B

Just got 100% on the exam. B was correct. Also, here is a link to a good explanation: https://docs.databricks.com/en/compute/cluster-config-best-practices.html

pc1337xd Option: B

Issues occur when too many users are running queries at the same time -> Increase scaling so more clusters handle the queries

AndreFR Option: A

Question 40 in the official Databricks training exam: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

ezeik

but the question is different: "is affecting all of their sequentially run queries."

AndreFR

I agree, answer A is incorrect. The correct answer is B because the key word is "simultaneously". Autoscaling is triggered by jobs sitting in the queue, so Databricks will increase the number of workers because there is a queue. If the queries were running sequentially, there wouldn't be a queue, so increasing the cluster size would be the best choice.
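The queueing argument can be made concrete with a deliberately simple toy model. This is not how Databricks schedules queries; every number below (slots per cluster, seconds per query, the speed-up from a larger cluster) is a hypothetical value chosen only to show the shape of the trade-off. Queries are assumed to run in "waves": each wave fills every available slot, and the rest wait in the queue.

```python
import math


def total_wait(num_queries: int, slots_per_cluster: int,
               max_clusters: int, seconds_per_query: float) -> float:
    """Seconds until the last query finishes in a toy 'waves' model."""
    # Total queries that can run at the same time across all clusters.
    capacity = slots_per_cluster * max_clusters
    # Queries beyond capacity queue up and run in later waves.
    waves = math.ceil(num_queries / capacity)
    return waves * seconds_per_query


# 40 small queries submitted at once; hypothetical 10 slots per cluster.
baseline = total_wait(40, 10, max_clusters=1, seconds_per_query=2.0)
# Scale out: same cluster size, but up to 4 clusters.
scaled_out = total_wait(40, 10, max_clusters=4, seconds_per_query=2.0)
# Scale up: one larger cluster, assumed to make each query 25% faster
# without adding concurrency slots.
scaled_up = total_wait(40, 10, max_clusters=1, seconds_per_query=1.5)

print(baseline, scaled_out, scaled_up)  # 8.0 2.0 6.0
```

Under these assumptions the queue, not the per-query runtime, dominates the wait, so adding clusters helps far more than enlarging one. With a single query at a time there is only ever one wave, and only the per-query speed-up matters.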

agAshish Option: A

Answer is A, Q40: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

K_yamini

The question in the practice set is slightly different if you look closely. In the first scenario, the data analyst notes slow query performance for sequentially run queries on a SQL endpoint that isn't shared with other users. This suggests that the problem may be related to the configuration or performance of the SQL endpoint itself rather than contention with other users. In the second scenario, the data analysis team experiences slow query performance when multiple team members are running queries simultaneously on the same SQL endpoint. This indicates potential resource contention or limitations on the SQL endpoint when handling concurrent queries from multiple users. Given these differences, the approaches to address the issues may also differ:

nedlo Option: B

It's B because it's "simultaneously by many users", so you have to scale it horizontally by increasing the number of nodes: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696

niharam2021

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously.

Ody__ Option: A

Correct answer is A. Question 40: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

SerGrey Option: B

B is correct

god_father Option: B

Increasing the cluster size is for vertical scalability of query execution, while scaling out the cluster is for horizontal scalability of query execution.

saikot Option: B

The correct answer is B. We can check this in the Databricks SQL warehouse tooltip: it clearly mentions that scaling is used to improve query latency.

benni_ale Option: B

"Simultaneously" probably means concurrently, so scaling out the cluster is better.

sakis213 Option: B

B is correct

Ody__ Option: A

A is correct

vctrhugo Option: A

A. They can increase the cluster size of the SQL endpoint. To improve the latency of the team's queries when many members are running small queries simultaneously, you can increase the cluster size of the SQL endpoint. Increasing the cluster size allocates more compute resources to handle query execution, which can help reduce query execution times and improve overall performance, especially during periods of high query concurrency. Option B refers to adjusting scaling settings, which can also be beneficial, but increasing the cluster size (Option A) directly allocates more resources, making it a more direct approach to improving query performance. Options C, D, and E relate to different features and configurations (Auto Stop, Serverless, and Spot Instance Policy), but they may not directly address the issue of improving query latency during high concurrency, which is the primary concern in this scenario.

[Removed] Option: A

Agree with @AndreFR