Certified Data Engineer Associate Exam - Question 39


A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint.

Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?

Correct Answer: B

When multiple team members are running small queries simultaneously, the system is experiencing high concurrency, which can lead to resource contention. To address this, the data engineering team can increase the maximum bound of the SQL endpoint’s scaling range. By doing this, they enable the system to scale out, adding more clusters to handle the increased load. This approach helps distribute the simultaneous queries across multiple clusters, improving overall query latency and performance. Increasing the cluster size would be more effective for sequential queries, rather than concurrent ones.
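As a rough illustration of what "increasing the maximum bound of the scaling range" could look like in practice, here is a minimal Python sketch that only builds the request for a warehouse edit call and does not send anything. The endpoint path and the field names `min_num_clusters` and `max_num_clusters` follow my understanding of the Databricks SQL Warehouses REST API and should be checked against the current documentation; the warehouse ID `abc123` and the helper name `build_scaling_edit` are hypothetical.

```python
import json


def build_scaling_edit(warehouse_id: str, min_clusters: int, max_clusters: int):
    """Build (path, JSON body) for a scaling-range change.

    Assumption: the edit endpoint and field names below match the
    Databricks SQL Warehouses REST API; verify before using them.
    """
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    path = f"/api/2.0/sql/warehouses/{warehouse_id}/edit"
    body = json.dumps(
        {
            # Lower bound of the scaling range (clusters kept available).
            "min_num_clusters": min_clusters,
            # Upper bound: raising this lets the endpoint scale out
            # when many queries arrive at the same time.
            "max_num_clusters": max_clusters,
        }
    )
    return path, body


# Hypothetical warehouse ID; raise the maximum from 1 cluster to 4.
path, body = build_scaling_edit("abc123", min_clusters=1, max_clusters=4)
print("POST", path)
print(body)
```

Note that the cluster size is deliberately left out of the payload: the point of option B is to change how many clusters can run, not how large each one is.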

Discussion

17 comments
damaldon Option: B
Sep 6, 2023

Answer is B. According to the Databricks documentation:
- Sequential queries -> increase the cluster size
- Concurrent queries -> scale out the cluster

mokrani Option: B
Nov 7, 2023

Answer B is correct. For those who selected the same answer as for question 40 in the Databricks exam training, be careful, because that question is quite different:
- Here the question is about simultaneous runs -> scale out the clusters (add more clusters).
- In the Databricks exam training, the question is about "sequentially run queries" -> scale up (increase the size of the nodes).
Please refer to this accepted answer: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696

AndreFR Option: A
Aug 20, 2023

See question 40 in the official Databricks training exam: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

ezeik
Sep 21, 2023

But that question is different: "is affecting all of their sequentially run queries."

AndreFR
Dec 20, 2023

I agree, answer A is incorrect. The correct answer is B, because the key word is "simultaneously". Autoscaling is triggered by queries sitting in the queue, so Databricks will increase the number of clusters because there is a queue. If the queries were running sequentially, there wouldn't be a queue, so increasing the cluster size would be the best choice.
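The queueing argument above can be made concrete with a toy model. This is a deliberately simplified Python sketch, not a description of how Databricks actually schedules queries: it assumes every query arrives at the same instant, takes the same time, and that each cluster runs a fixed number of queries at once. The function names and the figure of 10 slots per cluster are illustrative assumptions.

```python
def finish_times_concurrent(n_queries, query_seconds, clusters, slots_per_cluster):
    """Finish time of each query when all queries arrive at t = 0.

    Toy assumption: queries run in "waves" of size
    clusters * slots_per_cluster; later waves wait in the queue.
    """
    capacity = clusters * slots_per_cluster
    return [(i // capacity + 1) * query_seconds for i in range(n_queries)]


def average(values):
    return sum(values) / len(values)


# 40 small queries of 2 seconds each, submitted simultaneously.
one_cluster = finish_times_concurrent(40, 2.0, clusters=1, slots_per_cluster=10)
four_clusters = finish_times_concurrent(40, 2.0, clusters=4, slots_per_cluster=10)

print(average(one_cluster))    # 5.0: three of the four waves had to queue
print(average(four_clusters))  # 2.0: no queueing, all queries run at once

# Sequential case: one query at a time never queues, so extra clusters
# leave the latency at query_seconds.
sequential = finish_times_concurrent(1, 2.0, clusters=4, slots_per_cluster=10)
print(average(sequential))     # 2.0 regardless of the cluster count
```

In this model, adding clusters only helps when there is a queue to drain, which is the distinction between this question and the sequential one from the practice exam.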

pc1337xd Option: B
Nov 13, 2023

Issues occur when too many users are running queries at the same time -> Increase scaling so more clusters handle the queries

Nika12 Option: B
Jan 27, 2024

Just got 100% on the exam. B was correct. Also, here is a link to a good explanation: https://docs.databricks.com/en/compute/cluster-config-best-practices.html

nedlo Option: B
Dec 12, 2023

It's B because the queries are run simultaneously by many users, so you have to scale horizontally by increasing the number of nodes: https://community.databricks.com/t5/data-engineering/sequential-vs-concurrency-optimization-questions-from-query/td-p/36696

agAshish Option: A
Feb 1, 2024

Answer is A , Q40 -- https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

K_yamini
Feb 7, 2024

The question in the practice set is slightly different if you look closely. In the first scenario, the data analyst notes slow query performance for sequentially run queries on a SQL endpoint that isn't shared with other users. This suggests that the problem may be related to the configuration or performance of the SQL endpoint itself rather than contention with other users. In the second scenario, the data analysis team experiences slow query performance when multiple team members are running queries simultaneously on the same SQL endpoint. This indicates potential resource contention or limitations on the SQL endpoint when handling concurrent queries from multiple users. Given these differences, the approaches to address the issues may also differ:

saikot Option: B
Sep 16, 2023

The correct answer is B. We can check this in the tooltip for the scaling option of a Databricks SQL warehouse, which clearly mentions that scaling is used to improve query latency.

god_father Option: B
Oct 30, 2023

Increasing cluster size is for vertical scalability of query execution, while scaling out cluster is for horizontal scalability of query execution

SerGrey Option: B
Jan 8, 2024

B is correct

Ody__ Option: A
Jan 14, 2024

Correct answer is A. See question 40: https://files.training.databricks.com/assessments/practice-exams/PracticeExam-DataEngineerAssociate.pdf

niharam2021
Feb 9, 2024

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously.

[Removed] Option: A
Aug 29, 2023

agree with @AndreFR

vctrhugo Option: A
Sep 4, 2023

A. They can increase the cluster size of the SQL endpoint. To improve the latency of the team's queries when many members are running small queries simultaneously, you can increase the cluster size of the SQL endpoint. Increasing the cluster size allocates more compute resources to handle query execution, which can help reduce query execution times and improve overall performance, especially during periods of high query concurrency. Option B refers to adjusting scaling settings, which can also be beneficial, but increasing the cluster size (Option A) directly allocates more resources, making it a more direct approach to improving query performance. Options C, D, and E relate to different features and configurations (Auto Stop, Serverless, and Spot Instance Policy), but they may not directly address the issue of improving query latency during high concurrency, which is the primary concern in this scenario.

Ody__ Option: A
Jan 14, 2024

A is correct

sakis213 Option: B
Apr 1, 2024

B is correct

benni_ale Option: B
Apr 29, 2024

"Simultaneously" probably means concurrently, so scaling out the cluster is better.