Certified Data Engineer Associate Exam - Question 82

Question

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Examice · Accepted Answer

When SQL queries are submitted to a non-running SQL endpoint, the main delay is the time the endpoint takes to start up. Enabling the Serverless feature for the SQL endpoint cuts this start-up time from minutes to seconds, so results come back sooner. This approach directly addresses the slow response caused by the initial start-up delay.
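A minimal sketch of what option D looks like as configuration. It assumes the Databricks SQL Warehouses REST API edit call (`POST /api/2.0/sql/warehouses/{id}/edit`), where serverless requires `warehouse_type` set to `PRO` and `enable_serverless_compute` set to true. The helper only builds the JSON request body and does not send it. The function name `serverless_edit_payload` is hypothetical. Whether the edit call accepts a partial body like this one is also an assumption to check against the API reference.

```python
import json


def serverless_edit_payload(auto_stop_mins: int = 10) -> dict:
    """Hypothetical helper: build a request body that would switch an
    existing SQL warehouse (endpoint) to serverless compute.

    Assumption: the edit endpoint accepts only the fields being changed.
    """
    return {
        # Serverless is only available on the PRO warehouse type.
        "warehouse_type": "PRO",
        # Option D: serverless compute starts in seconds rather than minutes.
        "enable_serverless_compute": True,
        # Option B (Auto Stop) only controls idle shutdown; it does not
        # shorten start-up, so it is kept here just as an ordinary setting.
        "auto_stop_mins": auto_stop_mins,
        # spot_instance_policy is deliberately omitted: per the serverless
        # limitations page cited in the discussion, spot policies do not
        # combine with serverless (option A).
    }


if __name__ == "__main__":
    print(json.dumps(serverless_edit_payload(), indent=2))
```

Running the script only prints the body. Actually applying it would need an authenticated request to your workspace, which is outside this sketch.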

carpa_jo · Answer

The important point of this scenario is "when they are submitted to a non-running SQL endpoint". So it's not about increasing the instance size or the number of instances to improve query performance; it's about reducing the start-up time.
A: Not possible: serverless can't be combined with spot instance policies; see https://docs.databricks.com/en/compute/sql-warehouse/serverless.html#limitations
B: Auto Stop is about terminating a SQL warehouse after x minutes of being idle.
C: Increasing the cluster size provides more capacity for running queries, but doesn't reduce start-up time.
D: Serverless reduces start-up time from minutes to seconds. Jackpot!
E: Increasing the max bound of the SQL endpoint's scaling range helps with many concurrent queries, which is not the case here.

AndreFR · Answer

Key phrase: “non-running SQL endpoint”, which implies that the query is slow because the cluster needs time to start.

I suggest answer D because:

A: Serverless and spot instances cannot be mixed?

B: Auto Stop is what causes jobs to be submitted to non-running SQL endpoints.

C: Increasing the cluster size can compensate for slow start-up time.

D: Serverless is able to start and scale faster than non-running SQL endpoints (seconds instead of minutes).

E: Increasing the maximum bound will help only if there are simultaneous queries.

https://docs.gcp.databricks.com/en/lakehouse-architecture/cost-optimization/best-practices.html#use-serverless-for-your-workloads

Syd · Answer

Answer E:

https://www.databricks.com/blog/2022/03/10/top-5-databricks-performance-tips.html

msengupta · Answer

https://community.databricks.com/t5/data-engineering/sql-query-takes-too-long-to-run/td-p/21884

nedlo · Answer

D is wrong: it's already serverless (non-running SQL endpoint), so how would turning Serverless on help? They also say C here: https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/td-p/27880. E is only true for autoscaling clusters.

olaru · Answer

maximum bound of the SQL endpoint's scaling range

meow_akk · Answer

Answer E. You're welcome :)
https://community.databricks.com/t5/data-engineering/when-to-increase-maximum-bound-vs-when-to-increase-cluster-size/td-p/27880

Garyn · Answer

C. They can increase the cluster size of the SQL endpoint.

Explanation:

Increasing the cluster size of the SQL endpoint can enhance query performance by providing more computational resources to execute queries. This can potentially speed up query processing by allowing more parallelism, handling larger workloads, and reducing the time taken for query execution.

bartfto · Answer

"when they are submitted to a non-running SQL endpoint" ANSWER D

azure_bimonster · Answer

D is correct. The key phrase is "submitted to a non-running SQL endpoint". Increasing the cluster size is not going to help if the endpoint is not running.
