Exam AZ-305
Question 90

HOTSPOT

You are designing a data analytics solution that will use Azure Synapse and Azure Data Lake Storage Gen2.

You need to recommend Azure Synapse pools to meet the following requirements:

• Ingest data from Data Lake Storage into hash-distributed tables.

• Implement, query, and update data in Delta Lake.

What should you recommend for each requirement? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Correct Answer: [hotspot answer image not reproduced]

Discussion
saiyandjinn

The second question is confusing, and I am not sure what the answer is. You can query Delta Lake with a serverless SQL pool, but you won't be able to update it; only Apache Spark pools support updates to Delta Lake files. Spark can also be used to query long time series, if I understand the docs correctly. I think the answer to 2 is Apache Spark pools on that basis.
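For illustration, here's a minimal serverless SQL pool sketch of that read-only access (the storage account, container, and folder path are hypothetical):

    -- Serverless SQL pool (T-SQL): reads the latest version of a Delta
    -- Lake folder via OPENROWSET. This pool can query the data, but it
    -- has no UPDATE/DELETE support for Delta files.
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/files/delta/sales/',
        FORMAT = 'DELTA'
    ) AS sales;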

RandomNickname

Agree. From what I can find, a SQL pool can't update Delta Lake files; only Apache Spark can do that, assuming the article below is accurate: https://www.jamesserra.com/archive/2022/03/azure-synapse-and-delta-lake/#:~:text=Serverless%20SQL%20pools%20do%20not%20support%20updating%20delta,in%20Azure%20Synapse%20Analytics%20to%20update%20Delta%20Lake.

Fidel_104

The question mentions 'Data Lake Storage', not Delta Lake; there is no explicit indication that the data is stored in Delta Lake format. Therefore I don't think the Spark pool is needed. Nevertheless, Delta Lake is indeed a very confusing name for what is essentially a data format (an "optimized storage layer").

Fidel_104

Ah, I take it back. Delta Lake is also mentioned later; sorry for the confusion.

WeepingMaplte

"Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files." Taken from: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format

Liveroso

The answer is correct. Azure Synapse Analytics (formerly SQL Data Warehouse) is a cloud-based analytics service that allows you to analyze large amounts of data using a combination of on-demand and provisioned resources. It offers several options for working with data:
- Dedicated SQL pool: best for big and complex tasks.
- Serverless Apache Spark pool: best for big data analysis and machine learning tasks using Spark SQL and Spark DataFrames.
- Serverless SQL pool: a service that automatically adjusts the amount of resources you use based on your needs, so you only pay for what you use. Best for small to medium-sized tasks and tasks that change often.

sawanti

How can you spend so much time giving explained answers and still get them wrong? The first answer is correct; the second one is Apache Spark pool. A serverless SQL pool doesn't support updates: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format. Do you see any information about updates there? Updates are possible in Apache Spark: https://docs.delta.io/latest/delta-update.html. Btw, what does "Apache Spark is best for big data analysis and ML tasks" have in common with Delta Lake updates? Are you copying the answers from ChatGPT? I have worked with Databricks for 2 years, and Apache Spark is the right answer. Apache Spark can also be used for small scenarios, as it's not that expensive and is often used by ordinary data engineers, not just big data engineers.
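As a minimal sketch of that update side (the abfss path, column names, and filter are made up), a Synapse Spark pool can run a Delta update directly in Spark SQL:

    -- Spark SQL in a Synapse Apache Spark pool: Delta Lake supports
    -- ACID, in-place updates that serverless SQL pools cannot perform.
    UPDATE delta.`abfss://files@contosolake.dfs.core.windows.net/delta/sales/`
    SET status = 'shipped'
    WHERE order_id = 42;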

sawanti

Last note: hash-distributed tables are used for VERY LARGE FACT TABLES. Per the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute): "Consider using a hash-distributed table when: the table size on disk is more than 2 GB."
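For reference, a minimal dedicated SQL pool sketch of such a fact table (table and column names are hypothetical):

    -- Dedicated SQL pool (T-SQL): each row is assigned to one of the 60
    -- distributions by hashing the chosen column, which minimizes data
    -- movement for joins and aggregations on that column.
    CREATE TABLE dbo.FactSales
    (
        SaleId     BIGINT        NOT NULL,
        CustomerId INT           NOT NULL,
        SaleDate   DATE          NOT NULL,
        Amount     DECIMAL(18,2) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(CustomerId),
        CLUSTERED COLUMNSTORE INDEX
    );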

NotMeAnyWay

1. Ingest data from Data Lake Storage into hash-distributed tables: A. A dedicated SQL pool. A dedicated SQL pool in Azure Synapse provides the ability to create hash-distributed tables, which help distribute data evenly across multiple nodes and improve query performance. This option is well suited for ingesting data from Data Lake Storage into hash-distributed tables.
2. Implement, query, and update data in Delta Lake: B. A serverless Apache Spark pool. A serverless Apache Spark pool in Azure Synapse allows you to run Apache Spark jobs on demand without having to manage the underlying infrastructure. This option is ideal for working with Delta Lake, as it provides native support for querying and updating data stored in Delta Lake format.
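A minimal sketch of that ingestion step, loading Parquet files from Data Lake Storage Gen2 into a hash-distributed table (reusing the hypothetical dbo.FactSales above; URL and credential are assumptions):

    -- Dedicated SQL pool (T-SQL): bulk-load files from ADLS Gen2
    -- into the hash-distributed table with the COPY statement.
    COPY INTO dbo.FactSales
    FROM 'https://contosolake.dfs.core.windows.net/files/raw/sales/*.parquet'
    WITH (
        FILE_TYPE = 'PARQUET',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );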

calotta1

From MSFT docs: Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake.

Exams_Prep_2021

Got this on Sept. 29, 2023

Forex19

I had this question on 24 Sep 2023.

Tr619899

To meet the requirements of ingesting data from Data Lake Storage into hash-distributed tables and implementing query and update operations in Delta Lake, the recommended Azure Synapse pool options are as follows:
Ingest data from Data Lake Storage into hash-distributed tables: A dedicated SQL pool. This option allows you to leverage the power of the dedicated SQL pool (formerly SQL Data Warehouse) in Azure Synapse to perform high-performance ingest operations into hash-distributed tables. The dedicated SQL pool is optimized for large-scale data warehousing scenarios.
Implement, query, and update data in Delta Lake: A serverless Apache Spark pool. This option allows you to use Apache Spark as a serverless processing engine within Azure Synapse. Spark provides robust support for querying and updating data in Delta Lake, which is an open-source storage layer for reliable big data processing.

xRiot007

It would be nice to also include a disclaimer saying that this is a response generated by ChatGPT or another similar tool.

salman_23_c4

Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002#delta-lake

Helice

Second looks to be Apache Spark pools, as the serverless SQL pool cannot update Delta. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002#delta-lake

betterthanlife

Says it plain as day (what a shock!): "Serverless SQL pools don't support updating Delta Lake files. You can use serverless SQL pool to query the latest version of Delta Lake. Use Apache Spark pools in Synapse Analytics to update Delta Lake."

Paul_white

OPTION 2: SERVERLESS APACHE SPARK POOL

RanOlfati

Dedicated SQL pools: provide massively parallel processing (MPP) capabilities ideal for handling large volumes of data. They are optimized for complex queries over large datasets and are suitable for building enterprise-level, big data analytics solutions.
Spark pools: provide a fully managed Apache Spark environment in Azure Synapse. They are designed to handle big data processing, analytics, and machine learning tasks. Spark pools can process data in various formats and from multiple sources, including Azure Data Lake Storage.

Bigbluee

From the Delta Lake docs: "Delta Lake is fully compatible with Apache Spark APIs, and was developed for tight integration with Structured Streaming, allowing you to easily use a single copy of data for both batch and streaming operations and providing incremental processing at scale." So Delta Lake points to Apache Spark. In this case, the 2nd is an Apache Spark pool.

Gaz_

From Copilot: To meet the requirements for ingesting data from Data Lake Storage into hash-distributed tables, you should recommend A Dedicated SQL pool. This option is designed for large-scale, high-performance, and secure analytics on Azure. For implementing, querying, and updating data in Delta Lake, you should recommend A serverless Apache Spark pool. This option allows you to run big data analytics and artificial intelligence workloads with Apache Spark, which is compatible with Delta Lake. These recommendations align with Azure's best practices for performance and scalability when working with Synapse and Data Lake Storage Gen2. If you need further details or assistance with the setup, feel free to ask.

23169fd

Ingest data from Data Lake Storage into hash-distributed tables: A dedicated SQL pool. Implement, query, and update data in Delta Lake: A serverless Apache Spark pool.

23169fd

Requirement 1: Ingest data from Data Lake Storage into hash-distributed tables. A dedicated SQL pool: this pool is specifically designed for high-performance data warehousing. It allows for the ingestion of large datasets into hash-distributed tables, optimizing performance and scalability. Hash distribution is a key feature of dedicated SQL pools to enhance query performance for large datasets. Recommendation: a dedicated SQL pool.
Requirement 2: Implement, query, and update data in Delta Lake. A serverless Apache Spark pool: Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark. It is optimized for big data workloads and is best utilized with Apache Spark pools. The serverless Apache Spark pool in Azure Synapse provides a managed Spark environment, ideal for working with Delta Lake for querying, updating, and managing large datasets. Recommendation: a serverless Apache Spark pool.

Lazylinux

Box 1 is correct => A hash-distributed table distributes table rows across the Compute nodes by using a deterministic hash function to assign each row to one distribution. Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. In dedicated SQL pool this knowledge is used to minimize data movement during queries, which improves query performance. https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
Box 2 should be Apache Spark pools => "Apache Spark pools in Azure Synapse enable data engineers to modify Delta Lake files." https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-delta-lake-format
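To round out Box 2, a minimal Spark SQL sketch of a modification only a Spark pool can perform (the Delta path, the staged_updates view, and the columns are all hypothetical):

    -- Spark SQL in a Synapse Apache Spark pool: upsert staged rows into
    -- a Delta table. A serverless SQL pool could only read this table.
    MERGE INTO delta.`abfss://files@contosolake.dfs.core.windows.net/delta/sales/` AS t
    USING staged_updates AS s
    ON t.SaleId = s.SaleId
    WHEN MATCHED THEN UPDATE SET t.Amount = s.Amount
    WHEN NOT MATCHED THEN INSERT *;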

Chenn

Ingest data from Data Lake Storage into hash-distributed tables: for this requirement, I recommend using a dedicated SQL pool in Azure Synapse. This service is designed for large-scale data processing and supports creating hash-distributed tables to optimize query performance.
Implement, query, and update data in Delta Lake: for this requirement, I recommend using a serverless Apache Spark pool in Azure Synapse. This service provides capabilities for working with Delta Lake, as it offers an analytics service that can handle big data processing tasks without the need to provision or manage clusters.

ahmedkmj

From ChatGPT: For implementing, querying, and updating data in Delta Lake, the most suitable option among the ones listed would be a serverless Apache Spark pool. Here's why: Apache Spark is tightly integrated with Delta Lake, offering native support for reading from and writing to Delta tables. This integration ensures seamless compatibility and efficient data processing capabilities.