AZ-305 Exam - Question 249

Question

You have an Azure subscription that contains an Azure Cosmos DB for NoSQL account named account1 and an Azure Synapse Analytics workspace named Workspace1. The account1 account contains a container named Contained that has the analytical store enabled.

You need to recommend a solution that will process the data stored in Contained in near-real-time (NRT) and output the results to a data warehouse in Workspace1 by using a runtime engine in the workspace. The solution must minimize data movement.

Which pool in Workspace1 should you use?

Examice · Accepted Answer

To process data stored in the Cosmos DB for NoSQL account's container in near-real-time (NRT) and output the results to a data warehouse in Workspace1, you should use the Apache Spark pool in Azure Synapse Analytics. Apache Spark is a distributed processing framework that can handle streaming data efficiently, which aligns with the need for near-real-time processing. Moreover, it can directly access data stored in Azure Cosmos DB's analytical store, minimizing data movement. This ensures that data can be processed and outputted to the data warehouse quickly and with reduced latency, making it the most suitable choice for the given requirements.
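As a rough illustration of the direct read described above, a Synapse Spark notebook could load the analytical store through the `cosmos.olap` format. This is a minimal sketch, not a definitive implementation: the linked service name `CosmosDbLinkedService` and both helper function names are hypothetical, and `spark` is assumed to be the session that a Synapse notebook provides.

```python
def cosmos_olap_options(linked_service: str, container: str) -> dict:
    """Build the reader options for the Cosmos DB analytical store.

    linked_service is the name of a Synapse linked service that points at
    the Cosmos DB account (hypothetical name in the usage below).
    """
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


def read_analytical_store(spark, linked_service: str, container: str):
    """Load the container's analytical store into a Spark DataFrame.

    The read targets the analytical store exposed through Azure Synapse Link,
    so no copy or ETL pipeline is involved.
    """
    options = cosmos_olap_options(linked_service, container)
    return spark.read.format("cosmos.olap").options(**options).load()


# Usage inside a Synapse notebook, where `spark` already exists
# ("CosmosDbLinkedService" is an assumed name):
# df = read_analytical_store(spark, "CosmosDbLinkedService", "Contained")
```

The resulting DataFrame can then be transformed in the Spark pool and written to the data warehouse in the same workspace.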

KeyMan · Answer

B. Serverless SQL pool

Reasoning:
Serverless SQL pool in Azure Synapse Analytics is designed to handle on-demand queries against large datasets, which is suitable for the NRT processing requirement stated.

Minimal Data Movement: Using serverless SQL pool allows querying data in place without the need to move data into the pool, which aligns with the need to minimize data movement. It can directly query the Cosmos DB analytical store.

Integration with Cosmos DB Analytical Store: Serverless SQL pool has built-in integration with Azure Cosmos DB's analytical store, allowing efficient and performant processing of the data.

Apache Spark could also process the data, but it would involve more data movement compared to serverless SQL. Dedicated SQL pool requires pre-provisioned resources and wouldn't be as cost-effective for NRT scenarios. Data Explorer is not a compute pool within Azure Synapse Analytics.
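For comparison, the in-place query that this answer refers to would use `OPENROWSET` with the `CosmosDB` provider in a serverless SQL pool. The sketch below only builds the T-SQL text in Python so the shape of the query is visible. The database name `db1`, the credential name `cosmos_cred`, and the helper function name are assumptions for illustration; they do not come from the question.

```python
def build_openrowset_query(account: str, database: str, container: str,
                           credential: str, top: int = 10) -> str:
    """Return a T-SQL query that reads a Cosmos DB analytical store in place.

    The query is meant to be run in a Synapse serverless SQL pool; this
    function only formats the text and does not connect to anything.
    """
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        "    PROVIDER = 'CosmosDB',\n"
        f"    CONNECTION = 'Account={account};Database={database}',\n"
        f"    OBJECT = '{container}',\n"
        f"    SERVER_CREDENTIAL = '{credential}'\n"
        ") AS documents"
    )


# "db1" and "cosmos_cred" are hypothetical names.
query = build_openrowset_query("account1", "db1", "Contained", "cosmos_cred")
print(query)
```

A query like this reads the analytical store without copying it, but a serverless SQL pool has no warehouse storage of its own, which is worth weighing against the requirement to output results to a data warehouse.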

masetromain · Answer

Apache Spark is a distributed processing framework that can handle near-real-time processing and is well-integrated with Azure Synapse Analytics. It can directly access data stored in Azure Cosmos DB analytical store without needing to move the data around. This minimizes data movement and provides efficient processing capabilities.

So, the correct answer is:

A. Apache Spark

MohsenSic · Answer

I go with A, for two reasons:
Synapse has Apache Spark.
Data Explorer is mainly for logs; see the flowchart at the bottom of the page linked below.

https://learn.microsoft.com/en-us/azure/data-explorer/data-explorer-overview

TarasSheva · Answer

A. Apache Spark

Near-Real-Time (NRT) Processing: Apache Spark provides capabilities for real-time stream processing, which aligns with the requirement for near-real-time processing.

Integration with Azure Cosmos DB: Apache Spark has built-in connectors and libraries for integrating with Azure Cosmos DB, allowing for seamless data ingestion and processing without significant data movement.

Output to Data Warehouse: Apache Spark can easily output processed data to various destinations, including data warehouses in Azure Synapse Analytics. It can write directly into a dedicated SQL pool within the Synapse workspace.

Minimizing Data Movement: Since Apache Spark can directly access data in Azure Cosmos DB and write results to the data warehouse within the same Azure environment, it minimizes data movement, thus optimizing performance and reducing costs.

Appon · Answer

Because of "near-real-time".

azureworm · Answer

A is the correct answer. See https://learn.microsoft.com/en-us/azure/cosmos-db/synapse-link-use-cases

LGWJ12 · Answer

A: Apache Spark. The Apache Spark pool in Azure Synapse Analytics is an analytics engine that facilitates large-scale data processing. It can read data from Cosmos DB in near-real-time, process it, and then output the results to a data warehouse in the same Azure Synapse Analytics workspace. This minimizes data movement because the data processing and storage happen within the same service (Azure Synapse Analytics).

23169fd · Answer

Apache Spark: Spark pools in Azure Synapse Analytics provide a distributed data processing framework capable of processing large volumes of data in near-real-time. Spark is highly efficient in handling streaming data and can directly read from Azure Cosmos DB's analytical store with minimal data movement, making it an ideal choice for near-real-time processing.

Serverless SQL and Dedicated SQL: While these can be used for querying and processing data, they are not as optimized for near-real-time processing as Apache Spark. Additionally, they typically involve more data movement compared to Spark's direct processing capabilities.

Data Explorer: This is typically used for fast ad-hoc data exploration and querying, particularly for log and telemetry data, rather than for continuous near-real-time data processing and transformation.

Frank_2022 · Answer

I recommend using a dedicated SQL pool

Near-real-time processing: Dedicated SQL pools are specifically designed for low-latency analytical workloads, making them ideal for processing data in near-real-time.

Data minimization: Dedicated SQL pools are integrated with Workspace1, allowing for seamless data movement between the Cosmos DB analytical store and the data warehouse within the same workspace. This minimizes data movement and avoids the need for external data transfer processes.

Runtime engine: Dedicated SQL pools provide a T-SQL compatible query engine that can be used to interact with data stored in the data warehouse. This allows you to leverage familiar SQL syntax for data transformation and analysis.

rumino · Answer

Azure Data Explorer is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real time. The Azure Data Explorer toolbox gives you an end-to-end solution for data ingestion, query, visualization, and management.
https://learn.microsoft.com/en-us/azure/data-explorer/data-explorer-overview
https://learn.microsoft.com/en-us/azure/synapse-analytics/data-explorer/data-explorer-overview

varinder82 · Answer

Final Answer: D

moadabdou · Answer

For processing data stored in the 'Contained' container of Cosmos DB in near-real-time (NRT) and outputting results to a data warehouse in Workspace1, leveraging an Apache Spark pool within Azure Synapse Analytics is highly recommended. This approach is particularly effective due to Apache Spark's robust in-memory processing capabilities, which can handle large volumes of data swiftly. Additionally, by utilizing Azure Synapse Link for seamless integration with Cosmos DB's analytical store, this solution ensures minimal data movement. This not only enhances performance by enabling direct real-time data access but also optimizes resource utilization and reduces latency, making it an ideal setup for real-time data analytics.
