
DP-200 Exam - Question 19


HOTSPOT -

You are developing a solution using a Lambda architecture on Microsoft Azure.

The data at rest layer must meet the following requirements:

Data storage:

✑ Serve as a repository for high volumes of large files in various formats.

✑ Implement optimized storage for big data analytics workloads.

✑ Ensure that data can be organized using a hierarchical structure.

Batch processing:

✑ Use a managed solution for in-memory computation processing.

✑ Natively support Scala, Python, and R programming languages.

✑ Provide the ability to resize and terminate the cluster automatically.

Analytical data store:

✑ Support parallel processing.

✑ Use columnar storage.

✑ Support SQL-based languages.

You need to identify the correct technologies to build the Lambda architecture.

Which technologies should you use? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Correct Answer:

Data storage: Azure Data Lake Store

A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized. With the hierarchical namespace enabled, a storage account becomes capable of providing the scalability and cost-effectiveness of object storage, with file system semantics that are familiar to analytics engines and frameworks.
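As a rough illustration of what the hierarchical namespace enables, the sketch below uses the Azure SDK for Python (azure-storage-file-datalake) to create a nested directory tree and upload a file; the account URL, key, file system name, and paths are placeholder assumptions, not values from the exam.

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Connect to an ADLS Gen2 account (hierarchical namespace enabled).
# The account URL and key below are placeholders.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key>",
)

# With a hierarchical namespace, directories are first-class objects, so a
# nested path such as raw/sales/2021/04 can be created and managed directly
# rather than simulated with name prefixes as in flat blob storage.
file_system = service.create_file_system(file_system="datalake")
directory = file_system.create_directory("raw/sales/2021/04")

# Upload one of the "high volumes of large files" into that directory.
file_client = directory.create_file("orders.parquet")
with open("orders.parquet", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```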

Batch processing: HDInsight Spark

Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.

HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, and MapReduce.

Languages: R, Python, Java, Scala, SQL
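To make the batch-layer requirements concrete, here is a minimal PySpark sketch of the kind of job a managed Spark cluster would run: data is read from the lake, cached in cluster memory (the in-memory computation requirement), aggregated, and written back. The abfss paths and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lambda-batch-layer").getOrCreate()

# Hypothetical path into the ADLS Gen2 account used as the data-at-rest layer.
raw_path = "abfss://datalake@<storage-account>.dfs.core.windows.net/raw/sales"

orders = spark.read.parquet(raw_path)

# cache() keeps the dataset in executor memory, so the repeated passes of a
# batch pipeline avoid re-reading the files from storage.
orders.cache()

daily_totals = (
    orders.groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("order_count"))
)

# Write the batch view back to the lake (or on to the analytical store).
daily_totals.write.mode("overwrite").parquet(
    "abfss://datalake@<storage-account>.dfs.core.windows.net/curated/daily_totals"
)
```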

Analytical data store: Azure Synapse Analytics

Azure Synapse Analytics is a cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP).

Azure Synapse Analytics stores data in relational tables with columnar storage.

Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
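As a hedged illustration of the columnar, parallel nature of a dedicated SQL pool, the sketch below uses pyodbc to create a hash-distributed table with a clustered columnstore index; the server, database, credentials, and table definition are assumptions for this example, not values from the exam.

```python
import pyodbc

# Placeholder connection details for a Synapse dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool>;"
    "UID=<user>;PWD=<password>",
    autocommit=True,
)
cursor = conn.cursor()

# Columnar storage (clustered columnstore index) plus hash distribution lets
# the MPP engine scan large fact tables in parallel across compute nodes,
# and the table is queried with ordinary SQL.
cursor.execute("""
CREATE TABLE dbo.DailyTotals
(
    OrderDate   DATE NOT NULL,
    TotalAmount DECIMAL(18, 2),
    OrderCount  BIGINT
)
WITH
(
    DISTRIBUTION = HASH(OrderDate),
    CLUSTERED COLUMNSTORE INDEX
);
""")
```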

References:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is

Discussion

11 comments
gallego82
Apr 9, 2021

I think that in batch processing the answer should be Azure Databricks, due to the link provided and its capabilities: Azure Databricks is an Apache Spark-based analytics platform. You can think of it as "Spark as a service." It's the easiest way to use Spark on the Azure platform. Languages: R, Python, Java, Scala, Spark SQL. Fast cluster start times, autotermination, autoscaling. Manages the Spark cluster for you. Built-in integration with Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Synapse, and other services. See Data Sources. User authentication with Azure Active Directory. Web-based notebooks for collaboration and data exploration. Supports GPU-enabled clusters.
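For context on the autoscaling and autotermination point raised above, here is a hedged sketch of a Databricks cluster definition submitted through the Clusters REST API; the workspace URL, access token, runtime version, and node type are placeholder assumptions.

```python
import requests

# Placeholder Databricks workspace URL and personal access token.
workspace_url = "https://<workspace>.azuredatabricks.net"
token = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "lambda-batch-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    # Resize automatically between 2 and 8 workers...
    "autoscale": {"min_workers": 2, "max_workers": 8},
    # ...and terminate the cluster automatically after 30 idle minutes.
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```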

Pairon
Apr 12, 2021

Agree with the comments above. Databricks enables you to autoscale and autoterminate your cluster, and it also enables in-memory processing because of the underlying Spark engine.

LG5
Apr 8, 2021

Batch processing should be Azure Databricks, right?

eliabsbueno
Apr 16, 2021

Yes! HDInsight does not support autotermination natively

Manoel_Benicio
Apr 13, 2021

That's correct, so the answers would be: Azure Databricks, Azure Data Lake, and Azure Synapse Analytics.

AZ20
Jun 5, 2021

"terminate the cluster automatically" - I think this line makes Databricks a more suitable choice. The rest of the requirements suit both HDInsight and Databricks equally.

Wendy_DK
Apr 14, 2021

Batch processing should be Azure Databricks

ssanka
Apr 18, 2021

I think the answer should be Cosmos DB for the 3rd one. Azure Synapse doesn't support columnar storage, right?

meswapnilspal
Apr 26, 2021

It does: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is

NamishBansal
May 1, 2021

For the third one Synapse will work, but why would Cosmos not work?

maciejt
May 10, 2021

Why not Cosmos for the analytical data store?

MYR55
May 29, 2021

ADLS; HDInsight Spark (see https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-autoscale-clusters; the keyword here is in-memory processing); Azure Synapse Analytics.

eurekamike
Jun 28, 2021

Databricks has in-memory processing

Palp
Jun 15, 2021

Batch processing is Spark, as it provides in-memory operations.