
DP-200 Exam - Question 19


HOTSPOT -

You are developing a solution using a Lambda architecture on Microsoft Azure.

The data at rest layer must meet the following requirements:

Data storage:

✑ Serve as a repository for high volumes of large files in various formats.

✑ Implement optimized storage for big data analytics workloads.

✑ Ensure that data can be organized using a hierarchical structure.

Batch processing:

✑ Use a managed solution for in-memory computation processing.

✑ Natively support Scala, Python, and R programming languages.

✑ Provide the ability to resize and terminate the cluster automatically.

Analytical data store:

✑ Support parallel processing.

✑ Use columnar storage.

✑ Support SQL-based languages.

You need to identify the correct technologies to build the Lambda architecture.

Which technologies should you use? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Correct Answer:

Data storage: Azure Data Lake Store

A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized. With the hierarchical namespace enabled, a storage account becomes capable of providing the scalability and cost-effectiveness of object storage, with file system semantics that are familiar to analytics engines and frameworks.
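As a rough illustration of what the hierarchical namespace enables, the sketch below uses the Azure SDK for Python (azure-storage-file-datalake) to create a nested directory tree and upload a file; the account URL, key, file system name, and paths are placeholder assumptions, not values from the exam.

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Connect to an ADLS Gen2 account (hierarchical namespace enabled).
# The account URL and key below are placeholders.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key>",
)

# With a hierarchical namespace, directories are first-class objects, so a
# nested path such as raw/sales/2021/04 can be created and managed directly
# rather than simulated with name prefixes as in flat blob storage.
file_system = service.create_file_system(file_system="datalake")
directory = file_system.create_directory("raw/sales/2021/04")

# Upload one of the "high volumes of large files" into that directory.
file_client = directory.create_file("orders.parquet")
with open("orders.parquet", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```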

Batch processing: HDInsight Spark

Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.

HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, and MapReduce.

Languages: R, Python, Java, Scala, SQL
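To make the batch-layer requirements concrete, here is a minimal PySpark sketch of the kind of job a managed Spark cluster would run: data is read from the lake, cached in cluster memory (the in-memory computation requirement), aggregated, and written back. The abfss paths and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lambda-batch-layer").getOrCreate()

# Hypothetical path into the ADLS Gen2 account used as the data-at-rest layer.
raw_path = "abfss://datalake@<storage-account>.dfs.core.windows.net/raw/sales"

orders = spark.read.parquet(raw_path)

# cache() keeps the dataset in executor memory, so the repeated passes of a
# batch pipeline avoid re-reading the files from storage.
orders.cache()

daily_totals = (
    orders.groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("order_count"))
)

# Write the batch view back to the lake (or on to the analytical store).
daily_totals.write.mode("overwrite").parquet(
    "abfss://datalake@<storage-account>.dfs.core.windows.net/curated/daily_totals"
)
```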

Analytical data store: Azure Synapse Analytics

Azure Synapse Analytics is a cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP).

Azure Synapse Analytics stores data in relational tables with columnar storage.

Note: As of November 2019, Azure SQL Data Warehouse is now Azure Synapse Analytics.
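As a hedged illustration of the columnar, parallel nature of a dedicated SQL pool, the sketch below uses pyodbc to create a hash-distributed table with a clustered columnstore index; the server, database, credentials, and table definition are assumptions for this example, not values from the exam.

```python
import pyodbc

# Placeholder connection details for a Synapse dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool>;"
    "UID=<user>;PWD=<password>",
    autocommit=True,
)
cursor = conn.cursor()

# Columnar storage (clustered columnstore index) plus hash distribution lets
# the MPP engine scan large fact tables in parallel across compute nodes,
# and the table is queried with ordinary SQL.
cursor.execute("""
CREATE TABLE dbo.DailyTotals
(
    OrderDate   DATE NOT NULL,
    TotalAmount DECIMAL(18, 2),
    OrderCount  BIGINT
)
WITH
(
    DISTRIBUTION = HASH(OrderDate),
    CLUSTERED COLUMNSTORE INDEX
);
""")
```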

References:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is

Discussion

11 comments
gallego82
Apr 9, 2021

I think that in batch processing the answer should be Azure Databricks, due to the link provided and its capabilities: Azure Databricks is an Apache Spark-based analytics platform. You can think of it as "Spark as a service." It's the easiest way to use Spark on the Azure platform. Languages: R, Python, Java, Scala, Spark SQL. Fast cluster start times, autotermination, autoscaling. Manages the Spark cluster for you. Built-in integration with Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Synapse, and other services. See Data Sources. User authentication with Azure Active Directory. Web-based notebooks for collaboration and data exploration. Supports GPU-enabled clusters.
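For context on the autoscaling and autotermination point raised above, here is a hedged sketch of a Databricks cluster definition submitted through the Clusters REST API; the workspace URL, access token, runtime version, and node type are placeholder assumptions.

```python
import requests

# Placeholder Databricks workspace URL and personal access token.
workspace_url = "https://<workspace>.azuredatabricks.net"
token = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "lambda-batch-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    # Resize automatically between 2 and 8 workers...
    "autoscale": {"min_workers": 2, "max_workers": 8},
    # ...and terminate the cluster automatically after 30 idle minutes.
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```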

Pairon
Apr 12, 2021

Agree with the comments above. Databricks enables you to autoscale and autoterminate your cluster, and it also enables in-memory processing because of the underlying Spark engine.

LG5
Apr 8, 2021

Batch processing should be Azure Databricks, right?

eliabsbueno
Apr 16, 2021

Yes! HDInsight does not support autotermination natively

Manoel_Benicio
Apr 13, 2021

That's correct, so the answers would be: Azure Databricks, Azure Data Lake, and Azure Synapse Analytics.

AZ20
Jun 5, 2021

"terminate the cluster automatically" - I think this line makes Databricks a more suitable choice. The rest of the requirements suit both HDInsight and Databricks equally.

Wendy_DK
Apr 14, 2021

Batch processing should be Azure Databricks

ssanka
Apr 18, 2021

I think the answer should be Cosmos DB for the 3rd one. Azure Synapse doesn't support columnar storage, right?

meswapnilspal
Apr 26, 2021

It does: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is

NamishBansal
May 1, 2021

For the third one Synapse will work, but why would Cosmos not work?

maciejt
May 10, 2021

Why not Cosmos for the analytical data store?

MYR55
May 29, 2021

ADLS; HDInsight Spark (see https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-autoscale-clusters; the keyword here is in-memory processing); Azure Synapse Analytics.

eurekamike
Jun 28, 2021

Databricks has in-memory processing

Palp
Jun 15, 2021

Batch processing is Spark, as it provides in-memory operations.