DP-201 Exam Questions

DP-201 Exam - Question 84


You have a large amount of sensor data stored in an Azure Data Lake Storage Gen2 account. The files are in the Parquet file format.

New sensor data will be published to Azure Event Hubs.

You need to recommend a solution to add the new sensor data to the existing sensor data in real-time. The solution must support the interactive querying of the entire dataset.

Which type of server should you include in the recommendation?

Correct Answer: D

Azure Databricks meets both requirements: it can ingest the new sensor data from Azure Event Hubs in real time (for example, with Spark Structured Streaming), append it to the existing Parquet files in Azure Data Lake Storage Gen2, and support interactive Spark SQL queries across the entire dataset. Because Databricks integrates natively with Data Lake Storage Gen2 and handles large Parquet datasets efficiently, it is well suited to combining streaming ingestion with interactive querying at scale.
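As a rough sketch of this architecture in a Databricks (PySpark) notebook, assuming the `com.microsoft.azure:azure-eventhubs-spark` connector is installed on the cluster; the storage paths, container names, and connection string below are illustrative placeholders, not values from the scenario:

```python
# Sketch: land Event Hubs data next to the historical Parquet files in
# ADLS Gen2 so the full dataset stays interactively queryable.
# All paths and the connection string are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# New events arriving from Azure Event Hubs (connection string must be
# encrypted via the connector's EventHubsUtils helper).
eh_conf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(
            "<event-hubs-connection-string>")
}
stream = (spark.readStream
          .format("eventhubs")
          .options(**eh_conf)
          .load())

# Append the stream to the same lake location, in Parquet format.
(stream.writeStream
 .format("parquet")
 .option("path", "abfss://sensors@mylake.dfs.core.windows.net/history/")
 .option("checkpointLocation",
         "abfss://sensors@mylake.dfs.core.windows.net/_checkpoints/")
 .start())

# Interactive queries then see the old and new data together.
spark.read.parquet("abfss://sensors@mylake.dfs.core.windows.net/history/") \
     .createOrReplaceTempView("sensors")
spark.sql("SELECT deviceId, COUNT(*) FROM sensors GROUP BY deviceId")
```

This runs only on a cluster with access to the Event Hub and the storage account, so it is a sketch of the data flow rather than a self-contained script.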

Discussion

sdas1
Mar 26, 2021

As per below link the answer is correct. https://azure.microsoft.com/en-in/blog/new-capabilities-in-stream-analytics-reduce-development-time-for-big-data-apps/

cadio30
May 25, 2021

Both Azure Databricks and Azure Stream Analytics can output data in Parquet format and support interactive queries. For simplicity, I'll choose Azure Stream Analytics.

BobFar
May 27, 2021

ASA doesn't support Parquet format!

BobFar
May 27, 2021

I was wrong, it supports now https://azure.microsoft.com/en-us/updates/stream-analytics-offers-native-support-for-parquet-format/#:~:text=Azure%20Stream%20Analytics%20now%20offers,in%20the%20Big%20Data%20ecosystems.

mbravo
May 29, 2021

One of the requirements is to be able to interactively query the whole (possibly very large) dataset according to the scenario. This requirement alone is a perfect fit for Spark. I highly doubt there is a sensible way to achieve this with ASA. Therefore I vote for Databricks.

cadio30
Jun 10, 2021

Opt for Azure Databricks instead of Azure Stream Analytics because of the keyword 'large dataset'. Reference: https://techcommunity.microsoft.com/t5/analytics-on-azure/azure-stream-analytics-real-time-analytics-for-big-data-made/ba-p/549621

VG2007
May 3, 2021

Native support for egress in Apache parquet format into Azure Blob Storage is now generally available. Parquet is a columnar format enabling efficient big data processing. By outputting data in parquet format into a blob store or a data lake, you can take advantage of Azure Stream Analytics to power large scale streaming extract, transfer, and load (ETL), to run batch processing, to train machine learning algorithms, or to run interactive queries on your historical data. We are now announcing general availability of this feature for egress to Azure Blob Storage.
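For context, a Stream Analytics output that writes Parquet to Blob Storage is declared in the job's output definition. The fragment below follows the general shape of the ARM template schema for such an output; the account, container, and path values are placeholders, not part of the scenario:

```json
{
  "name": "parquet-to-lake",
  "properties": {
    "datasource": {
      "type": "Microsoft.Storage/Blob",
      "properties": {
        "storageAccounts": [ { "accountName": "mylake" } ],
        "container": "sensors",
        "pathPattern": "history/{date}"
      }
    },
    "serialization": {
      "type": "Parquet",
      "properties": {}
    }
  }
}
```

Note that this only covers egress to storage; querying the accumulated Parquet data interactively still requires a separate engine such as Spark.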

YOMYOM
Mar 17, 2021

Is C really the correct answer, please?

H_S
Mar 22, 2021

I think it's D because of the interactive querying of the entire dataset. Interactive querying of the entire dataset isn't possible with Azure Stream Analytics.

jms309
Mar 27, 2021

I think Databricks is a good answer. I'm not sure whether Azure Stream Analytics is also a right answer, but maybe there are two possibilities.

anamaster
Apr 18, 2021

interactive querying eliminates ASA

niwe
Apr 26, 2021

Azure Stream Analytics does not support Parquet data format. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-stream-analytics-query-patterns

saifone
May 5, 2021

It does as of July 2019 https://azure.microsoft.com/en-us/updates/stream-analytics-offers-native-support-for-parquet-format/


daradev
Jul 20, 2021

By outputting data in parquet format into a blob store or a data lake, you can take advantage of Azure Stream Analytics to power large scale streaming extract, transfer, and load (ETL), to run batch processing, to train machine learning algorithms, or to run interactive queries on your historical data. Source: https://azure.microsoft.com/en-in/blog/new-capabilities-in-stream-analytics-reduce-development-time-for-big-data-apps/

hello_there_
Aug 10, 2021

What this quote says is that ASA can output parquet format to blob storage, so that another tool can then run interactive queries on the data. ASA itself can't do interactive queries on parquet in blob storage, which is what is required here. I'd go with databricks.