
DP-201 Exam - Question 59


You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.

You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.

What should you recommend?

Correct Answer: C (Parquet)

Parquet is the best choice for this scenario because it is designed for efficient querying and preserves data type information. It is a columnar storage format that supports complex nested data structures, making it highly efficient for analytical queries. Both Azure Databricks and PolyBase in Azure Synapse Analytics support Parquet, ensuring compatibility and minimizing errors during consumption. CSV and JSON do not retain data type information reliably, and Avro is not supported by PolyBase, making Parquet the superior option for this use case.
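
As a quick illustration, here is a minimal PySpark sketch of how the Parquet files written by Stream Analytics could be read from Data Lake Storage in a Databricks notebook. The storage account, container, and folder path are hypothetical placeholders (not values from the question), and credentials for the storage account are assumed to already be configured on the cluster:

```python
# Minimal sketch for an Azure Databricks notebook.
# The storage account, container, and folder names below are
# hypothetical placeholders, not values from the question.
from pyspark.sql import SparkSession

# Databricks notebooks provide `spark` automatically; creating it
# explicitly keeps the sketch self-contained elsewhere.
spark = SparkSession.builder.getOrCreate()

# Read the Parquet files emitted by the Stream Analytics output.
# Parquet embeds the schema, so no explicit schema or type casts
# are needed: columns come back with their original data types.
df = spark.read.parquet(
    "abfss://social@mydatalake.dfs.core.windows.net/streaming/"
)

df.printSchema()  # shows the retained column names and types
df.show(5)
```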

Discussion

12 comments
felmasri
Mar 12, 2021

I think this answer is wrong, since PolyBase does not support Avro. I will pick Parquet.

jms309
Mar 27, 2021

I understand that Databricks and PolyBase will consume the data independently... So, based on that premise, the selected Stream Analytics output format should be compatible with both. Since we need a file format designed for distributed processing to speed up the queries, the only possible options are Avro and Parquet. As Avro is not a valid solution, because PolyBase doesn't support that format, the only possible answer is Parquet.

maciejt
Apr 6, 2021

JSON and CSV don't define types strongly, and we need to preserve the data types, so those two are excluded. Parquet is better optimized for reads, Avro for writes, and the requirement is to make queries fast, so Parquet. https://www.datanami.com/2018/05/16/big-data-file-formats-demystified/
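
To make the type-retention point concrete, here is a small PySpark sketch (hypothetical data and paths, not from the question) that writes the same DataFrame to CSV and Parquet and reads both back. CSV returns every column as a string unless a schema is supplied or re-inferred, while Parquet returns the original types:

```python
# Sketch demonstrating type retention: Parquet stores the schema
# with the data, CSV does not. Data and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Types inferred here: id bigint, event string, score double.
df = spark.createDataFrame([(1, "like", 3.5)], ["id", "event", "score"])

df.write.mode("overwrite").csv("/tmp/events_csv", header=True)
df.write.mode("overwrite").parquet("/tmp/events_parquet")

# CSV: every column comes back as string unless a schema is
# supplied or inferSchema is enabled (which costs an extra pass).
spark.read.csv("/tmp/events_csv", header=True).printSchema()

# Parquet: the original bigint/string/double types come back as-is.
spark.read.parquet("/tmp/events_parquet").printSchema()
```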

davita8
Apr 29, 2021

C. Parquet

dumpi
Jun 8, 2021

Parquet is the correct answer, I verified.

Nik71
Mar 24, 2021

It's the Parquet file format.

cadio30
May 24, 2021

Both services accept CSV and Parquet as input formats, but Parquet is the candidate for this requirement, as it is the recommended file format for Azure Databricks and is also supported by PolyBase.

KpKo
May 28, 2021

Agreed with Parquet

kz_data
Mar 13, 2021

I think Parquet is the right answer.

H_S
Mar 15, 2021

Avro is not supported by PolyBase, but why not CSV?

H_S
Mar 15, 2021

Per https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs, it's PARQUET.

al9887655
Mar 24, 2021

The PolyBase support requirement eliminates Avro. Not sure what the right answer is.

massnonn
Nov 18, 2021

For me the correct answer is Parquet.