
DP-201 Exam - Question 59


You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.

You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.

What should you recommend?

Correct Answer: C (Parquet)

Parquet is the best choice for this scenario because it is designed for efficient querying and preserves data type information. It is a columnar storage format that supports complex nested data structures, making it highly efficient for analytical queries. Both Azure Databricks and PolyBase in Azure Synapse Analytics support Parquet, ensuring compatibility and minimizing errors during consumption. CSV and JSON do not retain data type information reliably, and Avro is not supported by PolyBase, making Parquet the superior option for this use case.
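
As a quick illustration, here is a minimal PySpark sketch of how the Parquet files written by Stream Analytics could be read from Data Lake Storage in a Databricks notebook. The storage account, container, and folder path are hypothetical placeholders (not values from the question), and credentials for the storage account are assumed to already be configured on the cluster:

```python
# Minimal sketch for an Azure Databricks notebook.
# The storage account, container, and folder names below are
# hypothetical placeholders, not values from the question.
from pyspark.sql import SparkSession

# Databricks notebooks provide `spark` automatically; creating it
# explicitly keeps the sketch self-contained elsewhere.
spark = SparkSession.builder.getOrCreate()

# Read the Parquet files emitted by the Stream Analytics output.
# Parquet embeds the schema, so no explicit schema or type casts
# are needed: columns come back with their original data types.
df = spark.read.parquet(
    "abfss://social@mydatalake.dfs.core.windows.net/streaming/"
)

df.printSchema()  # shows the retained column names and types
df.show(5)
```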

Discussion

12 comments
felmasri
Mar 12, 2021

I think this answer is wrong, since PolyBase does not support Avro. I will pick Parquet.

jms309
Mar 27, 2021

I understand that Databricks and PolyBase will consume the data independently... So, based on that premise, the selected Stream Analytics output format should be compatible with both. Since we need a file format designed for distributed processing to speed up the queries, the only possible options are Avro and Parquet. As Avro is not a valid solution, because PolyBase doesn't support that format, the only possible answer is Parquet.

maciejt
Apr 6, 2021

JSON and CSV don't define types strongly, and we need to preserve the data types, so those two are excluded. Parquet is better optimized for reads, Avro for writes, and the requirement is to make queries fast, so Parquet. https://www.datanami.com/2018/05/16/big-data-file-formats-demystified/
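
To make the type-retention point concrete, here is a small PySpark sketch (hypothetical data and paths, not from the question) that writes the same DataFrame to CSV and Parquet and reads both back. CSV returns every column as a string unless a schema is supplied or re-inferred, while Parquet returns the original types:

```python
# Sketch demonstrating type retention: Parquet stores the schema
# with the data, CSV does not. Data and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Types inferred here: id bigint, event string, score double.
df = spark.createDataFrame([(1, "like", 3.5)], ["id", "event", "score"])

df.write.mode("overwrite").csv("/tmp/events_csv", header=True)
df.write.mode("overwrite").parquet("/tmp/events_parquet")

# CSV: every column comes back as string unless a schema is
# supplied or inferSchema is enabled (which costs an extra pass).
spark.read.csv("/tmp/events_csv", header=True).printSchema()

# Parquet: the original bigint/string/double types come back as-is.
spark.read.parquet("/tmp/events_parquet").printSchema()
```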

davita8
Apr 29, 2021

C. Parquet

dumpi
Jun 8, 2021

Parquet is the correct answer, I verified.

Nik71
Mar 24, 2021

It's the Parquet file format.

cadio30
May 24, 2021

Both services accept CSV and Parquet as input formats, but Parquet is the candidate for this requirement, as it is the recommended file format for Azure Databricks and is also supported by PolyBase.

KpKo
May 28, 2021

Agreed with Parquet

kz_data
Mar 13, 2021

I think Parquet is the right answer.

H_S
Mar 15, 2021

Avro is not supported by PolyBase, but why not CSV?

H_S
Mar 15, 2021

Per https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs, it's PARQUET.

al9887655
Mar 24, 2021

The PolyBase support requirement eliminates Avro. Not sure what the right answer is.

massnonn
Nov 18, 2021

For me the correct answer is Parquet.