Certified Data Engineer Associate Exam - Question 58

Question

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

Examice · Accepted Answer

Parquet files have a well-defined schema. Parquet inherently stores metadata about the schema within the files themselves, including data types and column names. This allows for a structured and consistent schema, which is beneficial when creating an external table. CSV files, on the other hand, lack inherent schema information and may require additional handling or inference of schema during data ingestion.
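To make the CSV half of this concrete, here is a minimal sketch using only Python's standard `csv` module as a stand-in for any CSV reader. The sample data and the `infer_type` helper are hypothetical and are not how Spark or Databricks infers schemas; the point is only that the file itself carries text, so any types have to be guessed or declared by the reader.

```python
import csv
import io

# Hypothetical sample data: a CSV file is only text, optionally with a header row.
raw = "id,amount,created\n1,9.50,2024-01-31\n2,12.00,2024-02-01\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Every value comes back as a string; nothing in the file says "id is an integer".
value_types = {name: type(value).__name__ for name, value in rows[0].items()}
print(value_types)


def infer_type(values):
    """Naive, illustrative inference: try int, then float, else fall back to string."""
    for caster, label in ((int, "int"), (float, "double")):
        try:
            for v in values:
                caster(v)
            return label
        except ValueError:
            continue
    return "string"


# The reader has to scan the data and guess a type for each column.
inferred = {name: infer_type([r[name] for r in rows]) for name in rows[0]}
print(inferred)
```

A Parquet file, by contrast, records column names and types in its own metadata, so a reader does not need this guessing step.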

FastEddie · Answer

CTAS: CTAS statements automatically infer schema information from
query results and do not support manual schema declaration. This
means that CTAS statements are useful for external data ingestion
from sources with a well-defined schema, such as Parquet files and
tables. CTAS statements also do not support specifying additional
file options.
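As a rough analogy for why missing file options matter with CSV, the sketch below parses pipe-delimited text with Python's standard `csv` module, first with the default comma delimiter and then with the delimiter stated explicitly. The sample data is hypothetical, and this is not Databricks code; it only illustrates what happens when a reader cannot be told how the file is laid out.

```python
import csv
import io

# Hypothetical pipe-delimited data.
raw = "id|name\n1|Ada\n2|Grace\n"

# Without a delimiter option the reader assumes commas,
# so each line collapses into a single column.
default_rows = list(csv.reader(io.StringIO(raw)))

# With the option supplied, the columns are split as intended.
piped_rows = list(csv.reader(io.StringIO(raw), delimiter="|"))

print(default_rows[0])
print(piped_rows[0])
```

Parquet has no equivalent of a delimiter or header option to get wrong, which is why a statement that cannot take file options is a more comfortable fit for it.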

kbaba101 · Answer

C.
CTAS supports sources with a well-defined schema, such as Parquet files and tables, and does not support specifying additional file options, such as the delimiter you would need if you were to use CSV.

meow_akk · Answer

Ans: C
https://www.databricks.com/glossary/what-is-parquet#:~:text=Columnar%20storage%20like%20Apache%20Parquet,compared%20to%20row%2Doriented%20databases.

Columnar storage like Apache Parquet is designed to bring efficiency compared to row-based files like CSV. When querying columnar storage, you can skip over the non-relevant data very quickly. As a result, aggregation queries are less time-consuming compared to row-oriented databases.
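The idea of skipping non-relevant data can be sketched in plain Python. The records below are hypothetical, and the in-memory dictionaries are only a toy model of row-oriented versus column-oriented layout, not the actual Parquet encoding.

```python
# Hypothetical records in a row-oriented layout: one dict per record.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.5},
    {"id": 3, "region": "EU", "amount": 4.5},
]

# The same data in a column-oriented layout: one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Row-oriented aggregation: every record (with all its fields) is read
# just to reach the one field we need.
fields_read_row_layout = sum(len(r) for r in rows)
total_from_rows = sum(r["amount"] for r in rows)

# Column-oriented aggregation: only the "amount" column is read;
# "id" and "region" are never touched.
fields_read_column_layout = len(columns["amount"])
total_from_columns = sum(columns["amount"])

print(total_from_rows, fields_read_row_layout)
print(total_from_columns, fields_read_column_layout)
```

Both layouts give the same total, but the columnar version reads a third of the values in this toy case, which is the effect the glossary passage describes.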

anandpsg101 · Answer

C is correct

kishore1980 · Answer

C is the correct option

UGOTCOOKIES · Answer

CREATE TABLE AS SELECT adopts the schema details from the source. Parquet files have a defined schema.

nedlo · Answer

I disagree; I think it's D. A schema can be inferred from CSV as well, but CSV cannot provide the same optimizations as Parquet.

AndreFR · Answer

The key phrase here is: CREATE TABLE AS SELECT

not A: partitioning is not relevant in a CREATE TABLE AS statement because the data will be created in a Delta table
not C: the Parquet schema is not well defined, and there can be Parquet files with multiple schemas in a folder
not D: Parquet files are already optimized, and that is not relevant in a CREATE TABLE AS statement because the data will be created in a Delta table
not E: both CSV and Parquet will become Delta tables in a CREATE TABLE AS statement
B: correct answer by elimination

Garyn · Answer

C. Parquet files have a well-defined schema.

Explanation:

Parquet files inherently store metadata about the schema within the files themselves, allowing for a well-defined schema. This schema information includes data types, column names, and other structural information. When creating an external table from Parquet, this schema is retained, providing a structured and well-defined format for the data. It ensures consistency and enables more efficient processing, query optimization, and compatibility across various systems or tools that work with the Parquet format.
This structured schema within Parquet files offers advantages in terms of data integrity, ease of data processing, and compatibility, making it a beneficial choice over CSV, which lacks inherent schema information and might need additional handling or inference of schema during data ingestion.

bartfto · Answer

C. Parquet has a well-defined schema, unlike CSV.

benni_ale · Answer

C is correct

MDWPartners · Answer

The keywords are "CREATE TABLE AS SELECT".

1a44567 · Answer

Vote for D
Parquet files are a columnar storage file format that allows for efficient data compression and encoding schemes, enabling optimization and faster query performance compared to CSV files. This format supports efficient reading and writing of large datasets, making it a preferred choice for big data applications.
