
Certified Data Engineer Professional Exam - Question 24


Which configuration parameter directly affects the size of a Spark partition when data is ingested into Spark?

Correct Answer: A

The configuration parameter 'spark.sql.files.maxPartitionBytes' sets the maximum number of bytes to pack into a single partition when reading files (128 MB by default), so it directly caps the size of each Spark partition created when file data is ingested.
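As a rough illustration of that cap, the sketch below estimates how many read partitions a single splittable file would produce for a given 'spark.sql.files.maxPartitionBytes' value. The function name `estimate_read_partitions` is hypothetical, and the estimate assumes no other setting lowers the effective split size; it is not Spark's actual planning code.

```python
import math

MB = 1024 * 1024

def estimate_read_partitions(file_size_bytes: int,
                             max_partition_bytes: int = 128 * MB) -> int:
    """Rough partition count for one splittable file.

    Assumption: the file is cut into chunks of at most max_partition_bytes
    and no other setting shrinks the split size.
    """
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / max_partition_bytes)

# A 1 GiB file with the default 128 MB cap, then with a 64 MB cap.
print(estimate_read_partitions(1024 * MB))           # 8
print(estimate_read_partitions(1024 * MB, 64 * MB))  # 16
```

Halving the cap doubles the estimated partition count, which is why this setting is the usual first knob for controlling partition size at read time.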

Discussion

3 comments
8605246 (Option: A)
Aug 6, 2023

Correct. Per the Spark documentation, this is the maximum number of bytes to pack into a single partition when reading files. The configuration is effective only when using file-based sources such as Parquet, JSON and ORC. https://spark.apache.org/docs/latest/sql-performance-tuning.html

Jay_98_11 (Option: A)
Jan 13, 2024

correct

sturcu (Option: A)
Oct 11, 2023

From the provided list, this fits best. In reality, partition size and number can be influenced by many settings.
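To make the point about other settings concrete, here is a minimal sketch of how the effective split size for file sources appears to be derived, based on my understanding of Spark's file-source planning: 'spark.sql.files.maxPartitionBytes' is an upper bound, while 'spark.sql.files.openCostInBytes' (4 MB by default) and the available parallelism can pull the value lower. The function and parameter names are hypothetical, and the formula should be treated as an approximation rather than Spark's exact code.

```python
MB = 1024 * 1024

def max_split_bytes(file_sizes, parallelism,
                    max_partition_bytes=128 * MB,
                    open_cost_in_bytes=4 * MB):
    """Approximate effective split size for a file scan.

    Assumption: each file is padded with an "open cost", the total is
    spread over the available parallelism, and the result is clamped
    between open_cost_in_bytes and max_partition_bytes.
    """
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

# Ten 10 MB files on 8 cores: the split size drops well below 128 MB.
print(max_split_bytes([10 * MB] * 10, parallelism=8) / MB)   # 17.5

# One 10 GiB file on 8 cores: the 128 MB cap is what limits the split.
print(max_split_bytes([10240 * MB], parallelism=8) / MB)     # 128.0
```

With small inputs the parallelism term dominates, so partitions end up smaller than the configured maximum; with large inputs the maximum is the binding limit.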