Certified Data Engineer Professional Exam - Question 24

Question

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?

Examice · Accepted Answer

The configuration parameter 'spark.sql.files.maxPartitionBytes' directly affects the size of a Spark partition when data is ingested. It sets the maximum number of bytes to pack into a single partition when reading files (128 MB by default), so it caps how large each input partition can be.

8605246 · Answer

Correct. From the Spark docs: "The maximum number of bytes to pack into a single partition when reading files. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC."
https://spark.apache.org/docs/latest/sql-performance-tuning.html

Jay_98_11 · Answer

correct

sturcu · Answer

From the provided list, this fits best.
In reality, partition size and number can be influenced by many settings.
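
To make the "many settings" point concrete, here is a rough sketch in plain Python of how Spark appears to pick the split size for file-based sources. It mirrors my reading of the Spark 3.x file-scan logic and may differ between versions, so treat it as an approximation rather than the exact implementation. The function name max_split_bytes and the min_partition_num default of 8 are made up for illustration; in Spark that value comes from spark.sql.files.minPartitionNum or the default parallelism, and the other two arguments correspond to spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes.

```python
MIB = 1024 * 1024


def max_split_bytes(
    file_sizes,
    max_partition_bytes=128 * MIB,  # spark.sql.files.maxPartitionBytes (default 128 MB)
    open_cost_in_bytes=4 * MIB,     # spark.sql.files.openCostInBytes (default 4 MB)
    min_partition_num=8,            # assumption: stands in for default parallelism
):
    """Approximate the split size Spark uses when reading files.

    Hypothetical helper: it follows the formula
    min(maxPartitionBytes, max(openCostInBytes, totalBytes / minPartitionNum)),
    where every file is padded with the open cost.
    """
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // min_partition_num
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


# One large file: maxPartitionBytes is the binding limit.
large = max_split_bytes([1024 * MIB])

# Same file with the setting lowered to 64 MB: the split size follows it.
large_lowered = max_split_bytes([1024 * MIB], max_partition_bytes=64 * MIB)

# One small file: the split size falls below maxPartitionBytes,
# because the data is spread across the available cores instead.
small = max_split_bytes([40 * MIB])

print(large // MIB, large_lowered // MIB, small / MIB)
```

So maxPartitionBytes is an upper bound on the read partition size: it decides the outcome for large inputs, while for small inputs the open cost and the parallelism matter more. In a live session the setting itself can be changed with spark.conf.set("spark.sql.files.maxPartitionBytes", ...) before the read.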
