Exam: Certified Data Engineer Professional
Question 24

Which configuration parameter directly affects the size of a Spark partition when data is ingested into Spark?

    Correct Answer: A

    The configuration parameter 'spark.sql.files.maxPartitionBytes' directly affects the size of a Spark partition when data is ingested. It sets the maximum number of bytes to pack into a single partition when reading files (128 MB by default), so it caps the size of each input partition.
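As a rough illustration of the answer above, here is a minimal PySpark sketch that sets the cap before reading a file-based source. Only the config key comes from the Spark documentation; the helper names `mib` and `read_with_partition_cap`, the app name, and the Parquet path in the usage note are hypothetical, and the resulting partition count will depend on the data and cluster.

```python
# Hypothetical helper names; only the config key is taken from the Spark docs.
MAX_PARTITION_BYTES_KEY = "spark.sql.files.maxPartitionBytes"


def mib(n: int) -> int:
    """Convert mebibytes to bytes."""
    return n * 1024 * 1024


def read_with_partition_cap(path: str, cap_bytes: int):
    """Read a Parquet dataset with a cap on bytes packed into each input partition."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partition-size-sketch")  # assumed app name
        .config(MAX_PARTITION_BYTES_KEY, str(cap_bytes))
        .getOrCreate()
    )
    df = spark.read.parquet(path)
    # The number of input partitions reflects the cap (among other settings).
    return df, df.rdd.getNumPartitions()
```

Calling `read_with_partition_cap("/data/events", mib(64))` (an assumed path) would typically produce more, smaller input partitions than the 128 MB default.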

Discussion
Jay_98_11 Option: A

correct

8605246 Option: A

Correct. Per the Spark docs, it is the maximum number of bytes to pack into a single partition when reading files. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC. https://spark.apache.org/docs/latest/sql-performance-tuning.html

sturcu Option: A

From the provided list, this fits best. In reality, partition size and count can be influenced by many settings.
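To make the point about other settings concrete, here is a pure-Python sketch of how the target split size for file sources appears to be derived. It is based on a reading of Spark's internal `FilePartition.maxSplitBytes` logic, so treat it as an approximation rather than a specification. The function name and the `min_partition_num=8` default are assumptions; in Spark that value falls back to the default parallelism unless `spark.sql.files.minPartitionNum` is set, and the open cost corresponds to `spark.sql.files.openCostInBytes`.

```python
def max_split_bytes(
    file_sizes,
    max_partition_bytes=128 * 1024 * 1024,  # spark.sql.files.maxPartitionBytes default
    open_cost_in_bytes=4 * 1024 * 1024,     # spark.sql.files.openCostInBytes default
    min_partition_num=8,                    # assumed; Spark uses default parallelism
):
    """Approximate the per-partition byte target Spark uses when splitting files."""
    # Each file is padded with an estimated cost of opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // min_partition_num
    # maxPartitionBytes is an upper bound; small inputs yield smaller splits.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


# One 1 GiB file: the maxPartitionBytes cap is the binding limit.
large_input = max_split_bytes([1024 * 1024 * 1024])

# Ten 1 MiB files: the target falls below the cap, driven by the other settings.
small_input = max_split_bytes([1024 * 1024] * 10)
```

Under these assumptions, `spark.sql.files.maxPartitionBytes` decides the split size for large inputs, while for small inputs the open cost and parallelism determine it instead.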