Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?
The configuration parameter 'spark.sql.files.maxPartitionBytes' directly affects the size of a Spark partition upon ingestion of data into Spark. It defines the maximum number of bytes to pack into a single partition when reading files, which in turn determines the partition size.
correct
correct; per the Spark docs: "The maximum number of bytes to pack into a single partition when reading files. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC." https://spark.apache.org/docs/latest/sql-performance-tuning.html
From the provided list, this fits best. In reality, partition size/number can be influenced by many settings.
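A minimal PySpark sketch of setting this parameter (the app name, 128 MB value, and file path are illustrative assumptions, not from the question):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partition-size-demo")
    # Cap each input partition at roughly 128 MB when reading file-based sources
    .config("spark.sql.files.maxPartitionBytes", 128 * 1024 * 1024)
    .getOrCreate()
)

# Read a file-based source (Parquet); each resulting partition holds
# at most ~maxPartitionBytes of input data.
df = spark.read.parquet("/path/to/data.parquet")  # hypothetical path
print(df.rdd.getNumPartitions())
```

Lowering the value yields more, smaller partitions on read; raising it yields fewer, larger ones.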