What is the MOST performant file format for loading data in Snowflake?
The most performant file format for loading data into Snowflake is gzipped CSV. It loads faster than Parquet and ORC, keeps the simplicity of plain CSV, and benefits from the smaller file size that compression provides.
Gzipped CSV loads at roughly 15 TB/hour, about 2.5 to 3 times the 5-6 TB/hour seen with ORC and Parquet. That rate is decent if your data is already in ORC or Parquet, but don’t go out of your way to create ORC or Parquet files from CSV in the hope that they will load into Snowflake faster. Loading data into a fully structured (columnarized) schema is ~10-20% faster than landing it in a VARIANT column. Source: https://community.snowflake.com/s/article/How-to-Load-Terabytes-Into-Snowflake-Speeds-Feeds-and-Techniques
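To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. It simply divides a data volume by the throughput figures quoted above. The 30 TB data set and the function and dictionary names are made up for illustration, and real load speed will depend on warehouse size, file sizes, and parallelism.

```python
# Approximate throughput figures quoted in the answer above, in TB per hour.
# They are the cited article's numbers, not guarantees for any given warehouse.
THROUGHPUT_TB_PER_HOUR = {
    "csv_gzip": 15.0,
    "orc_or_parquet_low": 5.0,
    "orc_or_parquet_high": 6.0,
}


def estimated_load_hours(data_tb: float, tb_per_hour: float) -> float:
    """Return the estimated hours needed to load data_tb at the given rate."""
    if tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / tb_per_hour


if __name__ == "__main__":
    data_tb = 30.0  # hypothetical data set size, chosen only for illustration
    for fmt, rate in THROUGHPUT_TB_PER_HOUR.items():
        hours = estimated_load_hours(data_tb, rate)
        print(f"{fmt}: about {hours:.1f} hours")
```

At these rates the same 30 TB would take about 2 hours as gzipped CSV versus 5 to 6 hours as ORC or Parquet, which is why converting CSV to a columnar format just for loading is unlikely to pay off.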