How should clustering be used to optimize the performance of queries that run on a very large table?
How should clustering be used to optimize the performance of queries that run on a very large table?
To optimize the performance of queries on a very large table, you should use the column that is most-frequently used in the query's select clauses as the clustering key. This approach enhances the efficiency of data retrieval by ensuring that the data is organized in a manner that aligns with the query patterns, leading to faster filtering and more efficient use of indexes.
it s D
selective filters is different than select clause
It's C : https://docs.snowflake.com/en/user-guide/tables-clustering-keys#strategies-for-selecting-clustering-keys "Selecting the right columns/expressions for a clustering key can dramatically impact query performance. Analysis of your workload will usually yield good clustering key candidates. Snowflake recommends prioritizing keys in the order below: Cluster columns that are most actively used in selective filters"
C Snowflake recommends prioritizing keys in the order below: Cluster columns that are most actively used in selective filters. For many fact tables involved in date-based queries (for example “WHERE invoice_date > x AND invoice date <= y”), choosing the date column is a good idea. For event tables, event type might be a good choice, if there are a large number of different event types. (If your table has only a small number of different event types, then see the comments on cardinality below before choosing an event column as a clustering key.) If there is room for additional cluster keys, then consider columns frequently used in join predicates, for example “FROM table1 JOIN table2 ON table2.column_A = table1.column_B”.
C https://docs.snowflake.com/en/user-guide/tables-clustering-keys#:~:text=Cluster%20columns%20that%20are%20most%20actively%20used%20in%20selective%20filters
I think it is D, per the doc you listed: The number of distinct values (i.e. cardinality) in a column/expression is a critical aspect of selecting it as a clustering key. It is important to choose a clustering key that has: A large enough number of distinct values to enable effective pruning on the table. A small enough number of distinct values to allow Snowflake to effectively group rows in the same micro-partitions.
"Snowflake recommends prioritizing keys in the order below: Cluster columns that are most actively used in selective filters."