Micro partitions and Clustering Flashcards by Ranjith Mohanasundaram

What is Microparitioning?

It is a way how SF physically stores the data. Incoming source data is divided into smaller chunks, their metadata collected and stored.

How well did you know this?

Not at all

Perfectly

Why there is a need for micropartitioning?

For query efficiency. This avoids full table scan by using query pruning. SF collects some metadata on micro partitions. So when a query runs, based on the metadata it prunes away some partitions, so full table scan is avoided.

How well did you know this?

Not at all

Perfectly

What is the size of micro partition?

50 to 500MB before compression. Data is always stored as compressed format in SF.

How well did you know this?

Not at all

Perfectly

What is query pruning?

It is simply a concept where unnecessary partitions are avoided to be scanned for that query. It is based on the metadata collected while creating micro partitions.

How well did you know this?

Not at all

Perfectly

What metadata is collected on micropartitions?

Range of values.
distinct values.
additional properties that SF collects for query performance which is not revealed

How well did you know this?

Not at all

Perfectly

Micro partitions are collected by users? Can it be changed by users?

It is not created by users. SF automatically does it as data is coming in. It cannot be changed by users.
Users can do clustering from their end which will improve performance.

How well did you know this?

Not at all

Perfectly

What is clustering depth?

Once micro partitions are created, SF also collects metadata on how many micro partitions overlap… This overlapping is called clustering depth. Lower the depth it is good.

How well did you know this?

Not at all

Perfectly

What is the clustering depth of empty table?

How well did you know this?

Not at all

Perfectly

What are the disadvantages of static partitioning done in traditional databases which Microparitioning takes care of ?

maintenance (as data gets updated, static partitioning can go outdated)
Data skew if partition columns are not uniformly distributed.

How well did you know this?

Not at all

Perfectly

Is micro partitions logical storage or physical storage?

They are physical storage - how tables are physically stored at the back end - data is stored in columnar format.

How well did you know this?

Not at all

Perfectly

When do you need clustering/setting up cluster keys?

When clustering depth is large on micropartitions, (because ordering at the time of the insert was not perfect)
when lot of DMLs has happened on the tables and existing micropartitions are not causing efficient query pruning etc.

How well did you know this?

Not at all

Perfectly

Clustering can be done on what object types?

Tables and materialized views

How well did you know this?

Not at all

Perfectly

Clustering is recommended for what tables?

large tables where queries are running slower.
large tables where DML is happening frequently.
generally tables in Terabyte range

How well did you know this?

Not at all

Perfectly

How do you choose columns for clustering?

Clustering will create new micropartitions (MPs) so it will add to storage cost so you have to be careful about how many columns and what columns you choose.

3 to 4 columns is ok.
Choose columns used in filters and then used in joins.

Choose columns

that have lot of distinct values so many MPs can be ignored using query pruning
as well as it should not be so unique that MPs cannot group them.

How well did you know this?

Not at all

Perfectly

What columns should not be chosen for clustering?

Extremely high cardinal (most unique values) like id columns should not be chosen….because it cannot be grouped together for creating a cluster.

Extremely low cardinal columns (less unique values) like gender should not be chosen as well….because query pruning cannot be done efficiently.

For high cardinal columns like timestamp, you can do expressions like to_date conversion to do efficient clustering.

How well did you know this?

Not at all

Perfectly

When choosing multiple columns for clustering - what order to be followed?

Study These Flashcards

Choose from lowest cardinality to highest.

What is Reclustering?

Study These Flashcards

when lot of DMLs has happened on clustered tables over time, SF automatically does Reclustering to keep clustering/query performance efficient.

Does reclustering end up in additional costs?

Study These Flashcards

Yes, because it will create new MPs. Old MPs will be reserved for time travel and fail safe.

Can you do expressions on clustering key columns?

Study These Flashcards

Yes, some limited expressions like timestamp to date conversions can be done.

Does changing the clustering key using ALTER impact anything to the table immediately?

Study These Flashcards

There will be no impact unless there is another reclustering.

When does SF does reclustering?

Study These Flashcards

When SF determines there is a benefit in doing this….it is doing automatically.
No need to monitor clustered table current state by users.
No need for users to designate any WHs for reclustering.
Everything is done by SF.

What are benefits of automated reclustering from SF?

Study These Flashcards

Users need not allocate resources/WHs for it. SF does it dynamically when it sees the benefit.
Users need not monitor the current state.
SF gives full control in the sense - users can even suspend reclustering OR even remove the cluster key.
It does not block DML operations while reclustering is done.

Do you need to provide virtual WH for reclustering?

Study These Flashcards

No need…as it is done by SF automatically dynamically.

What privileges are needed to add clustering to the table?

Study These Flashcards

USAGE/OWNERSHIP privileges on the database and schema of that table.

Is automated reclustering enabled by default?

Yes, once you define tables with clustering keys it is enabled by default. You can suspend it using ALTER command when it is not required. You can also resume it later.

Where do you see bills for AUTOMATIC CLUSTERING?

As an account admin, go to Account>Usage.... See service called AUTOMATIC_CLUSTERING. Also you can see AUTOMATIC_CLUSTERING_HISTORY table in info schema AUTOMATIC_CLUSTERING_HISTORY view in Account usage schema

Can resource monitor control the WH used for automated reclustering?

As it is done by SF directly, resource monitor cannot control that.

Is manual reclustering available?

No this option was removed now.

Is clustering depth the best way to check if table is well clustered?

Nope. Query performance is the best way. If query is performing badly, then it definitely says table is not well clustered. Clustering depth is just another additional metric to check.

What are the system functions available to view or monitor clustering metadata?

SYSTEM$CLUSTERING_DEPTH SYSTEM$CLUSTERING_INFORMATION (including clustering depth)

Micro partitions and Clustering Flashcards

(30 cards)