Snowflake - Udemy - exam 1 Flashcards

1
Q

Monica ran a SELECT query on a large table t1. The query took longer than expected. She looked into the query profile and found that ‘ Bytes spilled to local storage’ and ‘Bytes spilled to remote storage’ are very high. What advice will you give to her to improve the query performance? (Select 3)

A
  1. Using a larger warehouse (effectively increasing the available memory/local disk space for the operation.
  2. Increasing the number of parallel queries running in the warehouse.
  3. Processing data in smaller batches.
    ??. Trying to split processing into several steps

Explanation
When Snowflake warehouse cannot fit an operation in memory, it starts spilling (storing) data first to the local disk of a warehouse node, and then to remote storage. In such a case, Snowflake first tries to temporarily store the data on the warehouse’s local disk. As this means extra IO operations, any query that requires spilling will take longer than a similar query running on similar data that is capable to fit the operations in memory. Also, if the local disk is not sufficient to fit the spilled data, Snowflake further tries to write to the remote cloud storage, which will be shown in the query profile as “Bytes spilled to remote storage”.

The spilling can’t always be avoided, especially for large batches of data, but it can be decreased by: Reducing the amount of data processed.

For example, by trying to improve partition pruning, or projecting only the columns that are needed in the output. Decreasing the number of parallel queries running in the warehouse. Trying to split the processing into several steps (for example by replacing the CTEs with temporary tables). Using a larger warehouse - effectively means more memory and more local disk space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Both external (external cloud storage) and internal (i.e., Snowflake) stages support unstructured data. (True / False)

A

True, both external (external cloud storage, such as, Amazon S3, Google Cloud Storage, Azure Blob Storage etc.) and internal (i.e. Snowflake) stages support unstructured data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Which of these are kind of Cache in Snowflake?

A

Metadata Cache
Query Result Cache
Data / Local Disk Cache
Snowflake has three types of cache.
1. The metadata cache that lives in the cloud services layer.
2. The data cache/local disk cache that lives on the SSD drives in the virtual warehouses, and
3. The query result cache. If a result is small, it will be stored in the cloud services layer, but larger results are going to be stored in the storage layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Time Travel can be disabled for an account by ACCOUNTADMIN. (True/False)

A

False
Time Travel cannot be disabled for an account. A user with the ACCOUNTADMIN role can set DATA_RETENTION_TIME_IN_DAYS to 0 at the account level, which means that all databases (and subsequently all schemas and tables) created in the account have no retention period by default; however, this default can be overridden at any time for any database, schema, or table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Which of the following languages does Snowflake support for writing UDFs (User-Defined Functions)?

A

JavaScript
Java
SQL
Python

Explanation
User-defined functions (UDFs) let you extend the system to perform operations that are not available through the built-in, system-defined functions provided by Snowflake. Snowflake currently supports the following languages for writing UDFs: Java: A Java UDF lets you use the Java programming language to manipulate data and return either scalar or tabular results. JavaScript: A JavaScript UDF lets you use the JavaScript programming language to manipulate data and return either scalar or tabular results. Python: A Python UDF lets you use the Python programming language to manipulate data and return either scalar or tabular results. SQL: A SQL UDF evaluates an arbitrary SQL expression and returns either scalar or tabular results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

A user’s default role is

A

the role a user gets set to each time the user logs in to Snowflake.
Explanation
A user’s default role is the role a user gets set to each time the user logs in to Snowflake. Snowflake uses roles to control the objects (virtual warehouses, databases, tables, etc.) that users can access:

Snowflake provides a set of predefined roles, as well as a framework for defining a hierarchy of custom roles.

All Snowflake users are automatically assigned the predefined PUBLIC role, which enables login to Snowflake and basic object access.

In addition to the PUBLIC role, each user can be assigned additional roles, with one of these roles designated as their default role.

A user’s default role determines the role used in the Snowflake sessions initiated by the user; however, this is only a default. Users can change roles within a session at any time.

Roles can be assigned at user creation or afterward.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Which systems function can help find the overlap depth of a table’s micro-partitions?

A

SYSTEM$CLUSTERING_INFORMATION
SYSTEM$CLUSTERING_DEPTH
Explanation
For example, if you have an EMPLOYEE table - you can run any of these queries to find the depth - SELECT SYSTEM$CLUSTERING_INFORMATION(‘EMPLOYEE’); SELECT SYSTEM$CLUSTERING_DEPTH(‘EMPLOYEE’);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What value will be return by the following query? SELECT * FROM TABLE(FLATTEN(input => parse_json(‘[]’))) f;

A

nothing will return / output of the input row will be omitted
Explanation
If you don’t specify OUTER argument with FLATTEN, it would be defaulted to FALSE. The OUTER => FALSE argument with FLATTEN omits the output of the input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What happens to the data when the retention period ends for an object?

A

Data is moved to Snowflake Fail-safe.
Explanation
When the retention period ends for an object, the historical data is moved into Snowflake Fail-safe. Snowflake support needs to be contacted to get the data restored from Fail-safe.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Readers accounts enable providers to share data with consumers who are not already Snowflake customers without requiring the consumers to become Snowflake Customers. Which role can create the Reader account?

A

ACCOUNTADMIN
Explanation
ACCOUNTADMIN role (or a role granted the CREATE ACCOUNT global privilege) only can create the Reader account.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Which command will list the pipes for which you have access privileges?

A

SHOW PIPES;
Explanation
SHOW PIPES Command lists the pipes for which you have access privileges. This command can list the pipes for a specified database or schema (or the current database/schema for the session), or your entire account.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Which of the Snowflake editions provides a federated authorization feature?

A

All of the Snowflake Editions.
All Snowflake Editions (Standard, Enterprise, Business Critical, Virtual Private Snowflake) provide Federated Authentication.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Which of these SQL functions does Snowflake support?

A

User-Defined
Table
Window
Scalar
Aggregate
System

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

There are two modes to set up a multi-cluster warehouse. Select those from the given choices.

A

Auto-scaling mode
Maximized mode
There are two ways to set up a multi-cluster warehouse: in maximized mode, or auto-scaling mode.

Maximized mode - You simply set your minimum equal to your maximum, and those values are something greater than one.

Auto-Scaling mode - Specify different values for the maximum and the minimum number of clusters. In this mode, Snowflake starts and stops clusters as needed to dynamically manage the load on the warehouse:

As the number of concurrent user sessions and/or queries for the warehouse increases, and queries start to queue due to insufficient resources, Snowflake automatically starts additional clusters, up to the maximum number defined for the warehouse.

Similarly, as the load on the warehouse decreases, Snowflake automatically shuts down clusters to reduce the number of running clusters and, correspondingly, the number of credits used by the warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Micro-partitioning is the on-demand feature of Snowflake. It is required to be enabled explicitly by ACCOUNTADMIN. (True / False)

A

FALSE
Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the Ordering of the data as inserted or loaded.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

John is trying to load JSON data sets with a huge array containing multiple records. Considering the VARIANT data type imposed size of 16 MB, what do you recommend to John for optimally loading the data?

A

Enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command
If the data exceeds 16 MB, enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command to remove the outer array structure and load the records into separate table rows: copy into <table> from @~/<file>.json file_format = (type = 'JSON' strip_outer_array = true);</file>

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Snowflake automatically and transparently maintains materialized views. (True/False)

A

TRUE
Snowflake automatically and transparently maintains materialized views. A background service updates the materialized view after changes to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Which data does not fit into a predefined data model or schema?

A

Unstructured Data
Unstructured data is information that does not fit into a predefined data model or schema. Typically text-heavy, such as form responses and social media conversations, unstructured data encompasses images, video, and audio. Industry-specific file types such as VCF (genomics), KDF (semiconductors), or HDF5 (aeronautics) are included in this category.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

If we make any changes to the original table, then

A

The changes do not reflect in the cloned table.
Zero-copy cloning allows us to make a snapshot of any table, schema, or database without actually copying data. A clone is writable and is independent of its source (i.e., changes made to the source or clone are not reflected in the other object). A new clone of a table points to the original table’s micro partitions, using no data storage. If we make any changes in the cloned table, then only its changed micro partitions are written to storage.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

John has to create a PIPE that will be triggered for loading by calling the Snowpipe REST endpoints. What parameter does he need to specify in CREATE PIPE statement?

A

AUTO_INGEST = FALSE
AUTO_INGEST = TRUE enables automatic data loading. Snowpipe supports loading from external stages (Amazon S3, Google Cloud Storage, or Microsoft Azure). AUTO_INGEST = FALSE disables automatic data loading. You must make calls to the Snowpipe REST API endpoints to load data files.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What would happen if we suspend the warehouse while it is executing the SQL statement?

A

Only idle compute resources of the warehouse will be shut down, allowing any compute resources executing statements to continue until the statement is complete.
When we suspend a warehouse, Snowflake immediately shuts down all idle compute resources for the warehouse. However, it allows any compute resources executing statements to continue until the statements are complete. At this time, the resources are shut down, and the warehouse status changes to “Suspended”. Compute resources waiting to shut down are considered to be in “quiesce” mode.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Which services are managed by Snowflake’s cloud services layer? (Select all that apply)

A

Infrastructure Management
Authentication
Metadata Management
Query Processing and Optimization
Access Control

The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch.
The cloud service layer manages Authentication, Infrastructure Management, Metadata Management, Query parsing and optimization, and Access control services.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

File URL is ideal for:

A

custom application that require access to unstructured data files
File URL: URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files. Ideal for custom applications that require access to unstructured data files.
Scoped URL: Encoded URL that permits temporary access to a staged file without granting privileges to the stage. The URL expires when the persisted query result period ends (i.e., the results cache expires), which is currently 24 hours. Ideal for use in custom applications, providing unstructured data to other accounts via a share, or for downloading and ad hoc analysis of unstructured data via Snowsight.
Pre-signed URL: Simple HTTPS URL used to access a file via a web browser. A file is temporarily accessible to users via this URL using a pre-signed access token. The expiration time for the access token is configurable. Ideal for business intelligence applications or reporting tools that need to display unstructured file contents.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Snowflake Query history page allows you to view the details of all the queries executed in the last 31 days. (T/F)

A

False
Snowflake Query history page allows you to view the details of all the queries executed in the last 14 days. You can query the Query_History view in Snowflake’s Account Usage schema for older queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Which of these types of VIEW does Snowflake support? (Select 3)

A

Standard View
Secure View
Materialized View
Snowflake supports three types of views.

Standard View, Secure View, and Materialized View.

Standard View: It is a default view type. Its underlying DDL is available to any role with access to the view. When you create a standard view, Snowflake saves a definition of the view. Snowflake does not run the query. When someone accesses the view, that is when the query is run. The standard view will always execute as the owning role.

Secure View: The secure view is exactly like a standard view, except users cannot see how that view was defined. Sometimes a secure view will run a little slower than a standard view to protect the information in a secure view. Snowflake may bypass some of the optimizations.

Materialized View: A materialized view is more like a table. Unlike a standard or secure view, Snowflake runs the query right away when you create a materialized view. It takes the results set and stores that result set as a table in Snowflake. Because Snowflake is storing that materialized view as a table, creating micro partitions. Snowflake is creating metadata about those micro partitions. So when you query a materialized view, if you put a filter on the view, you get the same benefit of micro partition pruning that you would get from a table. With Snowflake, the materialized view is automatically refreshed every time there is a transaction against the base table. So it is always going to be in sync. If you want, you can also create a secure materialized view, which again will hide the logic from the user. A note about materialized views, because Snowflake is auto-refreshing them in the background, they use some credits, so there is a little bit of a cost there. Moreover, there is some storage, and Snowflake stores the result set as a table in Snowflake. So materialized views use more storage and compute than standard or secure views.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Suppose you have an auto-scaling mode setup with an Economy policy. In what situation does Snowflake spin up an additional cluster?

A

Only if the system estimate there’s enough query load to keep the cluster busy for at least 6 minutes.
In the Economy Scaling policy, Snowflake spins up an additional cluster only if the system estimates there’s enough query load to keep the cluster busy for a least 6 minutes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Which systems function can help find the overlap depth of a table’s micro-partitions?

A

SYSTEM$CLUSTERING_INFORMATION
SYSTEM$CLUSTERING_DEPTH
For example, if you have an EMPLOYEE table - you can run any of these queries to find the depth - SELECT SYSTEM$CLUSTERING_INFORMATION(‘EMPLOYEE’); SELECT SYSTEM$CLUSTERING_DEPTH(‘EMPLOYEE’);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

Which of these functions helps generate the FILE URL to access the unstructured data file?

A

BUILD_STAGE_FILE_URL
BUILD_STAGE_FILE_URL generates a Snowflake-hosted file URL to a staged file using the stage name and relative file path as inputs. A file URL permits prolonged access to a specified file. That is, the file URL does not expire. File URL: URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Select the type of function that can operate on a subset of rows within the set of input rows.

A

Window Function
A window function is any function that operates over a window of rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Which of these compression methods are supported by Snowflake? (Select all that apply)

A

Brotli
raw_deflate
deflate
gzip
Zstandard
bzip2
All of these are supported by Snowflake. Snowflake can automatically detect any of these compression methods except Brotli and Zstandard.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Monica wants to delete all the data from table t1. She wants to keep the table structure, so she does not need to create the table again. Which command will be appropriate for her need?

A

Truncate
TRUNCATE will delete all of the data from a single table. So, once Monica truncates table t1, table t1’s structure remains, but the data will be deleted. DELETE is usually used for deleting single rows of data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

How can we add a Directory table explicitly to a stage to store a catalog of staged files?

A

Using Create Stage command
A Directory table is not a separate database object; it stores a catalog of staged files in cloud storage. Roles with sufficient privileges can query a directory table to retrieve file URLs to access the staged files and other metadata. A directory table can be added explicitly to a stage when the stage is created (using CREATE STAGE) or later (using ALTER STAGE) with supplying directoryTableParams. directoryTableParams (for internal stages) ::= [ DIRECTORY = ( ENABLE = { TRUE | FALSE } [ REFRESH_ON_CREATE = { TRUE | FALSE } ] ) ] ENABLE = TRUE | FALSE Specifies whether to add a directory table to the stage. When the value is TRUE, a directory table is created with the stage.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Only the user who generated the scoped URL can use the URL to access the referenced file. (True/False)

A

True
True, only the user who generated the scoped URL can use the URL to access the referenced file. I case of File URL, any role that has sufficient privileges on the stage can access the file.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

An account-level resource monitor overrides the resource monitor assignment for individual warehouses. (True/False)

A

False
An account-level resource monitor does not override resource monitor assignments for individual warehouses. If either the account resource monitor or the warehouse resource monitor reaches its defined threshold and a suspend action has been defined, the warehouse is suspended.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Which of these system-defined roles can manage operations at the organization level?

A

ORGADMIN
ORGADMIN role manages operations at the organizational level. More specifically, this role:
Can create accounts in the organization.
Can view all accounts in the organization (using SHOW ORGANIZATION ACCOUNTS) and all regions enabled for the organization (using SHOW REGIONS).
Can view usage information across the organization.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

If you create a Network Policy by providing both ‘Allowed IP Addresses’ and ‘Blocked IP Addresses’, which is applied first by Snowflake while validating the access?

A

Blocked IP Addresses
If you provide both Allowed IP Addresses and Blocked IP Addresses, Snowflake applies the Blocked List first.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake. (True/False)

A

True
Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

The user access history can be found by querying the

A

Account Usage ACCESS_HISTORY view
Access History in Snowflake refers to when the user query reads column data and when the SQL statement performs a data write operation, such as INSERT, UPDATE, and DELETE, along with variations of the COPY command, from the source data object to the target data object. The user access history can be found by querying the Account Usage ACCESS_HISTORY view.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

If you drop or disable a user in Snowflake in an Okta IdP federated environment, the user can still access Snowflake login through Okta. (True/False)

A

False
Users who are dropped or disabled in Snowflake are still able to log into their Okta accounts, but they will receive an error message when they attempt to connect to Snowflake. You must recreate or enable the user before they can log in.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

What all options are available for data transformation while loading data into a table using the COPY command? (Select that all apply)

A

Column omission
Casts
Truncation of Text Strings
Column Reordering
Snowflake supports transforming data while loading it into a table using the COPY command. Options include:
Column reordering
Column omission
Casts
Truncating text strings that exceed the target column length

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

The VALIDATION_MODE parameter does not support COPY statements that transform data during a load. (True / False)

A

True
Explanation
True.
VALIDATION_MODE instructs the COPY command to validate the data files instead of loading them into the specified table; i.e., the COPY command tests the files for errors but does not load them.
The command validates the data to be loaded and returns results based on the validation option specified:
Syntax: VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS
RETURN_n_ROWS (e.g. RETURN_10_ROWS) - Validates the specified number of rows, if no errors are encountered; otherwise, fails at the first error encountered in the rows.
RETURN_ERRORS - Returns all errors (parsing, conversion, etc.) across all files specified in the COPY statement.
RETURN_ALL_ERRORS - Returns all errors across all files specified in the COPY statement, including files with errors that were partially loaded during an earlier load because the ON_ERROR copy option was set to CONTINUE during the load.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

How can we turn off the query result cache?

A

Setting the parameter USE_CACHED_RESULT to FALSE
We can turn off the query result cache by setting the parameter USE_CACHED_RESULT to FALSE. Though the only reason we would really want to do this is if we are doing performance testing.

43
Q

What all locations do Snowflake support for staging the data?

A

Amazon S3
Snowflake Internal Storage
Google Cloud Storage
Microsoft Azure Blob Storage
Snowflake supports loading data from files staged in any of the following locations, regardless of the cloud platform for your Snowflake account: Internal (i.e. Snowflake) stages, Amazon S3, Google Cloud Storage, Microsoft Azure blob storage

44
Q

Snowflake is available in four editions. Which are those? (Select 4)

A

Standard
Enterprise
Business Critical
Virtual Private Snowflake
Snowflake is available in four editions: Standard, Enterprise, Business Critical, and Virtual Private Snowflake (VPS). Standard comes with most of the available features. Enterprise adds on to Standard with things like: extra days of time travel, materialized view support, and data masking. Business Critical brings to the table: HIPAA support, Tri-secret Secure, and more. And Virtual Private Snowflake is everything that Business Critical has, but with the ability to have customer-dedicated metadata stores and customer-dedicated virtual service.

45
Q

Which of these are not supported by the Search Optimization Service? (Select all that apply)

A

Materialized Views
Analytical Expression
Casts on table columns
Columns defined with COLLATE clause
External Tables
Column Concatenation
None of these are currently supported by the Search Optimization Service. Additionally, Tables and views protected by row access policies cannot be used with the Search Optimization Search.

46
Q

What authentication methods does Snowflake support for REST API authentication?

A

OAuth
Key Pair Authentication
Snowflake SQL API supports Oauth, and Key Pair authentication.

47
Q

What is the expiration period of a File URL?

A

It is Permanent
The expiration period of Scoped URL: The URL expires when the persisted query result period ends.
The expiration period of the File URL: It is permanent.
The expiration period of Pre-Signed URL: Length of time specified in the expiration_time argument.

48
Q

You have a table with a 30-day retention period. If you decrease the data period to 20 days, how would it affect the data that would have been removed after 30 days?

A

The data will now retain for a shorter period of 20 days.
Decreasing Retention reduces the amount of time data is retained in Time Travel:

For active data modified after the retention period is reduced, the new shorter period applies.
For data that is currently in Time Travel:
If the data is still within the new shorter period, it remains in Time Travel.
If the data is outside the new period, it moves into Fail-safe.

For example, if you have a table with a 30-day retention period and you decrease the period to 20-day, data from days 21 to 30 will be moved into Fail-safe, leaving only the data from day 1 to 20 accessible through Time Travel. However, the process of moving the data from Time Travel into Fail-safe is performed by a background process, so the change is not immediately visible. Snowflake guarantees that the data will be moved, but does not specify when the process will complete; until the background process completes, the data is still accessible through Time Travel.

49
Q

Which algorithm does Snowflake use to estimate the approximate number of distinct values in a data set?

A

HyperLogLog
Snowflake uses HyperLogLog to estimate the approximate number of distinct values in a data set. HyperLogLog is a state-of-the-art cardinality estimation algorithm, capable of estimating distinct cardinalities of trillions of rows with an average relative error of a few percent.

50
Q

The Snowflake Information Schema includes table functions you can query to retrieve information about your directory tables. Which table function can be used to query the history of data files registered in the metadata of specified objects and the credits billed for these operations?

A

AUTO_REFRESH_REGISTRATION_HISTORY
AUTO_REFRESH_REGISTRATION_HISTORY table function can be used to query the history of data files registered in the metadata of specified objects and the credits billed for these operations. The table function returns the billing history within a specified date range for your entire Snowflake account. This function returns billing activity within the last 14 days.
Please note, STAGE_DIRECTORY_FILE_REGISTRATION_HISTORY table function can be used to query information about the metadata history for a directory table, including:
Files added or removed automatically as part of a metadata refresh.

Any errors found when refreshing the metadata.

51
Q

In which of the cloud platforms a Snowflake account can be hosted? (Select 3)

A

Azure
GCP
AWS
A Snowflake account can be hosted on any of the following cloud platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure (Azure). On each platform, Snowflake provides one or more regions where the account is provisioned.

52
Q

Which is not the DML (Data Manipulation Language) command?

A

UNDROP
UNDROP is Snowflake’s DDL (Data Definition Language) command.

53
Q

Permissions on database objects such as databases or tables are granted to:

A

Roles
Snowflake supports Role-Based Access control. Permissions on database objects such as databases or tables are granted to Roles.

54
Q

In what situations should you consider User-Managed Tasks over Serverless Tasks? (Select 2)

A

Consider when you can fully utilize a single warehouse by scheduling multiple concurrent tasks to take advantage of available compute resources.
Consider when adherence to the schedule interval is less important.
User-managed Tasks is recommended when you can fully utilize a single warehouse by scheduling multiple concurrent tasks to take advantage of available compute resources. Also, recommended when adherence to the schedule interval is less critical. Serverless Tasks is recommended when you cannot fully utilize a warehouse because too few tasks run concurrently or they run to completion quickly (in less than 1 minute). Also, recommended when adherence to the schedule interval is critical.

55
Q

While transforming Semi-structure data, If you want expansion for all the sub-elements recursively using FLATTEN function, what argument would you need to set with FLATTEN function?

A

RECURSIVE => TRUE
The expansion is performed for all sub-elements recursively by argument RECURSIVE => TRUE. Only the element referenced by PATH is expanded BY RECURSIVE => FALSE. The OUTER argument is used to handle the input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries.

56
Q

Dynamic Data Masking is supported by:

A

Enterprise Edition
Business Critical
VPS
Dynamic Data Masking features require Enterprise Edition (or higher).

57
Q

If a user is logged in to Snowflake in a federated environment and IdP times out, what does happen to the user’s snowflake session?

A

It does not affect the user’s Snowflake sessions. However, to initiate any new Snowflake sessions, the user must log into the IdP again.
After a specified period of time (defined by the IdP), a user’s session in the IdP automatically times out, but this does not affect their Snowflake sessions. Any Snowflake sessions that are active at the time remain open and do not require re-authentication. However, to initiate any new Snowflake sessions, the user must log into the IdP again.

58
Q

How long do results remain in the Query results cache?

A

24h
Results are retained for 24 hours in Query Result Cache. Snowflake resets the 24-hour retention period for the result, up to a maximum of 31 days from the date and time that the query was first executed. After 31 days, the result is purged and the next time the query is submitted, a new result is generated and persisted.

59
Q

Select the correct statements for Table Clustering. (Select 3)

A
  1. Snowflake recommends a maximum of three or four columns (or expressions) per key
  2. Clustering keys are not for every table.
  3. Tables in multi-terabytes range are a good candidates for clustering keys.
    Clustering keys are not for every table. Tables in the multi-terabyte range are good candidates for clustering keys. Both automatic clustering and reclustering consume credit. A single clustering key can contain one or more columns or expressions. Snowflake recommends a maximum of three or four columns (or expressions) per key for most tables. Adding more than 3-4 columns tends to increase costs more than benefits.
60
Q

Search optimization is a Database-level property applied to all the tables within the database with supported data types. (True/False)

A

False
Search optimization is a table-level property and applies to all columns with supported data types. The search optimization service aims to significantly improve the performance of selective point lookup queries on tables. A point lookup query returns only one or a small number of distinct rows. A user can register one or more tables to the search optimization service.

61
Q

Which view in the Account Usage Schema can be used to query the replication history for a specified database?

A

REPLICATION_USAGE_HISTORY
This REPLICATION_USAGE_HISTORY view in the Account Usage Schema can be used to query the replication history for a specified database. The returned results include the database name, credits consumed, and bytes transferred for replication. Usage data is retained for 365 days (1 year).

62
Q

Which stream type is supported for streams on the external table only?

A

Insert-only
Insert-only is supported for streams on external tables only. An insert-only stream tracks row inserts only; they do not record delete operations that remove rows from an inserted set (i.e. no-ops).

63
Q

How can you unload the data from Snowflake using COPY INTO location statements in a Single file?

A

By specifying SINGLE - TRUE
To unload data to a single output file (at the potential cost of decreased performance), specify the SINGLE = TRUE copy option in your statement. You can optionally specify a name for the file in the path.

64
Q

At what frequency does Snowflake rotate the object keys?

A

30 Days
All Snowflake-managed keys are automatically rotated by Snowflake when they are more than 30 days old. Active keys are retired, and new keys are created. When Snowflake determines the retired key is no longer needed, the key is automatically destroyed. When active, a key is used to encrypt data and is available for usage by the customer. When retired, the key is used solely to decrypt data and is only available for accessing the data.

65
Q

Monica has successfully created a task with the 5 minutes schedule. It has been 30 minutes, but the task did not run. What could be the reason?

A

Monica should run ALTER TASK COMMAND to resume the task
The first time we create the TASK, we need to run the ALTER TASK command to RESUME the task.

66
Q

Which of these are Snowgrid’s capabilities? (Select all that apply)

A

Zero-copy cloning
Live, ready to query data
Share internally with private data exchange or externally with public data exchange
Snowgrid allows you to use Secure Data Sharing features to provide access to live data, without any ETL or movement of files across environments.

67
Q

Which of these roles is dedicated to user and role management only?

A

USERADMIN
USERADMIN role is dedicated to user and role management only. More specifically, this role:
Is granted the CREATE USER and CREATE ROLE security privileges.
Can create users and roles in the account.
This role can also manage users and roles that it owns.
Only the role with the OWNERSHIP privilege on an object (i.e. user or role), or a higher role, can modify the object properties.

68
Q

Both external (external cloud storage) and internal (i.e., Snowflake) stages support unstructured data. (True / False)

A

TRUE
True, both external (external cloud storage, such as, Amazon S3, Google Cloud Storage, Azure Blob Storage etc.) and internal (i.e. Snowflake) stages support unstructured data.

69
Q

For which object the Kafka connector does create a topic?

A

The connector creates the following objects for each topic:
One internal stage to temporarily store data files for each topic.
One pipe to ingest the data files for each topic partition.
One table for each topic. If the table specified for each topic does not exist, the connector creates it; otherwise, the connector creates the RECORD_CONTENT and RECORD_METADATA columns in the existing table and verifies that the other columns are nullable (and produces an error if they are not).

70
Q

How many maximum columns (or expressions) are recommended for a cluster key?

A

3 to 4
A single clustering key can contain one or more columns or expressions. Snowflake recommends a maximum of 3 or 4 columns (or expressions) per key for most tables. Adding more than 3-4 columns tends to increase costs more than benefits.

71
Q

If an account has federated authentication enabled. Can Snowflake admins still maintain user IDs and passwords in Snowflake?

A

Yes
With federated authentication enabled on an account, Snowflake still allows maintaining and using Snowflake user credentials (login name and password). In other words:
Account and security administrators can still create users with passwords maintained in Snowflake.
Users can still log into Snowflake using their Snowflake credentials.
However, if federated authentication is enabled for an account, Snowflake does not recommend maintaining user passwords in Snowflake. Instead, user passwords should be maintained solely in your IdP

72
Q

Which privileges are provided with a share by the provider? (Select 2)

A

Grant access (Select) to the specific tables in the database
Grant access (Usage) to the database and the schema containing the table to share
Shares are named Snowflake objects that encapsulate all of the information required to share a database. Each share consists of:
The privileges that grant access to the database(s) and the schema containing the objects to share.
The privileges that grant access to the specific objects in the database.
The consumer accounts with which the database and its objects are shared.
Example: CREATE SHARE “SHARED_DATA” COMMENT=’’; GRANT USAGE ON DATABASE “DEMO_DB” TO SHARE “SHARED_DATA”; GRANT USAGE ON SCHEMA “DEMO_DB”.”TWITTER_DATA” TO SHARE “SHARED_DATA”; GRANT SELECT ON VIEW “DEMO_DB”.”TWITTER_DATA”.”FOLLOWERS” TO SHARE “SHARED_DATA”;

73
Q

A stored procedure can simultaneously run the caller’s and the owner’s rights. (True / False)

A

False
A stored procedure runs with either the caller’s rights or the owner’s rights. It cannot run with both at the same time. A caller’s rights stored procedure runs with the privileges of the caller. The primary advantage of a caller’s rights stored procedure is that it can access information about that caller or about the caller’s current session. For example, a caller’s rights stored procedure can read the caller’s session variables and use them in a query. An owner’s rights stored procedure runs mostly with the privileges of the stored procedure’s owner. The primary advantage of an owner’s rights stored procedure is that the owner can delegate specific administrative tasks, such as cleaning up old data, to another role without granting that role more general privileges, such as privileges to delete all data from a specific table. At the time that the stored procedure is created, the creator specifies whether the procedure runs with owner’s rights or caller’s rights. The default is owner’s rights.

74
Q

select * from t1 sample row(100); What would the above query return?

A

Return an entire table, including all rows in the table. The sampling method is optional. If no method is applied after the sample keyword, the default it takes is BERNOULLI

75
Q

How can an ACCOUNTADMIN view the billing for Automatic Clustering? (Select all that apply)

A

Users with the ACCOUNTADMIN role can view the billing for Automatic Clustering using Snowsight, the classic web interface, or SQL: Snowsight: Select Admin » Usage. Classic Web Interface: Click on Account tab » Billing & Usage

The billing for Automatic Clustering shows up as a separate Snowflake-provided warehouse named AUTOMATIC_CLUSTERING.

SQL:Query either of the following: AUTOMATIC_CLUSTERING_HISTORY table function (in the Snowflake Information Schema). AUTOMATIC_CLUSTERING_HISTORY View (in Account Usage).

76
Q

Which database objects can be shared using the Snowflake Secure Data Sharing feature? (Select all that apply)

A

Secure Data Sharing enables sharing selected objects in a database in your account with other Snowflake accounts. The following Snowflake database objects can be shared:

Tables
External tables
Secure views
Secure materialized views
Secure UDFs
Snowflake enables the sharing of databases through shares created by data providers and “imported” by data consumers.

77
Q

You can create an an account level network policy using _____ (Select all that apply)

A

SQL
Classic Web UI
Snowsight
Only security administrators (i.e., users with the SECURITYADMIN role) or higher or a role with the global CREATE NETWORK POLICY privilege can create network policies using Snowsight, Classic Web Interface, and SQL.

78
Q

Which SQL command determines whether a network policy is set on the account or for a specific user?

A

SHOW PARAMETERS
The SHOW PARAMETERS command determines whether a network policy is set on the account or for a specific user.
For Account level: SHOW PARAMETERS LIKE ‘network_policy’ IN ACCOUNT;
For User level : SHOW PARAMETERS LIKE ‘network_policy’ IN USER <username>;
Example - SHOW PARAMETERS LIKE 'network_policy' IN USER john;</username>

79
Q

What happens to the data when the retention period ends for an object?

A

When the retention period ends for an object, the historical data is moved into Snowflake Fail-safe. Snowflake support needs to be contacted to get the data restored from Fail-safe.

80
Q

What will happen if a policy is assigned to a user who is already signed in?

A

If a policy is assigned to a user who already signed in, they can’t do anything else until they sign and signed back in again to make use of the new policy

81
Q

Which of these Snowflake Editions automatically stores data in an encrypted state?

A

All of the Snowflake Editions (Standard, Enterprise, Business Critical, Virtual Private Snowflake) automatically store data in an encrypted state.

82
Q

The suspended warehouse cannot be resized until they resume. (True / False)

A

False
The suspended warehouse can be easily resized. Resizing a suspended warehouse does not provision any new compute resources for the warehouse. It simply instructs Snowflake to provision the additional compute resources when the warehouse is next resumed, at which time all the usage and credit rules associated with starting a warehouse apply.

83
Q

A user can be assigned multiple roles. (True / False)

A

True
Roles are the entities to which privileges on securable objects can be granted and revoked. Roles are assigned to users to allow them to perform actions required for business functions in their organization. A user can be assigned multiple roles. It allows users to switch roles (i.e., choose which role is active in the current Snowflake session) to perform different actions using separate sets of privileges.

84
Q

Which of the following languages does Snowflake support for writing UDFs (User-Defined Functions)?

A

Java
Python
SQL
JavaScript
User-defined functions (UDFs) let you extend the system to perform operations that are not available through the built-in, system-defined functions provided by Snowflake. Snowflake currently supports the following languages for writing UDFs: Java: A Java UDF lets you use the Java programming language to manipulate data and return either scalar or tabular results. JavaScript: A JavaScript UDF lets you use the JavaScript programming language to manipulate data and return either scalar or tabular results. Python: A Python UDF lets you use the Python programming language to manipulate data and return either scalar or tabular results. SQL: A SQL UDF evaluates an arbitrary SQL expression and returns either scalar or tabular results.

85
Q

Which object parameter can users with the ACCOUNTADMIN role use to set the default retention period for their account?

A

Users can use the DATA_RETENTION_TIME_IN_DAYS object parameter with the ACCOUNTADMIN role to set the default retention period for their account

86
Q

Multi-cluster warehouses are beneficial in improving the performance of slow-running queries or data loading. (True/False)

A

Multi-cluster warehouses are best utilized for scaling resources to improve concurrency for users/queries. They are not as beneficial for improving the performance of slow-running queries or data loading. For these types of operations, resizing the warehouse provides more benefits.

87
Q

Which primary tool loads data to Snowflake from a local file system?

A

SnowSQL is the primary tool used to load data to Snowflake from a local file system. You can run it in either interactive shell or batch mode.

88
Q

What would you create (UDF or Stored procedure) if you need a function that can be called as part of a SQL statement and must return a value that will be used in the statement?

A

UDF
A UDF evaluates to a value and can be used in contexts in which a general expression can be used (e.g. SELECT my_function() …).
A stored procedure does not evaluate to a value, and cannot be used in all contexts in which a general expression can be used. For example, you cannot execute SELECT my_stored_procedure()….

89
Q

David ran a query that took around 30 mins to complete. He referred to the Query profiler and found the ‘Bytes spilled to local storage’ has a big number. What could be the issue?

A

Dawid is using a comparatively smaller warehouse.
If a node has insufficient memory to complete its portion of a query, it will “spill” to local SSD storage. This can negatively impact performance but is sometimes acceptable. If a node has insufficient local SSD storage to complete its portion of a query, it will “spill” to remote cloud storage. This is almost always very bad for performance. The solution, in either case, is to simplify the SQL query or increase the warehouse size.

90
Q

What value will be returned by the following query?
SELECT * FROM TABLE(FLATTEN(input => parse_json(‘[]’))) f;

A

nothing will return / output of the input row will be omitted
If you don’t specify OUTER argument with FLATTEN, it would be defaulted to FALSE. The OUTER => FALSE argument with FLATTEN omits the output of the input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries.

91
Q

Which is generally the slowest option for selecting staged data files to load from a stage?

A

Using pattern matching to identify specific files by pattern
Pattern matching using a regular expression is generally the slowest of the three options for identifying/specifying data files to load from a stage; however, this option works well if you exported your files in named order from your external application and want to batch load the files in the same order.

92
Q

A task can execute any one of the following types of SQL code: (Select 3)

A

A task can execute any one of the following types of SQL code:

Single SQL statement

Call to a stored procedure

Procedural logic using Snowflake Scripting.

93
Q

Which of these SQL functions helps returns the absolute path of a staged file using the stage name and path of the file relative to its location in the stage as inputs.?

A

GET_ABSOLUTE_PATH returns the absolute path of a staged file using the stage name and path of the file relative to its location in the stage as inputs.

94
Q

Which objects are not available for replication in the Standard Edition of Snowflake? (Select 3)

A

Users
Roles
Integrations
Database and share replication are available in all editions, including the Standard edition. Replication of all other objects is only available for Business Critical Edition (or higher).

95
Q

Which copyOptions can help load a file with expired metadata (if the LAST_MODIFIED date is older than 64 days and the initial set of data was loaded into the table more than 64 days earlier (and if the file was loaded into the table, that also occurred more than 64 days earlier))? (Select 2)

A

To load files whose metadata has expired, set the LOAD_UNCERTAIN_FILES copy option to true. The copy option references load metadata, if available, to avoid data duplication, but also attempts to load files with expired load metadata. Alternatively, set the FORCE option to load all files, ignoring load metadata if it exists. Note that this option reloads files, potentially duplicating data in a table.

96
Q

The major benefits of defining Clustering: (Select 2)

A

Defining clustering keys for very large tables (in the multi-terabyte range) helps optimize table maintenance and query performance. Small tables are not a good candidate for clustering.

97
Q

Which of the following file format is not supported by Snowflake?

A

Snowflake supports - CSV, TSV, JSON, AVRO, ORC, PARQUET. Snowflake also supports XML which is a Preview feature as of now. EDI format is not supported by Snowflake.

98
Q

When deciding whether to suspend a warehouse or leave it running, what should you consider?

A

Consider the trade-off between saving credits by suspending a warehouse versus maintaining the cache of data from previous queries to help with performance.

99
Q

A user cannot view the result set from a query that another user executed except for the ACCOUNTADMIN role. (True / False)

A

False
A user cannot view the result set from a query that another user executed. This behavior is intentional. For security reasons, only the user who executed a query can access the query results. This behavior is not connected to the Snowflake access control model for objects. Even a user with the ACCOUNTADMIN role cannot view the results for a query run by another user.

100
Q

Snowflake blocks certain IPs by default to ensure that customer is getting the highest level of Network security. (TRUE / FALSE)

A

False
By default, Snowflake allows users to connect to the service from any computer or device IP address. A security administrator (or higher) can create a network policy to allow or deny access to a single IP address or a list of addresses.

101
Q

If you recreate a pipe using CREATE OR REPLACE PIPE command. What does happen to load history if the Snowpipe gets recreated?

A

When you recreate a pipe, if you do CREATE OR REPLACE PIPE, that load history is reset to empty, so Snowflake doesn’t know which files we’ve already loaded.

102
Q

What is the purpose of VALIDATION_MODE in the COPY INTO <table> command?

A

VALIDATION_MODE instructs the COPY command to validate the data files instead of loading them into the specified table; i.e., the COPY command tests the files for errors but does not load them. The command validates the data to be loaded and returns results based on the validation option specified: Syntax : VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS RETURN_n_ROWS (e.g. RETURN_10_ROWS) - Validates the specified number of rows, if no errors are encountered; otherwise, fails at the first error encountered in the rows. RETURN_ERRORS - Returns all errors (parsing, conversion, etc.) across all files specified in the COPY statement. RETURN_ALL_ERRORS - Returns all errors across all files specified in the COPY statement, including files with errors that were partially loaded during an earlier load because the ON_ERROR copy option was set to CONTINUE during the load.

103
Q

Which of these Snowflake features does enable accessing historical data (i.e., data that has been changed or deleted) at any point within a defined period?

A

Snowflake Time Travel enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period. It serves as a powerful tool for performing the following tasks:

Restoring data-related objects (tables, schemas, and databases) that might have been accidentally or intentionally deleted. - Duplicating and backing up data from key points in the past.

Analyzing data usage/manipulation over specified periods of time.