Snowflake - Udemy - exam 1 Flashcards
(103 cards)
Monica ran a SELECT query on a large table t1. The query took longer than expected. She looked into the query profile and found that ‘ Bytes spilled to local storage’ and ‘Bytes spilled to remote storage’ are very high. What advice will you give to her to improve the query performance? (Select 3)
- Using a larger warehouse (effectively increasing the available memory/local disk space for the operation.
- Increasing the number of parallel queries running in the warehouse.
- Processing data in smaller batches.
??. Trying to split processing into several steps
Explanation
When Snowflake warehouse cannot fit an operation in memory, it starts spilling (storing) data first to the local disk of a warehouse node, and then to remote storage. In such a case, Snowflake first tries to temporarily store the data on the warehouse’s local disk. As this means extra IO operations, any query that requires spilling will take longer than a similar query running on similar data that is capable to fit the operations in memory. Also, if the local disk is not sufficient to fit the spilled data, Snowflake further tries to write to the remote cloud storage, which will be shown in the query profile as “Bytes spilled to remote storage”.
The spilling can’t always be avoided, especially for large batches of data, but it can be decreased by: Reducing the amount of data processed.
For example, by trying to improve partition pruning, or projecting only the columns that are needed in the output. Decreasing the number of parallel queries running in the warehouse. Trying to split the processing into several steps (for example by replacing the CTEs with temporary tables). Using a larger warehouse - effectively means more memory and more local disk space.
Both external (external cloud storage) and internal (i.e., Snowflake) stages support unstructured data. (True / False)
True, both external (external cloud storage, such as, Amazon S3, Google Cloud Storage, Azure Blob Storage etc.) and internal (i.e. Snowflake) stages support unstructured data.
Which of these are kind of Cache in Snowflake?
Metadata Cache
Query Result Cache
Data / Local Disk Cache
Snowflake has three types of cache.
1. The metadata cache that lives in the cloud services layer.
2. The data cache/local disk cache that lives on the SSD drives in the virtual warehouses, and
3. The query result cache. If a result is small, it will be stored in the cloud services layer, but larger results are going to be stored in the storage layer.
Time Travel can be disabled for an account by ACCOUNTADMIN. (True/False)
False
Time Travel cannot be disabled for an account. A user with the ACCOUNTADMIN role can set DATA_RETENTION_TIME_IN_DAYS to 0 at the account level, which means that all databases (and subsequently all schemas and tables) created in the account have no retention period by default; however, this default can be overridden at any time for any database, schema, or table.
Which of the following languages does Snowflake support for writing UDFs (User-Defined Functions)?
JavaScript
Java
SQL
Python
Explanation
User-defined functions (UDFs) let you extend the system to perform operations that are not available through the built-in, system-defined functions provided by Snowflake. Snowflake currently supports the following languages for writing UDFs: Java: A Java UDF lets you use the Java programming language to manipulate data and return either scalar or tabular results. JavaScript: A JavaScript UDF lets you use the JavaScript programming language to manipulate data and return either scalar or tabular results. Python: A Python UDF lets you use the Python programming language to manipulate data and return either scalar or tabular results. SQL: A SQL UDF evaluates an arbitrary SQL expression and returns either scalar or tabular results.
A user’s default role is
the role a user gets set to each time the user logs in to Snowflake.
Explanation
A user’s default role is the role a user gets set to each time the user logs in to Snowflake. Snowflake uses roles to control the objects (virtual warehouses, databases, tables, etc.) that users can access:
Snowflake provides a set of predefined roles, as well as a framework for defining a hierarchy of custom roles.
All Snowflake users are automatically assigned the predefined PUBLIC role, which enables login to Snowflake and basic object access.
In addition to the PUBLIC role, each user can be assigned additional roles, with one of these roles designated as their default role.
A user’s default role determines the role used in the Snowflake sessions initiated by the user; however, this is only a default. Users can change roles within a session at any time.
Roles can be assigned at user creation or afterward.
Which systems function can help find the overlap depth of a table’s micro-partitions?
SYSTEM$CLUSTERING_INFORMATION
SYSTEM$CLUSTERING_DEPTH
Explanation
For example, if you have an EMPLOYEE table - you can run any of these queries to find the depth - SELECT SYSTEM$CLUSTERING_INFORMATION(‘EMPLOYEE’); SELECT SYSTEM$CLUSTERING_DEPTH(‘EMPLOYEE’);
What value will be return by the following query? SELECT * FROM TABLE(FLATTEN(input => parse_json(‘[]’))) f;
nothing will return / output of the input row will be omitted
Explanation
If you don’t specify OUTER argument with FLATTEN, it would be defaulted to FALSE. The OUTER => FALSE argument with FLATTEN omits the output of the input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries.
What happens to the data when the retention period ends for an object?
Data is moved to Snowflake Fail-safe.
Explanation
When the retention period ends for an object, the historical data is moved into Snowflake Fail-safe. Snowflake support needs to be contacted to get the data restored from Fail-safe.
Readers accounts enable providers to share data with consumers who are not already Snowflake customers without requiring the consumers to become Snowflake Customers. Which role can create the Reader account?
ACCOUNTADMIN
Explanation
ACCOUNTADMIN role (or a role granted the CREATE ACCOUNT global privilege) only can create the Reader account.
Which command will list the pipes for which you have access privileges?
SHOW PIPES;
Explanation
SHOW PIPES Command lists the pipes for which you have access privileges. This command can list the pipes for a specified database or schema (or the current database/schema for the session), or your entire account.
Which of the Snowflake editions provides a federated authorization feature?
All of the Snowflake Editions.
All Snowflake Editions (Standard, Enterprise, Business Critical, Virtual Private Snowflake) provide Federated Authentication.
Which of these SQL functions does Snowflake support?
User-Defined
Table
Window
Scalar
Aggregate
System
There are two modes to set up a multi-cluster warehouse. Select those from the given choices.
Auto-scaling mode
Maximized mode
There are two ways to set up a multi-cluster warehouse: in maximized mode, or auto-scaling mode.
Maximized mode - You simply set your minimum equal to your maximum, and those values are something greater than one.
Auto-Scaling mode - Specify different values for the maximum and the minimum number of clusters. In this mode, Snowflake starts and stops clusters as needed to dynamically manage the load on the warehouse:
As the number of concurrent user sessions and/or queries for the warehouse increases, and queries start to queue due to insufficient resources, Snowflake automatically starts additional clusters, up to the maximum number defined for the warehouse.
Similarly, as the load on the warehouse decreases, Snowflake automatically shuts down clusters to reduce the number of running clusters and, correspondingly, the number of credits used by the warehouse.
Micro-partitioning is the on-demand feature of Snowflake. It is required to be enabled explicitly by ACCOUNTADMIN. (True / False)
FALSE
Micro-partitioning is automatically performed on all Snowflake tables. Tables are transparently partitioned using the Ordering of the data as inserted or loaded.
John is trying to load JSON data sets with a huge array containing multiple records. Considering the VARIANT data type imposed size of 16 MB, what do you recommend to John for optimally loading the data?
Enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command
If the data exceeds 16 MB, enable the STRIP_OUTER_ARRAY file format option for the COPY INTO <table> command to remove the outer array structure and load the records into separate table rows: copy into <table> from @~/<file>.json file_format = (type = 'JSON' strip_outer_array = true);</file>
Snowflake automatically and transparently maintains materialized views. (True/False)
TRUE
Snowflake automatically and transparently maintains materialized views. A background service updates the materialized view after changes to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
Which data does not fit into a predefined data model or schema?
Unstructured Data
Unstructured data is information that does not fit into a predefined data model or schema. Typically text-heavy, such as form responses and social media conversations, unstructured data encompasses images, video, and audio. Industry-specific file types such as VCF (genomics), KDF (semiconductors), or HDF5 (aeronautics) are included in this category.
If we make any changes to the original table, then
The changes do not reflect in the cloned table.
Zero-copy cloning allows us to make a snapshot of any table, schema, or database without actually copying data. A clone is writable and is independent of its source (i.e., changes made to the source or clone are not reflected in the other object). A new clone of a table points to the original table’s micro partitions, using no data storage. If we make any changes in the cloned table, then only its changed micro partitions are written to storage.
John has to create a PIPE that will be triggered for loading by calling the Snowpipe REST endpoints. What parameter does he need to specify in CREATE PIPE statement?
AUTO_INGEST = FALSE
AUTO_INGEST = TRUE enables automatic data loading. Snowpipe supports loading from external stages (Amazon S3, Google Cloud Storage, or Microsoft Azure). AUTO_INGEST = FALSE disables automatic data loading. You must make calls to the Snowpipe REST API endpoints to load data files.
What would happen if we suspend the warehouse while it is executing the SQL statement?
Only idle compute resources of the warehouse will be shut down, allowing any compute resources executing statements to continue until the statement is complete.
When we suspend a warehouse, Snowflake immediately shuts down all idle compute resources for the warehouse. However, it allows any compute resources executing statements to continue until the statements are complete. At this time, the resources are shut down, and the warehouse status changes to “Suspended”. Compute resources waiting to shut down are considered to be in “quiesce” mode.
Which services are managed by Snowflake’s cloud services layer? (Select all that apply)
Infrastructure Management
Authentication
Metadata Management
Query Processing and Optimization
Access Control
The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch.
The cloud service layer manages Authentication, Infrastructure Management, Metadata Management, Query parsing and optimization, and Access control services.
File URL is ideal for:
custom application that require access to unstructured data files
File URL: URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files. Ideal for custom applications that require access to unstructured data files.
Scoped URL: Encoded URL that permits temporary access to a staged file without granting privileges to the stage. The URL expires when the persisted query result period ends (i.e., the results cache expires), which is currently 24 hours. Ideal for use in custom applications, providing unstructured data to other accounts via a share, or for downloading and ad hoc analysis of unstructured data via Snowsight.
Pre-signed URL: Simple HTTPS URL used to access a file via a web browser. A file is temporarily accessible to users via this URL using a pre-signed access token. The expiration time for the access token is configurable. Ideal for business intelligence applications or reporting tools that need to display unstructured file contents.
Snowflake Query history page allows you to view the details of all the queries executed in the last 31 days. (T/F)
False
Snowflake Query history page allows you to view the details of all the queries executed in the last 14 days. You can query the Query_History view in Snowflake’s Account Usage schema for older queries.