Snowflake Cert Flashcards

1
Q

Snowcd VS Snowsql

A

SnowCD (Snowflake Connectivity Diagnostic Tool) diagnoses & troubleshoots the network connection to Snowflake
VS
SnowSQL is a command-line client for connecting to Snowflake and executing SQL
Note: the SnowSQL config file can be modified to customize options.

2
Q

UDF VS SP

A

Within a SP you can execute database operations (DML/DDL)
VS
UDFs cannot perform database operations
Multiple UDFs may be called in a single statement;
only one SP can be called per statement

3
Q

how long does snowpipe keep load history

A

14 days

4
Q

who can create/alter/drop network policy

A

SECURITYADMIN & ACCOUNTADMIN

5
Q

is there a hard limit on number of shares

A

no

6
Q

what is a share

A

A named account-level object that contains all the information required to share a DB, like the object grants & consumer account details. Objects that can be shared:
-tables -external tables -secure views -secure MVs -secure UDFs

7
Q

When do you scale out vs scale up

A

You scale out (add clusters to a multi-cluster warehouse, MCW) when you have many concurrent queries
VS
You scale up (resize to a larger warehouse) when you have one large, complex query

8
Q

remote disk vs local disk

A

If Snowflake can't fit an operation in memory, it spills first to local storage, then to remote storage

9
Q

DAC vs RBAC

A

DAC: each object has an owner, who can in turn grant access privileges on that object.
VS
RBAC: access privileges are assigned to roles, which are in turn assigned to users.

10
Q

orgadmin vs accountadmin

A

ORGADMIN manages operations at the organization level.
ACCOUNTADMIN is the top-level role within an account; ACCOUNTADMIN, SYSADMIN, and SECURITYADMIN are system-defined roles.

11
Q

snowsight activity

A

Snowsight is an upgrade from the classic console.
Activity area: query history, copy history, task history

12
Q

what is snowpipe

A

-service that automates the file-loading process
-example: load from S3 into a Snowflake DB
-no warehouse charge for Snowpipe; you pay only for the compute it uses

*Snowpipe typically loads new data within a minute after a file notification is sent
*It's a serverless feature
*Snowpipe load history is stored in the metadata of the pipe for 14 days, used to prevent loading the same files into a table twice

13
Q

what are sequences

A

Used to generate unique numbers across sessions. They can be used to generate unique values for a PK or any column that requires a unique value.
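A minimal sketch of a sequence feeding a primary-key column (table and sequence names are hypothetical):

```sql
-- Create a sequence and use it as a column default
CREATE OR REPLACE SEQUENCE order_seq START = 1 INCREMENT = 1;

CREATE OR REPLACE TABLE orders (
  order_id INTEGER DEFAULT order_seq.NEXTVAL,  -- unique value per row
  item     STRING
);

INSERT INTO orders (item) VALUES ('widget');   -- order_id filled automatically
```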

14
Q

Drivers snowflake supports

A

Go driver
JDBC driver
.NET driver
Node.js driver
ODBC driver
PHP PDO driver for Snowflake

15
Q

What is snowpark?

A

An API to interact with data in Snowflake using:
-Java
-Scala
-Python

No need for developers to move data out of Snowflake to use the language of their choice.

16
Q

What is the object container hierarchy?

A

Organisation > Account > DB > Schema

17
Q

search optimization speeds range or equality searches?

A

It speeds up equality searches only (point lookups), not range searches

18
Q

what is a stored procedure?

A

Like a function, a SP is created once and can be executed many times. A SP is created with the CREATE PROCEDURE command and is executed using the CALL command.
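A minimal CREATE PROCEDURE / CALL sketch in Snowflake Scripting (table and procedure names are hypothetical):

```sql
CREATE OR REPLACE PROCEDURE purge_old_rows(days INTEGER)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  -- Delete rows older than the given number of days
  DELETE FROM events
    WHERE event_ts < DATEADD(day, -1 * :days, CURRENT_TIMESTAMP());
  RETURN 'done';
END;
$$;

CALL purge_old_rows(30);
```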

19
Q

What is a materialized view

A

With a materialized view, the view's results are stored, almost as though the results were a table. This allows faster access but requires storage and maintenance, both of which incur additional costs.

20
Q

what are the two scaling policies

A

Standard vs Economy (Economy conserves credits: a new cluster starts only if the system estimates there's enough load to keep it busy for at least 6 minutes).

21
Q

what is a micro-partition size

A

50-500 MB (before compression)

22
Q

What is a BUILD_SCOPED_FILE_URL

A

A SQL function that generates a scoped URL giving temporary access to a staged file; the URL is valid for only 24 hours.
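A minimal sketch of the call (the stage and file path are hypothetical):

```sql
-- Returns a scoped URL to the staged file, valid for 24 hours
SELECT BUILD_SCOPED_FILE_URL(@images_stage, 'folder/photo.jpg');
```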

23
Q

What is the maximum number of days the property DATA_RETENTION_TIME_IN_DAYS can have?

A

90 days

24
Q

what does the put command do

A

Uploads files from a local file system to an internal stage
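A minimal PUT sketch (local path and stage name are hypothetical):

```sql
-- Upload a local file to a named internal stage; files are gzip-compressed by default
PUT file:///tmp/data/mydata.csv @my_int_stage AUTO_COMPRESS = TRUE;
```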

25
Q

Scalar vs tabular (UDF)

A

Scalar: returns a single value for each input row.
Tabular (UDTF): returns multiple rows (a table) for each input.

26
Q

what is a view

A

A view allows the result of a query to be accessed as if it were a table.

27
Q

can standard views & MV be secure views?

A

Yes: both standard views and materialized views can be defined as secure views.

28
Q

how often does a key rotation happen?

A

30 days

29
Q

snowflake supports using SSO to connect & authenticate with the following clients.

A

Python connector
JDBC driver
SnowSQL
ODBC driver

30
Q

what does COPY INTO <location> do?

A

Unloads data from a table (or query) into one or more files in one of the following locations:
Named internal stage (or table/user stage). The files can then be downloaded from the stage/location using the GET command.
Named external stage that references an external location (Amazon S3, Google Cloud Storage, or Microsoft Azure).
External location (Amazon S3, Google Cloud Storage, or Microsoft Azure).
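A minimal unload sketch (stage, path, and table names are hypothetical):

```sql
-- Unload a table to a named internal stage as gzip-compressed CSV files
COPY INTO @my_int_stage/unload/
FROM mytable
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);
```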

31
Q

what does the get command do?

A

Downloads data files from one of the following Snowflake stages to a local directory/folder on a client machine:

Named internal stage.
Internal stage for a specified table.
Internal stage for the current user.
Typically, this command is executed after using the COPY INTO <location> command to unload data from a table into a Snowflake stage.
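A minimal GET sketch (stage path and local directory are hypothetical):

```sql
-- Download previously unloaded files from an internal stage to a local folder
GET @my_int_stage/unload/ file:///tmp/downloads/;
```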

32
Q

What are the data types

A

*Numeric
*String & binary
*Logical
*Date & time
*Semi-structured
*Geospatial

33
Q

what languages can you write SPs in?

A

JavaScript
Snowflake Scripting (SQL)
Scala
Java
Python

34
Q

Fail-safe is a non-configurable period of how many days?

A

7 days

35
Q

DATA_RETENTION_TIME_IN_DAYS can be set at table and DB level?

A

yes
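A minimal sketch of setting the retention period at both levels (object names are hypothetical):

```sql
-- Database level: applies to objects that don't override it
ALTER DATABASE mydb SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Table level: overrides the database setting
ALTER TABLE mydb.public.orders SET DATA_RETENTION_TIME_IN_DAYS = 90;
```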

36
Q

what is zero copy cloning

A

Clone data in seconds, as many times as you want.
Only pay for the unique data you store: a clone shares its source's micro-partitions until they are modified.
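A minimal cloning sketch (object names are hypothetical):

```sql
-- Metadata-only copies; no data is physically duplicated at creation time
CREATE TABLE orders_dev CLONE orders;
CREATE DATABASE analytics_dev CLONE analytics;
```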

37
Q

What is a UDF?

A

Schema-level objects that enable users to write their own functions in different languages:
SQL
JavaScript
Python
Java
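A minimal SQL UDF sketch (function name is hypothetical):

```sql
CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
RETURNS FLOAT
AS
$$
  pi() * radius * radius
$$;

-- UDFs are called inside SQL statements
SELECT area_of_circle(2.0);
```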

38
Q

what are the different billing methods?

A

virtual warehouses (compute)
cloud services
serverless features
storage
data transfer

39
Q

What can you do with time travel

A

Using Time Travel, you can perform the following actions within a defined period of time:

*Query data in the past that has since been updated or deleted.

*Create clones of entire tables, schemas, and databases at or before specific points in the past.

*Restore tables, schemas, and databases that have been dropped.

Once the defined period of time has elapsed, the data is moved into Snowflake Fail-safe and these actions can no longer be performed.

Time Travel:
Standard edition (1 day max)
Enterprise edition (up to 90 days)

FAILSAFE:
Transient tables (0 days)
Permanent tables (7 days)
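The actions above can be sketched as follows (table names and timestamp are hypothetical):

```sql
-- Query data as it was 5 minutes ago
SELECT * FROM orders AT(OFFSET => -60*5);

-- Clone a table as of a point in the past
CREATE TABLE orders_backup CLONE orders
  AT(TIMESTAMP => '2024-06-01 00:00:00'::TIMESTAMP_LTZ);

-- Restore a dropped table
UNDROP TABLE orders;
```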

40
Q

what are the snowflake editions

A

Standard (1 day time travel)
Enterprise (up to 90 days time travel)
Business Critical
VPS (Virtual Private Snowflake)

41
Q

what is overwrite in an insert statement?

A

It truncates the table before the new values are inserted
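A minimal sketch (table names are hypothetical):

```sql
-- Replace the table's contents with the query result in one statement
INSERT OVERWRITE INTO daily_summary
SELECT region, SUM(amount)
FROM sales
GROUP BY region;
```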

42
Q

What is a stream

A

It tracks the changes (inserts, updates, deletes) made to a table, i.e. change data capture (CDC)
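A minimal stream sketch (table and stream names are hypothetical):

```sql
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Pending changes since the stream's offset, with
-- METADATA$ACTION / METADATA$ISUPDATE / METADATA$ROW_ID columns
SELECT * FROM orders_stream;
```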

43
Q

What can you do with time travel

A

-query data in the past that has since been updated or deleted.
-create clones of entire tables, schemas, and databases at or before specific points in the past.
-Restore tables, schemas, & DBs that have been dropped

44
Q

A stale view is when the following occurs

A

The view references tables or databases that do not exist.

A schema or metadata change is made in a referenced table.

A referenced table is dropped and recreated with a different schema or configuration.

45
Q

A task can execute the following

A

a single SQL statement
a call to a SP
procedural logic using Snowflake Scripting

46
Q

how many days for each view:
Account usage (# year)
query history (# weeks)
Information schema (# days)

A

Account usage (1 year)
query history (2 weeks)
Information schema (7 days)

47
Q

What are the different table types

A

Permanent
Time Travel: up to 90 days (Enterprise edition; default 1)
Fail-safe: yes (7 days)

Temporary
Used for transitory data
Persists for the duration of the session
Time Travel: 0 or 1 day
Fail-safe: no

Transient:
Exists until explicitly dropped
Time Travel: 0 or 1 day
Fail-safe: no

External:
query data stored outside Snowflake
read-only
Time Travel: no
Fail-safe: no

48
Q

A share is read only

A

Correct, a consumer cannot alter it

49
Q

Timeout for UDFs

A

The default timeout limit for JS UDFs is 30 seconds for a row set! Snowflake will send rows in sets to the JS engine, and it will try to process all rows in the set within that time. The size of the row set may vary, but you may assume it will be around 1K (this is just an estimation, the number of rows in a set could be much higher or lower).

The timeout limit is different for Java and Python UDFs. It’s 300 seconds for them.

50
Q

Do you need support to increase LOCK_TIMEOUT

A

No, you don't; you can set it to any number

51
Q

What does copy into command do?

A

Loads data from staged files to an existing table. The files must already be staged in one of the following locations:

Named internal stage (or table/user stage). Files can be staged using the PUT command.

Named external stage that references an external location (Amazon S3, Google Cloud Storage, or Microsoft Azure).

You cannot access data held in archival cloud storage classes that requires restoration before it can be retrieved. These archival storage classes include, for example, the Amazon S3 Glacier Flexible Retrieval or Glacier Deep Archive storage class, or Microsoft Azure Archive Storage.

External location (Amazon S3, Google Cloud Storage, or Microsoft Azure).
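A minimal load sketch (table and stage names are hypothetical):

```sql
-- Load staged CSV files into an existing table, skipping the header row
COPY INTO mytable
FROM @my_int_stage/load/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
ON_ERROR = 'SKIP_FILE';
```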

52
Q

What is the maximum size for each file

A

By default, COPY INTO location statements separate table data into a set of output files to take advantage of parallel operations. The maximum size for each file is set using the MAX_FILE_SIZE copy option. The default value is 16777216 (16 MB) but can be increased to accommodate larger files. The maximum file size supported is 5 GB for Amazon S3, Google Cloud Storage, or Microsoft Azure stages.

53
Q

Time travel for transient is how many days?

A

For transient databases, schemas, and tables, the retention period can be set to 0 (or unset back to the default of 1 day). The same is also true for temporary tables.

54
Q

How are you billed for MCW

A

With per-second billing, the actual credit usage contains fractional amounts, based on the number of seconds that each cluster runs.

55
Q

How do you check table stats?

A

TABLE_STORAGE_METRICS view

56
Q

File formats

A

Avro–
First released in 2009, Avro was developed within Apache’s Hadoop architecture. It uses JSON data for defining data types and schemas.

Benefits of using Avro:
Data definitions are stored within JSON, allowing data to be easily read and interpreted.

Avro is 100% schema-dependent with data and schema stored together in the same file or message, allowing data to be sent to any destination or processed by any program.

Avro supports data schemas as they change over time, accommodating changes like missing, added, and changed fields.

Avro does not require a code generator. Data stored in Avro is shareable between programs even when they're not using the same language.

Where Avro has the edge:
Avro offers more highly developed options for schema evolution.

Avro is more efficient for use with write-intensive, big data operations.

Row-based storage makes Avro the better choice when all fields need to be accessed.

Language-independent format is ideal when data is being shared across multiple apps using different languages.

Parquet–
Originally developed by Cloudera in partnership with Twitter, Parquet is highly integrated with Apache Spark, serving as the default file format for this popular data processing framework.

Benefits of Parquet:
Parquet supports complex nested data structures in a flat columnar format.

Parquet accommodates all big data formats including structured, semi-structured, and unstructured data.

Because it uses data skipping to locate specific column values without reading all of the data in the row, Parquet enables high rates of data throughput.

Where Parquet has the edge:
Parquet offers numerous data storage optimizations.

Parquet is more efficient at data reads and analytical querying.

Parquet is a good choice for storing nested data.

Parquet compresses data more efficiently.

If using Apache Spark, Parquet offers a seamless experience.

57
Q

What is semistructured data?

A

Data can come from many sources, including applications, sensors, and mobile devices. To support these diverse data sources, semi-structured data formats have become popular standards for transporting and storing data:

Snowflake provides built-in support for importing data from (and exporting data to) the following semi-structured data formats:

JSON
Avro
ORC
Parquet
XML

Snowflake provides native data types (ARRAY, OBJECT, and VARIANT) for storing semi-structured data.

Snowflake also provides native support for querying semi-structured data.

58
Q

What is unstructured data?

A

Unstructured data is information that does not fit into a predefined data model or schema. Typically text-heavy, such as form responses and social media conversations, unstructured data also encompasses images, video, and audio. Industry-specific file types such as VCF (genomics), KDF (semiconductors), or HDF5 (aeronautics) are included in this category.

Snowflake supports the following actions:

Securely access data files located in cloud storage.

Share file access URLs with collaborators and partners.

Load file access URLs and other file metadata into Snowflake tables.

Process unstructured data.

59
Q

What are common semi structured formats

A

JSON, XML, Avro, Parquet