Data Loading & Unloading Flashcards

(98 cards)

1
Q

Stages

A

Stages are temporary storage locations for data files used in the data loading and unloading process
- Broken into Internal and External

2
Q

Internal Stages are divided into what 3 stages?

A
  • User stage
  • Table stage
  • Named stage
3
Q

User Stage

A
  • An internal stage
  • Automatically allocated when a user is created
  • Use a PUT command to upload a file from your local machine to Snowflake
  • Reference to a user stage is: ls @~;
  • Cannot be altered or dropped
  • Not appropriate if multiple users need access to stage
4
Q

Table Stage

A
  • An internal stage
  • Automatically allocated when a table is created
  • Use a PUT command to upload a file into a table stage
  • Reference to a table stage is: ls @%MY_TABLE;
  • Cannot be altered or dropped
  • User must have ownership privileges on table
5
Q

Named Stage

A
  • An internal stage
  • User created database object
  • Use a PUT command to upload a file into a named stage
  • Reference to a named stage is: ls @MY_STAGE;
  • Securable object
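
A minimal sketch of creating a named internal stage and uploading to it from a client such as SnowSQL (stage and file names are illustrative, not from the deck):

CREATE STAGE MY_STAGE;
PUT file:///tmp/orders.csv @MY_STAGE;
LIST @MY_STAGE;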
6
Q

Uncompressed files are automatically compressed using ________ when loaded into an internal stage, unless explicitly set not to.

A

GZIP

7
Q

In internal stages, staged files are automatically encrypted using ____-bit keys.

A

128

8
Q

External Stages

A

External stages reference data files stored in a location outside of Snowflake, which we manage ourselves.
- Could be Amazon S3 buckets, Google Cloud Storage buckets, or Microsoft Azure containers

9
Q

External named stage

A
  • User created database object
  • Files are uploaded using the cloud utilities of the cloud provider
  • Reference to a named stage is: ls @MY_STAGE;
  • Storage locations can be private or public
  • Copy options such as ON_ERROR and PURGE can be set on stages
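
A hedged sketch of an external stage pointing at S3 (URL and integration name are placeholders):

CREATE STAGE MY_EXT_STAGE
  URL = 's3://my-bucket/data/'
  STORAGE_INTEGRATION = MY_S3_INT;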
10
Q

Storage Integration

A

A storage integration is a reusable and securable Snowflake object which can be applied across stages and is recommended to avoid having to explicitly set sensitive information for each stage definition.
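
A minimal sketch of a storage integration for S3, assuming an IAM role already exists (the ARN and bucket are placeholders):

CREATE STORAGE INTEGRATION MY_S3_INT
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/data/');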

11
Q

Stage Helper Command - LIST

A

List the contents of a stage:
- Path of staged file
- Size of staged file
- MD5 Hash of staged file
- Last updated timestamp

  • Can optionally specify a path for specific folders or files
  • References to named and table stages can optionally be qualified with a database and schema, e.g. ls @MY_DB.MY_SCHEMA.MY_STAGE;
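
Illustrative LIST invocations (names and paths are placeholders):

LIST @MY_STAGE;            -- named stage
ls @%MY_TABLE/2023/;       -- table stage, specific path
ls @~;                     -- user stage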
12
Q

Stage Helper Command - SELECT

A
  • Query the contents of staged files directly using standard SQL for both internal and external stages
  • Useful for inspecting files prior to data loading/unloading
  • Reference metadata columns such as filename and row numbers for a staged file
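
A hedged example of querying a staged CSV directly (stage and file format names are placeholders):

SELECT metadata$filename, metadata$file_row_number, $1, $2
FROM @MY_STAGE (FILE_FORMAT => 'MY_CSV_FORMAT');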
13
Q

Stage Helper Command - REMOVE

A
  • Remove files from either an external or internal stage
  • Can optionally specify a path for specific folders or files
  • References to named and table stages can optionally be qualified with a database and schema, e.g. rm @MY_DB.MY_SCHEMA.MY_STAGE;
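
Illustrative REMOVE invocations (names and patterns are placeholders):

REMOVE @MY_STAGE/2023/;
rm @%MY_TABLE PATTERN = '.*\\.csv\\.gz';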
14
Q

PUT command

A
  • The PUT command uploads data files from a local directory on a client machine to any of the three types of internal stage.
  • Uploaded files are automatically encrypted with a 128-bit key with optional support for a 256-bit key.
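
A minimal PUT sketch, run from a client such as SnowSQL rather than a worksheet (path and stage names are placeholders):

PUT file:///tmp/orders.csv @MY_STAGE AUTO_COMPRESS = TRUE;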
15
Q

PUT cannot be executed from within ____________.

A

worksheets

16
Q

Duplicate files uploaded to a stage via PUT are _______.

A

ignored

17
Q

COPY INTO <table> statement

A
  • The COPY INTO <table> statement copies the contents of an internal or external location directly into a table
  • COPY INTO <table> requires a user created virtual warehouse to execute
  • Load history is stored in the metadata of the target table for 64 days, which ensures files are not loaded twice
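
A hedged COPY INTO <table> sketch (table, stage, and format details are placeholders):

COPY INTO MY_TABLE
FROM @MY_STAGE/orders/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);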
18
Q

What file formats can be uploaded into Snowflake?

A
  • Delimited files (CSV, TSV, etc.)
  • JSON
  • Avro
  • ORC
  • Parquet
  • XML
19
Q

Copy Option: ON_ERROR

A

Value that specifies the error handling for the load operation:
- CONTINUE
- SKIP_FILE
- SKIP_FILE_<num>
- SKIP_FILE_<num>%
- ABORT_STATEMENT

Default Value - 'ABORT_STATEMENT'
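
For example (names are placeholders):

COPY INTO MY_TABLE
FROM @MY_STAGE
ON_ERROR = 'SKIP_FILE';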

20
Q

Copy Option: SIZE_LIMIT

A

Number that specifies the maximum size of data loaded by a COPY statement

Default Value - null (no size limit)

21
Q

Copy Option: PURGE

A

Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully

Default Value - FALSE

22
Q

Copy Option: RETURN_FAILED_ONLY

A

Boolean that specifies whether to return only files that have failed to load in the statement result

Default Value - FALSE

23
Q

Copy Option: MATCH_BY_COLUMN_NAME

A

String that specifies whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data

Default Value - None

24
Q

Copy Option: ENFORCE_LENGTH

A

Boolean that specifies whether to truncate text strings that exceed the target column length. If TRUE, COPY produces an error when a loaded string exceeds the target column length; if FALSE, strings are automatically truncated (the reverse of TRUNCATECOLUMNS)

Default Value - TRUE

25
Q

Copy Option: TRUNCATECOLUMNS

A

Boolean that specifies whether to truncate text strings that exceed the target column length

Default Value - FALSE

26
Q

Copy Option: FORCE

A

Boolean that specifies to load all files, regardless of whether they've been loaded previously and have not changed since they were loaded

Default Value - FALSE

27
Q

Copy Option: LOAD_UNCERTAIN_FILES

A

Boolean that specifies to load files for which the load status is unknown. The COPY command skips these files by default.

Default Value - FALSE
28
Q

Output Column Names: COPY INTO <table>

A

- FILE
- STATUS
- ROWS_PARSED
- ROWS_LOADED
- ERROR_LIMIT
- ERRORS_SEEN
- FIRST_ERROR
- FIRST_ERROR_LINE
- FIRST_ERROR_CHARACTER
- FIRST_ERROR_COLUMN_NAME
29
Q

COPY INTO <table>: Column Name: FILE

A

Data Type - TEXT
Description - Name of source file and relative path to the file

30
Q

COPY INTO <table>: Column Name: STATUS

A

Data Type - TEXT
Description - Status: loaded, load failed, or partially loaded

31
Q

COPY INTO <table>: Column Name: ROWS_PARSED

A

Data Type - NUMBER
Description - Number of rows parsed from the source file

32
Q

COPY INTO <table>: Column Name: ROWS_LOADED

A

Data Type - NUMBER
Description - Number of rows loaded from the source file

33
Q

COPY INTO <table>: Column Name: ERROR_LIMIT

A

Data Type - NUMBER
Description - If the number of errors reaches this limit, then abort

34
Q

COPY INTO <table>: Column Name: ERRORS_SEEN

A

Data Type - NUMBER
Description - Number of error rows in the source file

35
Q

COPY INTO <table>: Column Name: FIRST_ERROR

A

Data Type - TEXT
Description - First error of the source file

36
Q

COPY INTO <table>: Column Name: FIRST_ERROR_LINE

A

Data Type - NUMBER
Description - Line number of the first error

37
Q

COPY INTO <table>: Column Name: FIRST_ERROR_CHARACTER

A

Data Type - NUMBER
Description - Position of the first error character

38
Q

COPY INTO <table>: Column Name: FIRST_ERROR_COLUMN_NAME

A

Data Type - TEXT
Description - Column name of the first error
39
Q

VALIDATION_MODE parameter

A

Optional parameter that allows you to perform a dry run of the load process to expose errors when running COPY INTO <table>:
- RETURN_N_ROWS
- RETURN_ERRORS
- RETURN_ALL_ERRORS

40
Q

VALIDATE function

A

  • VALIDATE is a table function to view all errors encountered during a previous COPY INTO execution
  • VALIDATE accepts a job ID of a previous query or the last load operation executed
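
Hedged sketches of both (table and stage names are placeholders; '_last' refers to the last load operation in the session):

COPY INTO MY_TABLE FROM @MY_STAGE VALIDATION_MODE = 'RETURN_ERRORS';
SELECT * FROM TABLE(VALIDATE(MY_TABLE, JOB_ID => '_last'));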
41
Q

File format options can be set on a named stage or _____ ____ statement.

A

COPY INTO

42
Q

Explicitly declared file format options can all be rolled up into independent _________ ____ ______ objects.

A

Snowflake File Format

43
Q

File formats can be applied to both named stages and COPY INTO statements. If set on both, the _____ ______ options will take precedence.

A

COPY INTO

44
Q

In the File Format object, the file format you're expecting to load is set via the 'type' property with one of the following values: ______, ______, ______, ______, _______, or ______.

A

CSV, JSON, AVRO, ORC, PARQUET, XML
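
A minimal file format object sketch and its use in a COPY statement (names and options are illustrative):

CREATE FILE FORMAT MY_CSV_FORMAT
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

COPY INTO MY_TABLE FROM @MY_STAGE FILE_FORMAT = (FORMAT_NAME = 'MY_CSV_FORMAT');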
45
Q

CSV File Type

A

Comma-Separated Values file. A plain text file that contains a list of data. They mostly use the comma character to separate data, but sometimes use other characters, like semicolons.

46
Q

JSON File Type

A

JavaScript Object Notation file. A file that stores simple data structures and objects in JavaScript Object Notation (JSON) format. It is primarily used for transmitting data between a web application and a server. JSON files are lightweight, text-based, human-readable, and can be edited using a text editor.

47
Q

AVRO File Type

A

Stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient. Avro files include markers that can be used to split large data sets into subsets suitable for Apache MapReduce processing.
48
Q

ORC File Type

A

Optimized Row Columnar (ORC). Open-source columnar storage file format originally released in early 2013 for Hadoop workloads. ORC provides a highly efficient way to store Apache Hive data, though it can store other data as well. It was designed and optimized specifically with Hive data in mind, improving overall performance when Hive reads, writes, and processes data.
49
Q

PARQUET File Type

A

Apache Parquet is a file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. It's open-source and supports multiple coding languages, including Java, C++, and Python.

50
Q

XML File Type

A

Extensible Markup Language file. It contains a formatted dataset that is intended to be processed by a website, web application, or software program. XML files can be thought of as text-based databases.
51
Q

If a File Format object or options are not provided to either the stage or COPY statement, the default behavior is to try to interpret the contents of a stage as a _____ with _____ encoding.

A

CSV, UTF-8
52
Q

SNOWPIPE

A

The Pipe object defines a COPY INTO <table> statement that will execute in response to a file being uploaded to a stage.

53
Q

The two methods for detecting when a new file has been uploaded to a stage:

A

1. Automating Snowpipe using cloud messaging (external stages only)
2. Calling Snowpipe REST endpoints (internal and external stages)
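
A hedged pipe sketch using auto-ingest with an external stage (names are placeholders):

CREATE PIPE MY_PIPE
  AUTO_INGEST = TRUE
AS
  COPY INTO MY_TABLE
  FROM @MY_EXT_STAGE
  FILE_FORMAT = (TYPE = 'JSON');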
54
Q

Snowpipe: Cloud Messaging flow

A

File lands in an external stage → cloud provider event notification is sent to a queue Snowflake monitors → Snowpipe receives the notification → pipe executes its COPY INTO <table> statement to load the file.

55
Q

Snowpipe: REST Endpoint flow

A

File is uploaded to an internal or external stage → client calls the Snowpipe insertFiles REST endpoint with a list of file names → Snowpipe queues the files → pipe executes its COPY INTO <table> statement to load them.
56
Q

Snowpipe is designed to load new data typically within a _______ after a file notification is sent.

A

minute

57
Q

Snowpipe is a _________ feature, using Snowflake-managed compute resources to load data files, not a user-managed _______ _________.

A

serverless, virtual warehouse

58
Q

Snowpipe load history is stored in the ________ of the pipe for ____ days, used to prevent reloading the same files into a table.

A

metadata, 14

59
Q

When a pipe is paused, event messages received for the pipe enter a limited retention period. The period is ____ days by default.

A

14
60
Q

Compare Bulk Loading vs. Snowpipe: Authentication

A

Bulk Loading: Relies on the security options supported by the client for authenticating and initiating a user session.
Snowpipe: When calling the REST endpoints, requires key pair authentication with JSON Web Token (JWT). JWTs are signed using a public/private key pair with RSA encryption.

61
Q

Compare Bulk Loading vs. Snowpipe: Load History

A

Bulk Loading: Stored in the metadata of the target table for 64 days.
Snowpipe: Stored in the metadata of the pipe for 14 days.

62
Q

Compare Bulk Loading vs. Snowpipe: Compute Resources

A

Bulk Loading: Requires a user-specified warehouse to execute COPY statements.
Snowpipe: Uses Snowflake-supplied compute resources.

63
Q

Compare Bulk Loading vs. Snowpipe: Billing

A

Bulk Loading: Billed for the amount of time each virtual warehouse is active.
Snowpipe: Snowflake tracks the resource consumption of loads for all pipes in an account, with per-second/per-core granularity, as Snowpipe actively queues and processes data files. In addition to resource consumption, an overhead is included in the utilization costs charged for Snowpipe: 0.06 credits per 1,000 files notified or listed via event notifications or REST API calls.
64
Q

Data Loading Best Practices

A

- Break files into 100-250 MB (compressed) chunks
- Organize data by path
- Use separate virtual warehouses for loading and querying
- Pre-sort data
- Stage files no more often than once per minute so they don't back up in the queue and incur unnecessary cost
65
Q

Table data can be unloaded to a stage via the _________ command.

A

COPY INTO <location>

66
Q

The ____ command is used to download a staged file to the local file system.

A

GET

67
Q

By default, results unloaded to a stage using the _________ command are split into multiple files.

A

COPY INTO <location>

68
Q

All data files unloaded to internal stages are automatically encrypted using ____-bit keys.

A

128

69
Q

COPY INTO <location> output files can be prefixed by specifying a string at the ____ of a stage path.

A

end

70
Q

COPY INTO <location> includes a ____________ ___ copy option to partition unloaded data into a directory structure.

A

PARTITION BY

71
Q

COPY INTO <location> can copy table records directly to an _________ cloud provider's blob storage.

A

external
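
Hedged unloading sketches showing a filename prefix and PARTITION BY (stage, table, and the region column are placeholders):

COPY INTO @MY_STAGE/unload/orders_
FROM MY_TABLE
FILE_FORMAT = (TYPE = 'CSV');

COPY INTO @MY_STAGE/unload/
FROM MY_TABLE
PARTITION BY ('region=' || region)
FILE_FORMAT = (TYPE = 'CSV');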
72
Q

COPY INTO <location> Copy Option: OVERWRITE

A

Definition: Boolean that specifies whether the COPY command overwrites existing files with matching names, if any, in the location where files are stored.
Default Value: FALSE

73
Q

COPY INTO <location> Copy Option: SINGLE

A

Definition: Boolean that specifies whether to generate a single file or multiple files.
Default Value: FALSE

74
Q

COPY INTO <location> Copy Option: MAX_FILE_SIZE

A

Definition: Number (>0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread.
Default Value: 16777216 (16 MB)

75
Q

COPY INTO <location> Copy Option: INCLUDE_QUERY_ID

A

Definition: Boolean that specifies whether to uniquely identify unloaded files by including a universally unique identifier (UUID) in the filenames of unloaded data files.
Default Value: FALSE
76
Q

______ is the reverse of PUT. It allows users to specify a source stage and a _______ local directory to download files to.

A

GET, target

77
Q

GET cannot be used for _________ stages.

A

external

78
Q

GET cannot be ________ from within worksheets.

A

executed

79
Q

When using the GET command, downloaded files are automatically decrypted. T/F

A

True

80
Q

When using the GET command, the __________ optional parameter specifies the number of threads to use for downloading files. Increasing this number can improve ____________ when downloading large files.

A

PARALLEL, parallelization

81
Q

When using the GET command, the _________ optional parameter specifies a regular expression pattern for filtering files to download.

A

PATTERN
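
A minimal GET sketch, run from a client such as SnowSQL (paths and pattern are placeholders):

GET @MY_STAGE/unload/ file:///tmp/downloads/
  PARALLEL = 10
  PATTERN = '.*\\.csv\\.gz';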
82
Q

Semi-structured Data Type: ARRAY

A

Contains 0 or more elements of data. Each element is accessed by its position in the array.

83
Q

Semi-structured Data Type: OBJECT

A

Represents collections of key-value pairs.

84
Q

Semi-structured Data Type: VARIANT

A

Universal semi-structured data type used to represent arbitrary data structures.

85
Q

VARIANT data type can hold up to ___ MB of compressed data per row.

A

16

86
Q

Semi-structured data formats supported by Snowflake

A

JSON, AVRO, ORC, PARQUET, XML

87
Q

Loading Semi-Structured Data Flow

A

Semi-structured data file ---PUT--> Stage ---COPY INTO--> Table
88
Q

JSON File Format Option: DATE_FORMAT

A

Used only for loading JSON data into separate columns. Defines the format of date string values in the data files.

89
Q

JSON File Format Option: TIME_FORMAT

A

Used only for loading JSON data into separate columns. Defines the format of time string values in the data files.

90
Q

JSON File Format Option: COMPRESSION

A

Supported algorithms: GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONE. If BROTLI, cannot use AUTO.

91
Q

JSON File Format Option: ALLOW_DUPLICATE

A

Only used for loading. If TRUE, allows duplicate object field names (only the last one will be preserved).

92
Q

JSON File Format Option: STRIP_OUTER_ARRAY

A

Only used for loading. If TRUE, the JSON parser will remove the outer brackets [ ].

93
Q

JSON File Format Option: STRIP_NULL_VALUES

A

Only used for loading. If TRUE, the JSON parser will remove object fields or array elements containing NULL.
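
A hedged sketch combining these options in a file format object (the name is a placeholder):

CREATE FILE FORMAT MY_JSON_FORMAT
  TYPE = 'JSON'
  STRIP_OUTER_ARRAY = TRUE
  STRIP_NULL_VALUES = TRUE;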
94
Q

Three Semi-Structured Data Loading Approaches

A

1. ELT (Extract, Load, Transform)
2. ETL (Extract, Transform, Load)
3. Automatic schema detection (INFER_SCHEMA, MATCH_BY_COLUMN_NAME)

95
Q

Unloading Semi-structured Data Flow

A

Table ---COPY INTO--> Stage ---GET--> Semi-structured data files
96
Q

Accessing Semi-Structured Data: Dot Notation Structure

A

SELECT <column>:<level1_element>.<level2_element> FROM <table>;

97
Q

Accessing Semi-Structured Data: Bracket Notation Structure

A

SELECT <column>['<level1_element>']['<level2_element>'] FROM <table>;

98
Q

Accessing Semi-Structured Data: Repeating Element

A

SELECT SRC:<element>[<index>] FROM <table>;