Build data analytics solutions using Azure Synapse serverless SQL pools Flashcards by Анастасия Сычева

Azure Synapse Analytics

includes serverless SQL pools, which are tailored for querying data in a data lake
can use SQL code to query data in files of various common formats without needing to load the file data into database storage.

Azure Synapse SQL

a distributed query system in Azure Synapse Analytics

Azure Synapse SQL runtime environments

Serverless SQL pool
Dedicated SQL pool

Dedicated SQL pool

Enterprise-scale relational database instances used to host data warehouses in which data is stored in relational tables

Serverless SQL pool

primarily used to work with data in a data lake
pay-per-query endpoint to query the data in your data lake

Benefits of using a serverless SQL pool

A familiar Transact-SQL syntax to query data in place (no load)
Integrated connectivity from a wide range of business intelligence and ad-hoc querying tools
Distributed query processing that is built for large-scale data
Built-in query execution fault-tolerance - high success rate for long queries
No infrastructure to set up or clusters to maintain.
No charge for reserved resources; you are charged only for the data processed by the queries you run.

When to use serverless SQL pools

tailored for querying the data in the data lake
great for unplanned or “bursty” workloads
Workloads that require millisecond response times and are looking to pinpoint a single row in a data set are not good fit for serverless SQL pool.

Common use cases for serverless SQL pools include:

Data exploration
Data transformation
Logical data warehouse.

Logical data warehouse

You can define external objects such as tables and views in a serverless SQL database.
The data remains stored in the data lake files but is abstracted by a relational schema, so client applications and analytical tools can query it as they would a relational database hosted in SQL Server.
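
A minimal sketch of one logical data warehouse object, assuming a hypothetical storage account (mydatalake) with CSV order files under files/orders/. The database name, view name, and URL are illustrative assumptions, not taken from the cards.

```sql
-- Hypothetical names throughout: SalesDB, dbo.orders_view, and the storage URL.
CREATE DATABASE SalesDB
    COLLATE Latin1_General_100_BIN2_UTF8;  -- a UTF-8 collation suits text read from files
GO

USE SalesDB;
GO

-- The view stores only the query definition; the data stays in the lake files.
CREATE VIEW dbo.orders_view
AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/orders/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS orders;
GO

-- Client tools can now query the view like an ordinary relational object.
SELECT TOP 10 * FROM dbo.orders_view;
```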

File formats that can be queried

Delimited text, such as comma-separated values (CSV) files.
JavaScript object notation (JSON) files.
Parquet files.
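
The three formats can be queried roughly as follows with OPENROWSET. This is a sketch only: the storage URL, folder names, and the product_name property are hypothetical.

```sql
-- Delimited text (CSV) with a header row.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/csv/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS result;

-- Parquet: the schema is read from the files themselves.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/parquet/*.parquet',
    FORMAT = 'PARQUET'
) AS result;

-- JSON: read each document as a single text column, then extract values.
-- The 0x0b terminators are characters unlikely to occur in the documents.
SELECT JSON_VALUE(doc, '$.product_name') AS product_name
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/json/*.json',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0b'
) WITH (doc NVARCHAR(MAX)) AS result;
```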

External data source

If you plan to query data in the same location frequently, it's more efficient to define an external data source that references that location.
One benefit of an external data source is that you can simplify an OPENROWSET query to use the combination of the data source and the relative path to the folders or files you want to query.
You can assign a credential for the data source to use when accessing the underlying storage, enabling you to provide access to data through SQL without permitting users to access the data directly in the storage account.
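
A sketch of the pattern, assuming a hypothetical storage account (mydatalake), a container named files, and an orders folder. The data source name is also illustrative.

```sql
-- Run in a user database (not master). Names and URL are hypothetical.
CREATE EXTERNAL DATA SOURCE files
WITH (
    LOCATION = 'https://mydatalake.dfs.core.windows.net/files/'
);
GO

-- BULK is now a path relative to the data source location.
SELECT *
FROM OPENROWSET(
    BULK 'orders/*.csv',
    DATA_SOURCE = 'files',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS orders;
```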

External file format

encapsulate settings for delimited text files

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS(
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"'
    )
);
GO

External table

using the OPENROWSET function can result in complex code that includes data sources and file paths. To simplify access to the data, you can encapsulate the files in an external table
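
A sketch of an external table over CSV files. It assumes an external data source named files and a file format named CsvFormat already exist in the database; the table name, columns, and folder are hypothetical.

```sql
-- Assumes: external data source "files" and file format "CsvFormat" already exist.
CREATE EXTERNAL TABLE dbo.products
(
    product_id   INT,
    product_name VARCHAR(20),
    list_price   DECIMAL(5,2)
)
WITH (
    DATA_SOURCE = files,
    LOCATION = 'products/*.csv',   -- relative to the data source location
    FILE_FORMAT = CsvFormat
);
GO

-- The table is then queried like any other table, with no OPENROWSET details.
SELECT * FROM dbo.products;
```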

Some delimiter file settings

With and without a header row.
Comma and tab-delimited values.
Windows and Unix style line endings.
Non-quoted and quoted values, and escaping characters.
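
These settings map onto OPENROWSET options roughly as sketched here. The file path is hypothetical, and the values shown describe one assumed layout (tab-delimited, Unix line endings, quoted values, header row).

```sql
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/export/data.tsv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE,        -- first row holds column names
    FIELDTERMINATOR = '\t',   -- tab-delimited; use ',' for comma-separated files
    ROWTERMINATOR = '\n',     -- Unix line endings; Windows files end rows with '\r\n'
    FIELDQUOTE = '"',         -- character that wraps quoted values
    ESCAPECHAR = '\\'         -- character that escapes special characters in a value
) AS result;
```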

Persist the results of a query in an external table

CREATE EXTERNAL TABLE AS SELECT (CETAS)
The result is an external table, which stores its data in files in the data lake.

CETAS data sources

an existing table
view in a database
OPENROWSET function that reads file-based data from the data lake

Types of objects to be created to use with CETAS

external data source
external file format

External data source

encapsulates a connection to a file system location in a data lake
use this connection to specify a relative path in which the data files for the external table crea

LOCATION and BULK parameters

relative paths for the results and source files respectively

NB: both paths are relative to the file system location referenced by the external data source.
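
A CETAS sketch showing both parameters together. It assumes an external data source named files already exists; the table name, folders, and column names (read from an assumed header row) are hypothetical.

```sql
-- File format for the output files.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
GO

CREATE EXTERNAL TABLE SpecialOrders
WITH (
    LOCATION = 'special_orders/',      -- where the result files are written
    DATA_SOURCE = files,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT OrderID, CustomerName, OrderTotal
FROM OPENROWSET(
    BULK 'sales_orders/*.csv',         -- where the source files are read from
    DATA_SOURCE = 'files',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS source_data
WHERE OrderType = 'Special Order';
```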

External table

external tables are a metadata abstraction over the files that contain the actual data. Dropping an external table does not delete the underlying files.

Benefits of stored procedures

Reduces client to server network traffic (commands are executed in a single batch)
Provides a security boundary
Eases maintenance
Improved performance (execution plan is held in the cache and reused on subsequent runs)
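
As an illustration, a CETAS transformation could be wrapped in a stored procedure along these lines. The procedure name, parameter, table, folders, and columns are hypothetical, and an external data source (files) and file format (ParquetFormat) are assumed to exist already.

```sql
CREATE PROCEDURE usp_special_orders_by_year @order_year INT
AS
BEGIN
    -- Drop the table metadata if it exists; the underlying files are not deleted,
    -- so the target folder may also need to be cleared before a re-run.
    IF EXISTS (SELECT * FROM sys.external_tables WHERE name = 'SpecialOrders')
        DROP EXTERNAL TABLE SpecialOrders;

    -- Recreate the table from the source files for the requested year.
    CREATE EXTERNAL TABLE SpecialOrders
    WITH (
        LOCATION = 'special_orders/',
        DATA_SOURCE = files,
        FILE_FORMAT = ParquetFormat
    )
    AS
    SELECT OrderID, CustomerName, OrderTotal
    FROM OPENROWSET(
        BULK 'sales_orders/*.csv',
        DATA_SOURCE = 'files',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS source_data
    WHERE OrderType = 'Special Order'
      AND YEAR(OrderDate) = @order_year;
END
GO

-- Usage: one call from the client runs the whole batch on the server.
EXEC usp_special_orders_by_year @order_year = 2022;
```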

a pipeline for the data transformation enables you to

schedule the operation to run

at specific times
based on specific events

A lake database

provides a relational metadata layer over one or more files in a data lake.
can create a lake database that includes definitions for tables, including column names and data types as well as relationships between primary and foreign key columns.
The storage of the data files is decoupled from the database schema, enabling more flexibility than a relational database system typically offers.

Lake database storage

stored in the data lake as Parquet or CSV files
can be managed independently of the database tables, making it easier to manage data ingestion and manipulation with a wide variety of data processing tools and technologies

Lake database compute

1. Azure Synapse serverless SQL pool
2. Azure Synapse Apache Spark (Spark SQL API)
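
From a serverless SQL pool, lake database tables are queried like ordinary tables. A sketch, assuming a hypothetical lake database named RetailDB with a Customer table:

```sql
-- RetailDB, Customer, and the column names are hypothetical.
USE RetailDB;
GO

SELECT CustomerID, FirstName, LastName
FROM Customer;
```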

Azure Synapse database designer

Lets you define the schema for your database by:
1. Specifying the name and storage settings for each table.
2. Specifying the names, key usage, nullability, and data types for each column.
3. Defining relationships between key columns in tables.

Serverless SQL pool authentication

how users prove their identity when connecting to the endpoint

Two SQL Pool authentications

1. SQL Authentication (username and password)
2. Microsoft Entra authentication (managed by Microsoft Entra ID)

Authorization

What a user can do within a serverless SQL pool database; it is controlled by the user account's database role memberships and object-level permissions.
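
A sketch of granting an object-level permission to a Microsoft Entra user. The account, database, and table names are hypothetical.

```sql
-- Logins are created in master.
USE master;
GO
CREATE LOGIN [analyst@contoso.com] FROM EXTERNAL PROVIDER;
GO

-- The database user and its permissions live in the user database.
USE SalesDB;
GO
CREATE USER analyst FROM LOGIN [analyst@contoso.com];
GO

-- Object-level permission: read access to a single external table.
GRANT SELECT ON OBJECT::dbo.products TO analyst;
```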

SQL Authentication

A SQL user exists only in the serverless SQL pool, and its permissions are scoped to the objects in the serverless SQL pool. Access to securable objects in other services (such as Azure Storage) can't be granted to a SQL user directly, since it exists only in the scope of the serverless SQL pool. The SQL user needs to get authorization to access the files in the storage account.

Microsoft Entra authentication

A user can sign in to a serverless SQL pool and to other services, such as Azure Storage, with the same identity, and permissions can be granted directly to the Microsoft Entra user.

Serverless SQL pool supports the following authorization types (for Azure storage)

1. Anonymous access (publicly available files)
2. Shared access signature (SAS)
3. Managed Identity
4. User Identity

Shared access signature (SAS)

1. Provides delegated access to resources in a storage account.
2. Can grant clients access to resources in the storage account without sharing account keys.
3. Gives you granular control over the type of access you grant to clients who have the SAS: validity interval, granted permissions, acceptable IP address range, acceptable protocol.
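
A sketch of using a SAS token from a serverless SQL pool through a database scoped credential. The database, credential, and data source names, the storage URL, and the placeholder secrets are all hypothetical.

```sql
-- Credentials are database scoped, so work in a user database rather than master.
USE SalesDB;
GO

-- A database master key protects the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
GO

CREATE DATABASE SCOPED CREDENTIAL sas_credential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';
GO

-- Queries through this data source reach storage with the SAS,
-- so users need no direct access to the storage account.
CREATE EXTERNAL DATA SOURCE secure_files
WITH (
    LOCATION = 'https://mydatalake.dfs.core.windows.net/files/',
    CREDENTIAL = sas_credential
);
GO
```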

Azure Storage Access Control Types

1. Azure role-based access control (Azure RBAC)
2. Access control lists (ACLs)

ACL

1. Each file and directory in your storage account has an access control list.
2. When a security principal attempts an operation on a file or directory, an ACL check determines whether that principal has the correct permission level to perform the operation.

Kinds of access control lists:

1. Access ACLs: control access to an object.
2. Default ACLs: templates of ACLs associated with a directory that determine the access ACLs for any child items that are created under that directory. Files do not have default ACLs.

Permission types

1. Read
2. Write
3. Execute

Guidelines in setting up ACLs

1. Always use Microsoft Entra security groups as the assigned principal in an ACL entry.
2. Resist the opportunity to directly assign individual users or service principals.

Role for users who need read-only access

Storage Blob Data Reader

Build data analytics solutions using Azure Synapse serverless SQL pools Flashcards

(40 cards)