Build data analytics solutions using Azure Synapse serverless SQL pools Flashcards
(40 cards)
Azure Synapse Analytics
- includes serverless SQL pools, which are tailored for querying data in a data lake
- can use SQL code to query data in files of various common formats without needing to load the file data into database storage.
Azure Synapse SQL
a distributed query system in Azure Synapse Analytics
Azure Synapse SQL runtime environments
- Serverless SQL pool
- Dedicated SQL pool
Dedicated SQL pool
Enterprise-scale relational database instances used to host data warehouses in which data is stored in relational tables
Serverless SQL pool
- primarily used to work with data in a data lake
- pay-per-query endpoint to query the data in your data lake
Benefits of using an SQL pool
- A familiar Transact-SQL syntax to query data in place (no load)
- Integrated connectivity from a wide range of business intelligence and ad-hoc querying tools
- Distributed query processing that is built for large-scale data
- Built-in query execution fault-tolerance - high success rate for long queries
- No infrastructure to setup or clusters to maintain.
- No charge for resources reserved, only for queries
When to use serverless SQL pools
- tailored for querying the data in the data lake
- great for unplanned or “bursty” workloads
- Workloads that require millisecond response times and are looking to pinpoint a single row in a data set are not good fit for serverless SQL pool.
Common use cases for serverless SQL pools include:
- Data exploration
- Data transformation
- Logical data warehouse.
Logical data warehouse
- can define external objects such as tables and views in a serverless SQL database.
- emains stored in the data lake files, but are abstracted by a relational schema that can be used by client applications and analytical tools to query the data as they would in a relational database hosted in SQL Server
File formats that can be queries
- Delimited text, such as comma-separated values (CSV) files.
- JavaScript object notation (JSON) files.
- Parquet files.
External data source
- if you plan to query data in the same location frequently, it’s more efficient to define an external data source that references that location
- benefit of an external data source, is that you can simplify an OPENROWSET query to use the combination of the data source and the relative path to the folders or files you want to query
- ou can assign a credential for the data source to use when accessing the underlying storage, enabling you to provide access to data through SQL without permitting users to access the data directly in the storage account.
External file format
encapsulate settings for delimited text files
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ‘,’,
STRING_DELIMITER = ‘”’
)
);
GO
External table
using the OPENROWSET function can result in complex code that includes data sources and file paths. To simplify access to the data, you can encapsulate the files in an external table
Some delimiter file settings
- With and without a header row.
- Comma and tab-delimited values.
- Windows and Unix style line endings.
- Non-quoted and quoted values, and escaping characters.
Persist the results of a query in an external table
- CREATE EXTERNAL TABLE AS SELECT (CETAS)
- an external table, stores its data in a file in the data lake.
CETAS data sources
- an existing table
- view in a database
- OPENROWSET function that reads file-based data from the data lake
Types of objects to be created to use with CETAS
- external data source
- external data format
External data source
- encapsulates a connection to a file system location in a data lake
- use this connection to specify a relative path in which the data files for the external table crea
LOCATION and BULK parameters
relative paths for the results and source files respectively
NB relative to the file system location referenced by the files external data source.
External table
external tables are a metadata abstraction over the files that contain the actual data. Dropping an external table does not delete the underlying files.
Benefits of stored procedures
- Reduces client to server network traffic (commands are executed in a single batch)
- Provides a security boundary
- Eases maintenance
- Improved performance (execution plan is held in the cache and reused on subsequent runs)
a pipeline for the data transformation enables you to
schedule the operation to run
- at specific times
- based on specific events
A lake database
- provides a relational metadata layer over one or more files in a data lake.
- can create a lake database that includes definitions for tables, including column names and data types as well as relationships between primary and foreign key columns.
- , the storage of the data files is decoupled from the database schema; enabling more flexibility than a relational database system typically offers.
Lake database storage
- stored in the data lake as Parquet or CSV files
- can be managed independently of the database tables, making it easier to manage data ingestion and manipulation with a wide variety of data processing tools and technologies