Azure Data Terms 01 Flashcards

1
Q

ACID

A

Atomicity, consistency, isolation, durability. A set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps.

2
Q

AES

A

Advanced Encryption Standard. A symmetric block cipher with 128-, 192-, or 256-bit keys; it is the encryption standard used for transparent data encryption (TDE) in Microsoft SQL Server (all flavors).

3
Q

BSON

A

Binary JSON. A binary-encoded serialization of JSON-like documents.
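To illustrate the byte layout, the sketch below hand-encodes a flat document following the public BSON specification. It is a minimal sketch that handles only two element types (UTF-8 string, type 0x02, and 32-bit integer, type 0x10). The function name `encode_bson` is hypothetical; a real application would use a BSON library such as the one bundled with PyMongo.

```python
import struct

def encode_bson(doc):
    """Encode a flat dict of str -> int32 or str values as a BSON document (sketch only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # element names are NUL-terminated C strings
        if isinstance(value, bool):
            raise TypeError("bool is not handled in this sketch")
        if isinstance(value, int):
            # Type 0x10: little-endian signed 32-bit integer
            body += b"\x10" + name + struct.pack("<i", value)
        elif isinstance(value, str):
            # Type 0x02: int32 byte length (including the trailing NUL), then the bytes
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # The document length prefix counts itself (4 bytes) and the final 0x00 terminator.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

print(encode_bson({"a": 1}).hex())  # 0c0000001061000100000000
```

Unlike JSON text, every value carries an explicit type byte and strings carry a length prefix, which lets a reader skip over fields without parsing them.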

4
Q

DMS

A

Azure Database Migration Service.

5
Q

DPR

A

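Data Protection Requirements. Microsoft's Supplier Data Protection Requirements, the data protection standards that suppliers must meet under the Supplier Security and Privacy Assurance (SSPA) program.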

6
Q

DQS

A

Data Quality Services. The data quality tool for Microsoft SQL Server.

7
Q

DTU

A

Database transaction unit. In Azure SQL Database, a blended measure of CPU, memory, reads, and writes used for pricing in the DTU-based purchasing model. It is an alternative to the vCore-based model, which offers provisioned and serverless compute.

8
Q

DWU

A

Data warehouse unit. In Azure Synapse Analytics, a measure based on CPU, memory, and I/O values.

9
Q

EDA

A

Exploratory data analysis. An approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. One of the best practices in MLOps.

10
Q

GRS

A

Geo-redundant storage. A redundancy option in Azure Storage in which there are three copies of the data in the primary region using locally-redundant storage (LRS) and another three copies in a secondary region also using LRS. The other options are LRS, zone-redundant storage (ZRS), and geo-zone-redundant storage (GZRS), which is the same as GRS except that the data in the primary region uses ZRS.

11
Q

ISV

A

Independent software vendor.

12
Q

LRS

A

Locally-redundant storage. A redundancy option in Azure Storage in which there are three copies of the data within the same datacenter. The other options are zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS).

13
Q

MDS

A

Master Data Services. The master data management (MDM) tool for Microsoft SQL Server.

14
Q

MLOps

A

Machine Learning Operations. The Machine Learning analog of DevOps, as used in Azure Machine Learning (Azure ML).

15
Q

MPP

A

Massively parallel processing. The coordinated processing of a program by multiple processors working on different parts of the program. Each processor has its own operating system and memory. MPP speeds the performance of huge databases that deal with massive amounts of data. Contrast with Symmetric multiprocessing (SMP).

16
Q

MQL

A

MongoDB Query Language. The query language used for the MongoDB API in Azure Cosmos DB.

17
Q

MVCC

A

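Multi-version concurrency control. A concurrency control method in which the database keeps multiple versions of each row, so that readers see a consistent snapshot without blocking writers, and writers do not block readers. PostgreSQL uses MVCC.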

18
Q

ORC

A

Optimized Row Columnar. A highly efficient columnar file format designed for storing Apache Hive data.

19
Q

RDP

A

Remote Desktop Protocol.

20
Q

RU

A

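Request unit. The unit of measure for pricing in Azure Cosmos DB: a normalized measure of the CPU, IOPS, and memory that a database operation consumes. For example, a point read of a 1-KB item costs 1 RU.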

21
Q

SID

A

Security identifier. A unique value that is used to identify any security entity that the Windows operating system can authenticate. A security entity can be a security principal - a user account, a computer account or a process started by those accounts - or it can be a security group.

22
Q

SLO

A

Service Level Objective. A standardized combination of measures like CPU, memory, and I/O used for pricing tiers.

23
Q

SMP

A

Symmetric multiprocessing. A multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally. Also called “shared-memory multiprocessing”. Contrast with massively parallel processing (MPP).

24
Q

SQL-MI

A

Azure SQL Managed Instance.

25
Q

SSPA

A

Supplier Security and Privacy Assurance. Microsoft’s program for ensuring that vendors adhere to data security and privacy standards.

26
Q

TDE

A

Transparent data encryption. The feature in Microsoft SQL Server (all flavors) that encrypts data at rest.

27
Q

TDS

A

Tabular Data Stream. An application layer protocol used to transfer data between a database server and a client.

28
Q

TDSP

A

Team Data Science Process. An agile, iterative data science process.

29
Q

Adjacency list

A

In graph theory and computer science, a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph.
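A minimal sketch in Python, assuming a small hypothetical undirected graph given as (u, v) edge pairs. Each vertex maps to the list of its neighbors; the order inside each list is incidental, matching the "unordered list" in the definition.

```python
from collections import defaultdict

def build_adjacency_list(edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)  # v is a neighbor of u
        adj[v].append(u)  # and, for an undirected graph, u is a neighbor of v
    return dict(adj)

edges = [("A", "B"), ("A", "C"), ("B", "D")]
adj = build_adjacency_list(edges)
print(adj)  # {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A'], 'D': ['B']}
```

Storage grows with the number of vertices plus edges, which makes this representation a good fit for sparse graphs.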

30
Q

Adjacency matrix

A

In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
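The same hypothetical four-vertex graph can be stored as an adjacency matrix. This sketch assumes an undirected, unweighted graph, so the matrix holds 0/1 entries and is symmetric.

```python
def build_adjacency_matrix(vertices, edges):
    """Return a square 0/1 matrix where m[i][j] == 1 if vertices i and j are adjacent."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        m[i][j] = 1
        m[j][i] = 1  # undirected graph: mirror the entry across the diagonal
    return m

vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D")]
matrix = build_adjacency_matrix(vertices, edges)
for row in matrix:
    print(row)
# [0, 1, 1, 0]
# [1, 0, 0, 1]
# [1, 0, 0, 0]
# [0, 1, 0, 0]
```

Checking whether two vertices are adjacent is a single lookup, but the matrix always takes n × n cells, even for a sparse graph.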

31
Q

Advanced Encryption Standard (AES)

A

A symmetric block cipher with 128-, 192-, or 256-bit keys; it is the encryption standard used for transparent data encryption (TDE) in Microsoft SQL Server (all flavors).

32
Q

Apache Cassandra API

A

The API for column-family storage in Azure Cosmos DB.

33
Q

Apache Gremlin API

A

The API for graph databases in Azure Cosmos DB.

34
Q

Apache Hive

A

A distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.

35
Q

Atomicity

A

In ACID, the guarantee that each transaction is treated as a single unit that either succeeds completely or fails completely, even if it’s made up of multiple statements.
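A minimal sketch of atomicity using Python's built-in `sqlite3` module, assuming a hypothetical `accounts` table whose CHECK constraint forbids negative balances. The transfer credits the recipient first, so when the debit fails, the rollback has to undo a statement that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False
    return True

print(transfer(conn, "alice", "bob", 500))  # False: alice's balance would go negative
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 100, 'bob': 50}: bob's credit was rolled back too
```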

36
Q

Atomicity, consistency, isolation, durability (ACID)

A

A set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps.

37
Q

Batch layer

A

In the Lambda architecture, the layer that precomputes results using a distributed processing system that can handle very large quantities of data. The batch layer aims at perfect accuracy by being able to process all available data when generating views.

38
Q

Binary JSON (BSON)

A

A binary-encoded serialization of JSON-like documents.

39
Q

Blocking transformation

A

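In SQL Server Integration Services (SSIS), a data flow transformation, such as Sort or Aggregate, that must read all of its input rows before it can output any rows. It stalls the downstream pipeline until it finishes and can run short of memory on large inputs, as with a large sort.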

40
Q

Card

A

In Microsoft Power BI, a report visual that focuses on displaying a single value.

41
Q

Columnstore

A

Data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format. Contrast with rowstore.

42
Q

Columnstore index

A

An index used for storing and querying large data warehouse fact and dimension tables.

43
Q

Compute node

A

In Azure Synapse Analytics, a node that works on a chunk of data. In Massively Parallel Processing (MPP), this is called a worker node.

44
Q

Consistency

A

In ACID, the guarantee that a transaction can only bring the database from one consistent state to another, preserving database invariants such as primary key and foreign key constraints. Not to be confused with consistency in distributed systems (as in the CAP theorem), which guarantees that after a successful write of a record, any read request immediately receives the latest value of the record.

45
Q

Consumer group

A

In data stream processing, an isolated view over a data stream that can be read at its own pace and starting at its own offset.
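A toy in-memory model, assuming a single-partition log. The `Stream` and `ConsumerGroup` classes are hypothetical and are not the Azure Event Hubs SDK. Each group keeps its own offset, so reading in one group does not move the other.

```python
class Stream:
    """Hypothetical append-only event log shared by all consumer groups."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

class ConsumerGroup:
    """Hypothetical isolated view of the stream with its own read offset."""
    def __init__(self, stream, start_offset=0):
        self.stream = stream
        self.offset = start_offset

    def read(self, max_events):
        batch = self.stream.events[self.offset:self.offset + max_events]
        self.offset += len(batch)  # only this group's position advances
        return batch

stream = Stream()
for i in range(5):
    stream.append(f"event-{i}")

dashboard = ConsumerGroup(stream)                 # starts at the beginning
archiver = ConsumerGroup(stream, start_offset=3)  # starts at its own offset

print(dashboard.read(2))  # ['event-0', 'event-1']
print(archiver.read(10))  # ['event-3', 'event-4']
print(dashboard.read(2))  # ['event-2', 'event-3']
```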

46
Q

Control flow

A

In SQL Server Integration Services (SSIS), the ordered set of tasks to be performed. Contrast with a data flow, which is a specific task in the control flow that transforms data.

47
Q

Control node

A

In Azure Synapse Analytics, the node that orchestrates multiple compute nodes. In Massively Parallel Processing (MPP), this is called a header node.

48
Q

Data flow

A

In SQL Server Integration Services (SSIS), a specific task in the control flow that transforms data.

49
Q

Data mart

A

A subset of a data warehouse focused on a particular line of business, department, or subject area. In the Lambda architecture, data marts may appear in the serving layer.

50
Q

Data Quality Services (DQS)

A

The data quality tool for Microsoft SQL Server.

51
Q

Data virtualization

A

A unified, virtual data access layer built on top of many data sources, allowing the data to be accessed without moving it to a central repository. One example is PolyBase for SQL Server.

52
Q

Data warehouse unit (DWU)

A

In Azure Synapse Analytics, a measure based on CPU, memory, and I/O values.

53
Q

Database transaction unit (DTU)

A

In Azure SQL Database, a blended measure of CPU, memory, reads, and writes used for pricing in the DTU-based purchasing model. It is an alternative to the vCore-based model, which offers provisioned and serverless compute.

54
Q

Dedicated SQL pool

A

In Azure Synapse Analytics, a collection of analytic resources that are provisioned when using Synapse SQL.

55
Q

Delta Lake

A

An open-source storage layer that adds ACID transactions, and therefore transactional consistency, to data lakes such as Azure Data Lake Storage.

56
Q

DirectQuery

A

For Microsoft Power BI and Microsoft Analysis Services, a method of connecting directly to the data source without importing data.

57
Q

Durability

A

In ACID, the guarantee that once a transaction has been committed, it will remain committed even in the case of a system failure.

58
Q

Elastic pool

A

A model for Azure SQL Database in which multiple databases share a common set of resources in the same pool.

59
Q

Event

A

In the context of streaming data, a chunk of raw information. Synonym for message.

60
Q

Event Hubs Capture

A

A feature of Azure Event Hubs that automatically writes incoming events to Azure Blob Storage or Azure Data Lake Storage at a configurable time or size interval, while the same events continue down the stream pipeline.

61
Q

Exploratory data analysis (EDA)

A

An approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. One of the best practices in MLOps.

62
Q

Functional partitioning

A

A strategy in which data is aggregated according to how it is used by each bounded context in the system.

63
Q

Geo-redundant storage (GRS)

A

A redundancy option in Azure Storage in which there are three copies of the data in the primary region using locally-redundant storage (LRS) and another three copies in a secondary region also using LRS. The other options are LRS, zone-redundant storage (ZRS), and geo-zone-redundant storage (GZRS), which is the same as GRS except that the data in the primary region uses ZRS.

64
Q

Header node

A

In Massively Parallel Processing (MPP), the node that orchestrates multiple worker nodes. In Azure Synapse Analytics, this is called a control node.

65
Q

Horizontal partitioning

A

Also called sharding, a strategy in which each partition or shard holds a subset of the rows.
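A minimal sketch of hash-based sharding, assuming three shards and a hypothetical `customer_id` shard key. CRC-32 from `zlib` is used instead of Python's built-in `hash()`, because string hashes are randomized per process and would send the same key to different shards across runs.

```python
import zlib

NUM_SHARDS = 3  # assumed shard count for illustration

def shard_for(key: str) -> int:
    """Map a shard key to a shard number with a stable hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

# Hypothetical rows; every shard receives complete rows, never partial ones.
rows = [{"customer_id": f"C{i:03d}", "name": f"Customer {i}"} for i in range(10)]

shards = {n: [] for n in range(NUM_SHARDS)}
for row in rows:
    shards[shard_for(row["customer_id"])].append(row)

for n, shard_rows in shards.items():
    print(n, [r["customer_id"] for r in shard_rows])
```

Simple modulo hashing moves most keys when the shard count changes; production systems often use consistent hashing or a lookup map for that reason.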

66
Q

Hypervisor

A

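Computer software, firmware, or hardware that creates and runs virtual machines. Sometimes compared to an emulator, although a hypervisor usually runs guest code directly on the host CPU instead of translating it. Also called a virtual machine monitor (VMM) or virtualizer.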

67
Q

Information retrieval

A

The process of using Machine Learning models to extract useful information from unstructured data.

68
Q

Isolation

A

In ACID, the guarantee that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

69
Q

Kappa architecture

A

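A data processing architecture designed to provide a scalable, fault-tolerant, and flexible system for processing large amounts of data in real time. It was developed as an alternative to the Lambda architecture: instead of separate batch and speed layers, it processes all data through a single stream-processing path and recomputes results by replaying the stream.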

70
Q

Lambda architecture

A

A data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods. In many cases, it's being replaced by the Kappa architecture.

71
Q

Locally-redundant storage (LRS)

A

A redundancy option in Azure Storage in which there are three copies of the data within the same datacenter. The other options are zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS).

72
Q

Machine Learning Operations (MLOps)

A

The Machine Learning analog of DevOps, as used in Azure Machine Learning (Azure ML).

73
Q

Massively parallel processing (MPP)

A

The coordinated processing of a program by multiple processors working on different parts of the program. Each processor has its own operating system and memory. MPP speeds the performance of huge databases that deal with massive amounts of data. Contrast with Symmetric multiprocessing (SMP).

74
Q

Master Data Services (MDS)

A

The master data management (MDM) tool for Microsoft SQL Server.

75
Q

Message

A

In the context of streaming data, a chunk of raw information. Synonym for event.

76
Q

MLflow

A

An open source platform for the machine learning lifecycle.

77
Q

MongoDB Query Language (MQL)

A

The query language used for the MongoDB API in Azure Cosmos DB.

78
Q

Multi-version concurrency control (MVCC)

A

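A concurrency control method in which the database keeps multiple versions of each row, so that readers see a consistent snapshot without blocking writers, and writers do not block readers. PostgreSQL uses MVCC.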

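A toy sketch of the MVCC idea, assuming a single-process key-value store with a logical clock. The `MVCCStore` class is hypothetical and does not reflect how PostgreSQL actually stores row versions; it only shows that a reader keeps seeing its snapshot while a writer adds newer versions.

```python
import itertools

class MVCCStore:
    """Hypothetical multi-version key-value store.

    Each write appends a new version stamped with a commit timestamp; a reader
    sees the newest version committed at or before its snapshot timestamp.
    """
    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}  # key -> list of (commit_ts, value), oldest first

    def snapshot(self):
        """Take a snapshot: earlier commits are visible, later ones are not."""
        return next(self._clock)

    def write(self, key, value):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))  # old versions are kept
        return ts

    def read(self, key, snapshot_ts):
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("x", "v1")
reader = store.snapshot()     # a long-running reader starts here
store.write("x", "v2")        # a writer commits without waiting for the reader
print(store.read("x", reader))            # v1
print(store.read("x", store.snapshot()))  # v2
```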
79
Q

Optimized Row Columnar (ORC)

A

A highly efficient columnar file format designed for storing Apache Hive data.

80
Q

Partition pruning

A

A data querying optimization in which partitions are skipped if they are known not to have matching values.
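A minimal sketch, assuming hypothetical sales rows partitioned by (year, month). The query skips any month partition that lies entirely outside the requested date range, so only the matching partition is scanned.

```python
from datetime import date

# Hypothetical partitioned data: partition key (year, month) -> rows.
partitions = {
    (2024, 1): [{"day": date(2024, 1, 5), "amount": 120}, {"day": date(2024, 1, 20), "amount": 80}],
    (2024, 2): [{"day": date(2024, 2, 3), "amount": 200}],
    (2024, 3): [{"day": date(2024, 3, 14), "amount": 50}],
}

def query_total(partitions, start, end):
    """Sum amounts for start <= day <= end, skipping partitions that cannot match."""
    scanned, total = [], 0
    for (year, month), rows in partitions.items():
        # Pruning: this partition cannot contain matches if its month is outside the range.
        if (year, month) < (start.year, start.month) or (year, month) > (end.year, end.month):
            continue
        scanned.append((year, month))
        total += sum(r["amount"] for r in rows if start <= r["day"] <= end)
    return total, scanned

print(query_total(partitions, date(2024, 2, 1), date(2024, 2, 29)))  # (200, [(2024, 2)])
```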

81
Q

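PL/pgSQL

A

The procedural language of PostgreSQL, used to write functions, procedures, and triggers that combine SQL statements with control structures such as loops and conditionals. Queries themselves are written in PostgreSQL's SQL dialect.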

82
Q

PolyBase

A

A data virtualization feature for SQL Server.

83
Q

Predicate pushdown

A

A data querying optimization in which rows are filtered at the source to reduce the amount of data sent to the requestor. Projection pushdown is the equivalent for only sending the needed columns.

84
Q

Producer

A

In the context of streaming data, something that produces events/messages, such as a mobile device, production line, or application.

85
Q

Projection pushdown

A

A data querying optimization in which only the needed columns are sent to the requestor. Predicate pushdown is the equivalent for only sending the needed rows.
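A minimal sketch contrasting client-side filtering with predicate and projection pushdown. The `scan_source` function is a hypothetical stand-in for a database or file reader that can apply a row filter and a column list itself. "Values transferred" is a rough proxy for data sent to the requestor.

```python
# Hypothetical source rows.
SOURCE_ROWS = [
    {"id": 1, "country": "US", "amount": 10, "comment": "first order"},
    {"id": 2, "country": "DE", "amount": 25, "comment": "gift wrap"},
    {"id": 3, "country": "US", "amount": 40, "comment": "express shipping"},
]

def scan_source(predicate=None, columns=None):
    """Simulated source scan: filter rows and drop columns before 'sending' them."""
    rows = [r for r in SOURCE_ROWS if predicate is None or predicate(r)]  # predicate pushdown
    if columns is not None:
        rows = [{c: r[c] for c in columns} for r in rows]                 # projection pushdown
    transferred = sum(len(r) for r in rows)  # number of values sent to the requestor
    return rows, transferred

# Without pushdown: fetch everything, then filter and project on the client.
all_rows, cost_without = scan_source()
client_side = [{"id": r["id"], "amount": r["amount"]} for r in all_rows if r["country"] == "US"]

# With pushdown: the source does the filtering and column selection.
pushed, cost_with = scan_source(predicate=lambda r: r["country"] == "US", columns=["id", "amount"])

print(pushed == client_side)    # True: same answer
print(cost_without, cost_with)  # 12 4
```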

86
Q

Request unit (RU)

A

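The unit of measure for pricing in Azure Cosmos DB: a normalized measure of the CPU, IOPS, and memory that a database operation consumes. For example, a point read of a 1-KB item costs 1 RU.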

87
Q

Rowgroup

A

A group of rows that are compressed into columnstore format at the same time. A rowgroup usually contains the maximum number of rows per rowgroup, 1,048,576.

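As a quick worked example of the arithmetic, assuming an idealized bulk load in which every rowgroup is filled to the 1,048,576-row maximum (2^20). In practice, rowgroups can be closed with fewer rows, for example under memory pressure, so the real count may be higher.

```python
MAX_ROWS_PER_ROWGROUP = 1_048_576  # 2**20

def rowgroups_needed(total_rows: int) -> int:
    """Minimum number of rowgroups, assuming every rowgroup is filled to the maximum."""
    return (total_rows + MAX_ROWS_PER_ROWGROUP - 1) // MAX_ROWS_PER_ROWGROUP  # integer ceiling

print(rowgroups_needed(10_000_000))  # 10
```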
88
Q

Rowstore

A

Data that’s logically organized as a table with rows and columns, and physically stored in a row-wise data format as is usually the case in a relational database. Contrast with columnstore.
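A minimal sketch of the two physical layouts for the same hypothetical three-row table, using plain Python lists to stand in for storage order.

```python
table = [
    (1, "Alice", 34),
    (2, "Bob", 27),
    (3, "Carol", 45),
]
columns = ("id", "name", "age")

# Rowstore: all values of a row are stored next to each other.
rowstore = [value for row in table for value in row]

# Columnstore: all values of a column are stored next to each other.
columnstore = {col: [row[i] for row in table] for i, col in enumerate(columns)}

print(rowstore)     # [1, 'Alice', 34, 2, 'Bob', 27, 3, 'Carol', 45]
print(columnstore)  # {'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Carol'], 'age': [34, 27, 45]}

# An aggregate over one column reads only that column's values in the columnstore layout.
print(sum(columnstore["age"]))  # 106
```

Fetching one whole row is contiguous in the rowstore layout, while scanning one column across many rows is contiguous in the columnstore layout, which is why columnstore suits analytic queries.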

89
Q

Schema on read

A

A system in which a schema is applied only when the data is read, as in NoSQL. Contrast with schema on write.

90
Q

Schema on write

A

A system in which a schema is created before writing into a database, as in SQL. Contrast with schema on read.
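A minimal sketch contrasting the two approaches, assuming a hypothetical two-field schema and JSON-lines storage. With schema on write, a bad record is rejected before it is stored; with schema on read, raw text is stored as-is and each reader imposes its own structure when it reads.

```python
import json

# Schema on write: the schema is enforced before a record is stored.
SCHEMA = {"id": int, "name": str}

def write_validated(table, record):
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"field {field!r} must be {ftype.__name__}")
    table.append(record)

# Schema on read: raw text is stored unchecked; structure is applied by the reader.
def read_with_schema(raw_lines, fields):
    """Parse stored JSON lines, keeping the requested fields (missing ones become None)."""
    return [{f: doc.get(f) for f in fields} for doc in map(json.loads, raw_lines)]

table = []
write_validated(table, {"id": 1, "name": "Alice"})
try:
    write_validated(table, {"id": "two", "name": "Bob"})
except ValueError as err:
    print(err)  # field 'id' must be int

raw_store = ['{"id": 1, "name": "Alice"}', '{"id": "two", "city": "Paris"}']
print(read_with_schema(raw_store, ["id", "name"]))
# [{'id': 1, 'name': 'Alice'}, {'id': 'two', 'name': None}]
```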

91
Q

Security identifier (SID)

A

A unique value that is used to identify any security entity that the Windows operating system can authenticate. A security entity can be a security principal - a user account, a computer account or a process started by those accounts - or it can be a security group.

92
Q

Semantic model

A

In OLAP, a conceptual model that describes the meaning of the data elements it contains. Not to be confused with semantic modeling, which is a process in text analytics.

93
Q

Service Level Objective (SLO)

A

A standardized combination of measures like CPU, memory, and I/O used for pricing tiers.

94
Q

Serving layer

A

In the Lambda architecture, the layer that stores output from the batch and speed layers and responds to ad-hoc queries by returning precomputed views or building views from the processed data.

95
Q

Sharding

A

Also called horizontal partitioning, a strategy in which each partition or shard holds a subset of the rows.

96
Q

Speed layer

A

In the Lambda architecture, the layer that processes data streams in real time and without the requirements of fix-ups or completeness. This layer sacrifices throughput as it aims to minimize latency by providing real-time views into the most recent data.

97
Q

Supplier Security and Privacy Assurance (SSPA)

A

Microsoft’s program for ensuring that vendors adhere to data security and privacy standards.

98
Q

Symmetric multiprocessing (SMP)

A

A multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally. Also called “shared-memory multiprocessing”. Contrast with massively parallel processing (MPP).

99
Q

Tabular Data Stream (TDS)

A

An application layer protocol used to transfer data between a database server and a client.

100
Q

Tabular model

A

In Microsoft Analysis Services, a database that runs in memory or in DirectQuery mode, connecting to data from back-end relational data sources.

101
Q

Team Data Science Process (TDSP)

A

An agile, iterative data science process.

102
Q

Time window aggregation

A

In data stream processing, a method for aggregating data based on chunks of time which may overlap or vary in duration depending on the method. In Azure Stream Analytics, the options are tumbling window, hopping window, sliding window, and session window.
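A minimal sketch of two of these window types over hypothetical (timestamp in seconds, value) events. It is not Azure Stream Analytics syntax. It assumes the stream starts at time 0 and keys each window by its start time.

```python
from collections import defaultdict

events = [(1, 5), (4, 3), (9, 2), (11, 7), (14, 1), (21, 4)]

def tumbling_sum(events, size):
    """Fixed-size, non-overlapping windows [k*size, (k+1)*size): each event is in exactly one."""
    sums = defaultdict(int)
    for ts, value in events:
        sums[(ts // size) * size] += value
    return dict(sorted(sums.items()))

def hopping_sum(events, size, hop):
    """Fixed-size windows starting every `hop` seconds; with hop < size they overlap."""
    sums = defaultdict(int)
    for ts, value in events:
        # Window starts s (multiples of hop) with ts - size < s <= ts contain this event.
        first = ((ts - size) // hop + 1) * hop
        for start in range(max(first, 0), ts + 1, hop):
            sums[start] += value
    return dict(sorted(sums.items()))

print(tumbling_sum(events, 10))    # {0: 10, 10: 8, 20: 4}
print(hopping_sum(events, 10, 5))  # {0: 10, 5: 10, 10: 8, 15: 4, 20: 4}
```

A hopping window whose hop equals its size behaves like a tumbling window.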

103
Q

Transparent data encryption (TDE)

A

The feature in Microsoft SQL Server (all flavors) that encrypts data at rest.

104
Q

U-SQL

A

A language, used in Azure Data Lake Analytics, that combines declarative SQL with imperative C# for processing unstructured data by applying schema on read and inserting custom logic and user-defined functions (UDFs).

105
Q

Vertical partitioning

A

A strategy in which each partition holds a subset of the fields/columns.
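A minimal sketch, assuming hypothetical product rows in which the frequently read columns are separated from a bulky, rarely read column. Both partitions keep the key so a full row can be reassembled with a join.

```python
products = [
    {"product_id": 1, "name": "Widget", "price": 9.99, "description": "Long marketing copy for the widget"},
    {"product_id": 2, "name": "Gadget", "price": 19.99, "description": "Long marketing copy for the gadget"},
]

HOT_COLUMNS = ["product_id", "name", "price"]   # read often
COLD_COLUMNS = ["product_id", "description"]    # read rarely; key repeated for joins

def project(rows, columns):
    """Keep only the given columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]

hot_partition = project(products, HOT_COLUMNS)
cold_partition = project(products, COLD_COLUMNS)

# Reassemble full rows by joining the two partitions on the shared key.
cold_by_id = {r["product_id"]: r for r in cold_partition}
full = [{**h, **cold_by_id[h["product_id"]]} for h in hot_partition]
print(full == products)  # True
```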

106
Q

Watermark

A

In data stream processing, a technique to deal with late-arriving events by discarding events that arrive too long after they were produced.
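A minimal sketch, assuming the watermark trails the largest event time seen so far by a fixed allowed lateness, and that late events are simply dropped. Real engines differ in how they advance the watermark and in what they do with late events; `apply_watermark` is a hypothetical name.

```python
def apply_watermark(events, allowed_lateness):
    """Process (event_time, payload) pairs in arrival order and drop events older than the watermark."""
    max_event_time = float("-inf")
    accepted, dropped = [], []
    for event_time, payload in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness  # event-time progress minus tolerance
        if event_time < watermark:
            dropped.append(payload)   # arrived too long after it was produced
        else:
            accepted.append(payload)
    return accepted, dropped

# Listed in arrival order; "d" was produced at t=3 but arrives after an event from t=12.
arrivals = [(1, "a"), (5, "b"), (12, "c"), (3, "d"), (10, "e")]
print(apply_watermark(arrivals, allowed_lateness=5))  # (['a', 'b', 'c', 'e'], ['d'])
```

A larger allowed lateness drops fewer events but forces windowed results to wait longer before they can be finalized.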

107
Q

Worker node

A

In Massively Parallel Processing (MPP), a node that works on a chunk of data. In Azure Synapse Analytics, this is called a compute node.