DP-900 Core Data Concepts Flashcards

DP-900: Azure Data Fundamentals

1
Q

What are the three ways you can categorize data?

A

Structured
Semi-structured
Unstructured

2
Q

What is tabular data?

A

Data stored as rows and columns in one or more tables.
A row represents an entity, and each column represents an attribute of that entity.

3
Q

What makes data ‘structured’?

A

It is tabular and adheres to a fixed schema, so every row has the same set of columns.
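
A minimal sketch of what a fixed schema means in practice, assuming Python's built-in sqlite3 module as a stand-in relational store. The customers table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical "customers" table; the column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)

# A row that matches the schema is accepted.
conn.execute(
    "INSERT INTO customers (id, name, email) VALUES (1, 'Ana', 'ana@example.com')"
)

# A row with an attribute the schema does not define is rejected.
try:
    conn.execute("INSERT INTO customers (id, name, nickname) VALUES (2, 'Ben', 'B')")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Only the row that fits the declared columns is stored, which is the property that makes the data structured.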

4
Q

What makes data ‘semi-structured’?

A

It contains entities that share some regularly occurring attributes, but with variation between entities. Some attributes may be missing, or an attribute may hold multiple values.

5
Q

What is an example of a format that is useful for ‘semi-structured’ data?

A

JSON, because it lets you define fields for each entity without having to adhere to a predefined schema.
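
A short sketch of semi-structured JSON, using Python's standard json module. The two customer records are made-up sample data: one has an email, the other has several phone numbers instead.

```python
import json

# Two entities of the same kind with different sets of attributes.
raw = """
[
  {"name": "Ana", "email": "ana@example.com"},
  {"name": "Ben", "phones": ["555-0100", "555-0101"]}
]
"""
customers = json.loads(raw)

# Fields vary per entity, so optional attributes are read defensively.
emails = [c.get("email") for c in customers]
phone_counts = [len(c.get("phones", [])) for c in customers]
```

Both records parse without any schema being declared, and a missing attribute simply comes back as None or an empty list.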

6
Q

What are some examples of ‘unstructured’ data?

A

audio, video, and images

7
Q

What are the two broad categories of data stores?

A

File stores and Databases

8
Q

What are some common ways to store files?

A

BLOB, CSV, XML, JSON, and optimized file formats such as Avro, ORC, and Parquet.

9
Q

What is XML?

A

A human-readable, semi-structured format that stores data in tags.
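
A small sketch of data stored in tags, parsed with Python's standard xml.etree.ElementTree module. The customers document is made-up sample data, and the second customer deliberately has no email element.

```python
import xml.etree.ElementTree as ET

doc = """
<customers>
  <customer id="1"><name>Ana</name><email>ana@example.com</email></customer>
  <customer id="2"><name>Ben</name></customer>
</customers>
"""
root = ET.fromstring(doc.strip())

# findtext returns None when a child tag is absent.
names = [c.findtext("name") for c in root.findall("customer")]
emails = [c.findtext("email") for c in root.findall("customer")]
```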

10
Q

What is replacing XML?

A

JSON

11
Q

What is the best format for storing large objects like videos, audio, and images?

A

BLOB
Binary Large Object

12
Q

How is file storage different from a database?

A

A database stores data as records that can be queried, whereas file storage holds data as whole files.

13
Q

What is NoSQL?

A

Databases that are not relational.

14
Q

What are the 4 common types of non-relational databases?

A

Key-value
Document
Column Family
Graph

15
Q

In a key-value database, what format does the value have to be in?

A

Any format. A key-value database does not care what the value contains; it can be a number, text, and so on.
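
A toy illustration, using a plain Python dict as a stand-in for a key-value store. The key names are hypothetical; the point is only that each value can have a different type.

```python
# A dict maps unique keys to values of any type.
store = {}
store["user:1:name"] = "Ana"          # text
store["user:1:visits"] = 42           # number
store["user:1:avatar"] = b"\x89PNG"   # raw bytes

value_types = {key: type(value).__name__ for key, value in store.items()}
```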

16
Q

In a document database, what format does the value have to be in?

A

JSON. A document database is a form of key-value database in which each value is a JSON document.

17
Q

What are the two types of data processing?

A

Transactional and Analytical

18
Q

What is OLTP?

A

Online Transaction Processing

19
Q

What does OLTP track?

A

Transactions, which are often CRUD operations

20
Q

What does a transaction ensure?

A

ACID semantics (sometimes called ACIDity)

21
Q

What does ACID stand for?

A

Atomicity
Consistency
Isolation
Durability

22
Q

What is atomicity?

A

Every step of a transaction must succeed for the transaction to take effect. It is all-or-nothing: either the whole transaction completes or none of it does.
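
A minimal sketch of all-or-nothing behavior, assuming Python's built-in sqlite3 module. The accounts table, the CHECK constraint, and the transfer helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; any failure undoes every step."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Debiting 500 from "a" violates the CHECK constraint, so the earlier credit
# to "b" is rolled back as well.
ok = transfer(conn, "a", "b", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts").fetchall())
```

The credit to "b" succeeds on its own, yet it does not survive, because the transaction as a whole failed.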

23
Q

How do you know a transaction is consistent?

A

A transaction is consistent when it takes the database from one valid state to another. If you transfer funds from one account to another, the total amount of funds stays the same, because the amount subtracted from one account is added to the other.
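
The funds-transfer example can be sketched with Python's built-in sqlite3 module. The accounts table and the total helper are hypothetical; the check is simply that the sum of balances is the same before and after the transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def total(conn):
    """Sum of all balances; the invariant a transfer must preserve."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


before = total(conn)
with conn:  # both updates commit together
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'a'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'b'")
after = total(conn)
```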

24
Q

How do you know your transactions are ‘isolated’?

A

Transactions are isolated when concurrent transactions do not interfere with one another. If one transaction transfers funds between two accounts while another reads the balances of all accounts, the second transaction must not see one balance from before the transfer and the other from after it.
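
A rough sketch of the scenario above, assuming Python's built-in sqlite3 module with its default settings and a temporary database file shared by two connections. The table and helper names are hypothetical, and other database engines implement isolation differently.

```python
import os
import shutil
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "bank.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
writer.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
writer.commit()

reader = sqlite3.connect(path)


def read_balances(conn):
    """Read every balance through the given connection."""
    return dict(conn.execute("SELECT name, balance FROM accounts").fetchall())


# The writer has debited "a" but has not yet credited "b" or committed.
writer.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'a'")
mid_transfer = read_balances(reader)  # still the last committed state

writer.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'b'")
writer.commit()
after_commit = read_balances(reader)

writer.close()
reader.close()
shutil.rmtree(tmpdir, ignore_errors=True)
```

The reader sees either the state before the transfer or the state after it, never a mix of the two.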

25
Q

What proves that a transaction was durable?

A

Once a transaction is committed, it persists. If the database is switched off and on again, the change remains.
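
A simplified sketch using Python's built-in sqlite3 module and a temporary file. Closing and reopening the connection is only a stand-in for switching the database off and on; real durability guarantees also cover crashes and power loss. The events table is hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO events (note) VALUES ('committed')")
conn.commit()

# This change is never committed, so it is discarded when the connection closes.
conn.execute("INSERT INTO events (note) VALUES ('never committed')")
conn.close()

# Reopen the file: only the committed change is still there.
conn = sqlite3.connect(path)
notes = [row[0] for row in conn.execute("SELECT note FROM events ORDER BY id")]
conn.close()
shutil.rmtree(tmpdir, ignore_errors=True)
```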

26
Q

What is a data lake?

A

A store for large volumes of file-based data, typically used for analytical processing.

27
Q

What is an OLAP model?

A

An Online Analytical Processing model: an aggregated form of data storage optimized for analytical workloads.

28
Q

ETL takes the data from where to where?

A

From operational data stores into a data lake, data warehouse, or lakehouse.
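
A toy extract, transform, load flow in plain Python. The order records are made-up sample data, the transform function is hypothetical, and an in-memory SQLite database stands in for the analytical store.

```python
import sqlite3

# Extract: rows as they might come out of an operational (OLTP) system.
operational_rows = [
    {"order_id": 1, "customer": " ana ", "amount": "19.99"},
    {"order_id": 2, "customer": "Ben", "amount": "5.00"},
]


def transform(row):
    """Clean up text and convert types before loading."""
    return (row["order_id"], row["customer"].strip().title(), float(row["amount"]))


# Load: write the cleaned rows into the analytical store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?)", map(transform, operational_rows)
)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```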

29
Q

What is a data warehouse?

A

A database optimized for analytics queries (read operations)

30
Q

What does CRUD stand for?

A

Create
Retrieve
Update
Delete
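
The four operations map directly onto SQL statements. A minimal sketch with Python's built-in sqlite3 module follows; the products table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create
conn.execute("INSERT INTO products (id, name, price) VALUES (1, 'widget', 2.5)")
# Retrieve
created = conn.execute("SELECT name, price FROM products WHERE id = 1").fetchone()
# Update
conn.execute("UPDATE products SET price = 3.0 WHERE id = 1")
updated = conn.execute("SELECT price FROM products WHERE id = 1").fetchone()[0]
# Delete
conn.execute("DELETE FROM products WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
conn.commit()
```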

31
Q

What is a lakehouse?

A

An analytical store that combines the flexible, scalable storage of a data lake with the relational querying semantics of a data warehouse.

32
Q

What kind of denormalization takes place when OLTP data is transferred to a lakehouse?

A

Related tables are flattened, so the resulting data contains values duplicated across rows. This reduces the number of joins that analytical queries need.
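
A small sketch of that duplication, assuming Python's built-in sqlite3 module. The customers and orders tables are hypothetical; joining them produces flat rows in which the customer's name and city repeat for every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'Lisbon');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0);
    """
)

# Flatten the normalized tables into one wide, denormalized result.
flat_rows = conn.execute(
    """
    SELECT o.id, c.name, c.city, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
    """
).fetchall()
```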

33
Q

What are the 3 main roles in Data?

A

Database Administrator
Data Engineer
Data Analyst

34
Q

What does a database administrator do?

A

They are responsible for the design, implementation, and maintenance of databases. They do things like update the databases and manage permissions, and they are responsible for the performance and reliability of the databases.

35
Q

What does a data engineer do?

A

They build data workloads across databases and file stores that take transactional data and make it available for analytics. They own the management and monitoring of data pipelines to ensure that data loads perform as expected.

36
Q

What does a data analyst do?

A

They explore and transform data into reports and visualizations that provide insights into business questions.

37
Q

What is Azure SQL?

A

A group of relational database solutions built on the SQL Server engine

38
Q

What is Azure?

A

Azure is Microsoft's collection of cloud-based IT services.

39
Q

What is Azure SQL Database?

A

A fully managed platform-as-a-service (PaaS) database. Of the Azure SQL options, it offers the least configuration flexibility.

40
Q

What is Azure SQL Managed Instance?

A

A hosted instance of SQL Server with automated maintenance, which allows more configuration flexibility than Azure SQL Database.

41
Q

What is Azure SQL VM?

A

A virtual machine with SQL Server installed. It allows the most configuration flexibility but also places the most responsibility on the DBA.

42
Q

What Azure products are offered for open-source relational databases?

A

Azure Database for MySQL
Azure Database for MariaDB
Azure Database for PostgreSQL

43
Q

What is Azure Cosmos DB?

A

A global-scale non-relational (NoSQL) database that supports storing JSON documents, key-value pairs, column-family tables, and graphs.

Sometimes DBAs manage it, but usually software engineers do.

Data engineers often need to extract data from it for a data lakehouse.

44
Q

What is Azure Storage?

A

A cloud service that allows you to store data in BLOB containers, file shares, and tables

45
Q

What would a data engineer do with Azure Storage?

A

They would likely use it as a data lake

46
Q

What is Azure Data Factory?

A

A service to define and schedule data pipelines to transfer and transform data. It can be integrated with other Azure products

47
Q

What would a data engineer do with Azure Data Factory?

A

They would use it to build ETL pipelines that take operational data and populate data warehouses for analytics solutions

48
Q

What is Azure Synapse Analytics?

A

A comprehensive, unified platform-as-a-service (PaaS) solution for data analytics.

49
Q

What does Synapse Analytics include?

A

Pipelines
SQL
Apache Spark
Synapse Analytics Data Explorer

50
Q

What are Synapse Analytics pipelines?

A

Data pipelines based on the same technology as Azure Data Factory.

51
Q

What is SQL in Synapse Analytics?

A

A highly scalable SQL database engine, optimized for data warehouse workloads (read queries).

52
Q

What is Apache Spark?

A

An open-source distributed data processing system that supports multiple programming languages and APIs, including Python, SQL, Java, and Scala.

53
Q

What is Synapse Analytics Data Explorer?

A

It uses the Kusto Query Language (KQL) to provide very fast analytics processing, optimized for real-time querying of telemetry and log data.

54
Q

What can data engineers use Azure Synapse Analytics for?

A

To build comprehensive data analytics solutions that combine ingestion pipelines, data lake storage, and data warehouse storage.

55
Q

What can data analysts use Azure Synapse Analytics for?

A

They can use SQL and Spark through interactive notebooks, and integrate with Azure Machine Learning and Power BI to create data models.

56
Q

What is Azure Databricks?

A

An Azure-integrated version of the popular Databricks platform, which combines Apache Spark with SQL database semantics for large-scale analytics.

57
Q

What can data analysts use Azure Databricks for?

A

They can use its native notebook support to query and visualize data in a browser-based interface.

58
Q

What can data engineers use Azure Databricks for?

A

They can use it to create analytical data stores.

59
Q

What is Azure HDInsight?

A

A service that provides Azure-hosted clusters for popular Apache open-source big data technologies.

60
Q

What is Apache Hadoop?

A

A distributed system that processes large volumes of data with MapReduce jobs, which can be written in Java or expressed through interfaces such as Apache Hive.
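
To show the shape of a MapReduce job, here is a single-process word count in plain Python. It only imitates the map, shuffle, and reduce phases; it does not use Hadoop, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big clusters", "data lakes"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a real cluster the map and reduce steps run in parallel on many nodes, each handling part of the data.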

61
Q

What is Apache HBase?

A

An open-source system for storing and querying NoSQL data at large scale.

62
Q

What is Apache Kafka?

A

A message broker for data stream processing.

63
Q

Data engineers can use Azure HDInsight for what?

A

They can use this to support big data processing jobs that use multiple Apache technologies

64
Q

What is Azure Stream Analytics?

A

It captures a stream of data, applies queries and transformations to it, and writes the results to an output for analytics or further processing.
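
One common kind of streaming query is an aggregate over fixed time windows. The sketch below imitates that idea in plain Python over a finite list of (timestamp, value) events; it is not Stream Analytics query syntax, and the function name and sample readings are hypothetical.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average the readings in each fixed, non-overlapping time window."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# (seconds since start, sensor reading)
events = [(0, 20.0), (4, 22.0), (11, 30.0), (19, 34.0), (21, 10.0)]
averages = tumbling_window_avg(events, 10)
```

Each event falls into exactly one 10-second window, and the result maps each window's start time to its average reading.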

65
Q

What can data engineers do with Azure Stream Analytics?

A

They can build streaming pipelines that ingest data into analytical data stores.

66
Q

What is Azure Data Explorer?

A

A standalone service that offers the same fast querying of log and telemetry data as Synapse Data Explorer.

67
Q

Data analysts can use Azure Data Explorer for what?

A

They can easily analyze timestamped log data

68
Q

What is Microsoft Purview?

A

Enterprise solution for governance and discoverability, helping people find the data they need

69
Q

What is Microsoft Fabric?

A

A SaaS lakehouse platform that includes:
ETL
Lakehouse analytics
Warehouse analytics
Data science and machine learning
Real-time analytics
Data visualization
Data governance and management

70
Q

Data engineers can use Microsoft Purview for what?

A

They can use it to enforce data governance and ensure the integrity of data.
