CIS275 - Chapter 7: Database Architecture Flashcards

1
Q

describes the components of a computer system and the relationships between components.

A

Architecture

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

MySQL components are organized in four layers:

_____ interact directly with database users and administrators, and send queries to the query processor.

A

Tools

The tools layer includes Connectors and APIs, Workbench, and utility programs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

MySQL components are organized in four layers:

The _____ manages connections from multiple users and compiles queries into low-level instructions for the storage engine.

A

query processor

The query processor layer has two main functions: manage connections and compile queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

MySQL components are organized in four layers:

The _____ executes instructions, manages indexes, and interacts with the file system. Some storage engines support database transactions, described elsewhere in this material.

A

storage engine

The storage engine layer has two main functions: transaction management and data access.

Transaction management includes the concurrency system, recovery system, and lock manager. These components ensure all transactions are atomic, consistent, isolated, and durable, as explained elsewhere in this material.

The data access component communicates with the file system and translates table, column, and index reads into block addresses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

MySQL components are organized in four layers:

The _____ contains system and user data, such as log files, tables, and indexes.

A

file system

The file system layer consists of data stored on storage media and organized in files. The file system contains three types of data for each database: user data, log files, and a data dictionary.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

_____ include approximately 30 tools, grouped in five categories: installation, client, administrative, developer, and miscellaneous tools.

A

Utility programs

Most utility programs are intended for database administrators or programmers. Example functions include:

Upgrade existing databases to a new MySQL release

Backup databases

Import data to databases

Inspect log files

Administer database servers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

a link between tools and the query processor.

A

connection

Each connection specifies a database name, server address, logon name, and password. The connection manager creates connections and manages communications between tools and the query parser.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

a detailed, low-level sequence of steps that specify exactly how to process a query.

A

execution plan

  1. The query selects employee name and department name for employees that work in Illinois.
  2. Step 1: The plan retrieves Illinois employees using an index on State.
  3. Step 2: The plan sorts selected employees by DeptCode.
  4. Step 3: The plan retrieves matching departments using a table scan.
  5. Step 4: The plan sorts the selected departments by Code.
  6. Step 5: The plan merges the two result tables using the join technique called ‘sort-merge’.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

The query processor generates an execution plan in two steps:

The _____ checks each query for syntax errors and converts valid queries to an internal representation.

A

query parser

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

The query processor generates an execution plan in two steps:

The _____ reads the internal representation, generates alternative execution plans, estimates execution times, and selects the fastest plan.

A

query optimizer

Estimates are based on heuristics and statistics about data, like the number of rows in each table and the number of values in each column. These statistics are maintained in the data dictionary, described below.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

For optimal performance, the query processor layer has a _____ that stores reusable information in main memory.

A

cache manager

Ex: The cache manager retains execution plans for queries that are submitted multiple times. If data used in repeated queries does not change, the cache manager may also save query results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

To reduce data access time, the _____ retains data blocks from the file system for possible reuse.

A

buffer manager

The data blocks are retained in an area of main memory called the buffer. Ex: If queries frequently access department data, the buffer manager may retain some or all blocks of the Department table. The buffer manager is similar to the cache manager of the query processor layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

The InnoDB buffer manager uses a ____ or _____ algorithm.

A

least recently used or LRU algorithm

The LRU algorithm tracks the time each block was last used and, when space is needed, discards ‘stale’ blocks. If data in a block has been updated, discarded blocks are first saved on disk.

  1. Data in the file system is organized into blocks.
  2. The database reads blocks from the file system and stores blocks in the buffer.
  3. As the database reads new blocks, prior blocks become ‘stale’.
  4. Data block A becomes ‘fresh’ when updated.
  5. Eventually, the buffer fills up.
  6. To make space for block E, stale block B is deleted from the buffer.
  7. Block A has been updated and must be saved to the file system before deletion from the buffer.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

A _____ is a directory of tables, columns, keys, indexes, and other objects in a relational database.

A

catalog, also known as a data dictionary

All relational databases contain a catalog. Query processors and storage managers use catalog information when queries are processed and executed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Multiple computers linked by a network are often grouped in layers, called ___, and arranged in a hierarchy.

A

tiers

The term tier refers to either a software or hardware layer. In this material, tier refers to a hardware layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

a multi-tier architecture consisting of web browsers and web servers communicating over the internet:

A

Web architecture

Web browsers, on the top tier, manage user interaction.

Web servers, on a middle tier, generate web pages for display on web browsers and transmit user requests to services running on lower tiers.

Application servers run application software, process user requests, and communicate with databases and other services.

Services, such as database and authentication, comprise the bottom tier.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Prior to 1990, most software ran in a _____, consisting of a personal or corporate computer connected directly to monitors.

A

single-tier architecture

Although computers often communicated with each other, the dependencies between applications running on different computers were limited.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Since 1990, complex corporate and government applications have increasingly been implemented in a _____:

A

multi-tier architecture

The top tier consists of computers interacting directly with end-users.

The bottom tier consists of servers managing resources like databases and email.

One or more middle tiers execute a variety of functions, such as user authorization, business logic, and communication with other computers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

With _____, a vendor such as Amazon, Microsoft, or Google implements computer services on lower tiers of a web architecture.

A

cloud services

For a fee, cloud services are made available over the internet to customers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

_____, or _____, provides computer processing, memory, and storage media, as if the customer were renting a computer.

A

Infrastructure-as-a-service, or IaaS

Ex: Elastic Compute Cloud, or EC2, from Amazon Web Services offers infrastructure-as-a-service.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

_____, or _____, provides tools and services, such as databases, application development tools, and messaging services.

A

Platform-as-a-service, or PaaS

Ex: Azure is Microsoft’s cloud services environment, offering the SQL Database service.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

_____, or _____, provides complete applications, usually through web browsers on customer machines.

A

Software-as-a-service, or SaaS

Ex: Salesforce offers sales management software, and Google offers document processing applications like Docs, Sheets, and Pages.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

a software layer that emulates a complete, independent computing environment.

A

virtual machine, or VM

Usually cloud services are offered on virtual machines.

Multiple virtual machines can run on one computer , enabling cloud providers to support many customers on the same machine.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

a database offered as a PaaS cloud service.

A

cloud database

Most databases are now available either on-premise or as a cloud service, but cloud database use is growing rapidly.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Prior to 2000, most commercial software was _____, or installed and run on customer computers.

A

on-premise

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q
A

Expected:

(a) 1, 2, 3, 4, 5, On-premise
(b) 1, SaaS
(c) 1, 2, 5, laaS
(d) 1, 2, PaaS
(a) All layers are installed, administered, and run on customer computers, so on-premise.
(b) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(c) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(d) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q
A

Expected:

(a) 3, SaaS
(b) 1, 3, 5, laaS
(c) 1, 2, 3, 4, 5, On-premise
(d) 3, 5, PaaS
(a) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(b) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(c) All layers are installed, administered, and run on customer computers, so on-premise.
(d) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q
A

Expected:

(a) 2, 4, laaS
(b) 2, 4, 5, PaaS
(c) None, On-premise
(d) 1, 2, 4, 5, SaaS
(a) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(b) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.
(c) All layers are installed, administered, and run on customer computers, so on-premise.
(d) The service offers an application to customers. Customers access the application via a web browser, so SaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q
A

Expected:

(a) 1, 3, laaS
(b) None, On-premise
(c) 1, 2, 3, 5, SaaS
(d) 1, 3, 5, PaaS
(a) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(b) All layers are installed, administered, and run on customer computers, so on-premise.
(c) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(d) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q
A

Expected:

(a) 4, 5, laaS
(b) 2, 4, 5, PaaS
(c) 2, 3, 4, 5, SaaS
(d) None, On-premise
(a) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(b) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.
(c) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(d) All layers are installed, administered, and run on customer computers, so on-premise.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q
A

Expected:

(a) 1, 2, 5, laaS
(b) 1, 2, PaaS
(c) 1, 2, 3, 4, 5, On-premise
(d) 1, SaaS
(a) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(b) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.
(c) All layers are installed, administered, and run on customer computers, so on-premise.
(d) The service offers an application to customers. Customers access the application via a web browser, so SaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q
A

Expected:

(a) 1, 2, 3, 4, 5, On-premise
(b) 1, 2, 4, laaS
(c) 1, SaaS
(d) 1, 2, PaaS
(a) All layers are installed, administered, and run on customer computers, so on-premise.
(b) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(c) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(d) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q
A

Expected:

(a) 3, 4, laaS
(b) None, On-premise
(c) 2, 3, 4, 5, SaaS
(d) 2, 3, 4, PaaS
(a) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(b) All layers are installed, administered, and run on customer computers, so on-premise.
(c) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(d) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q
A

Expected:

(a) 1, 2, 4, 5, SaaS
(b) None, On-premise
(c) 1, 4, 5, PaaS
(d) 1, 4, laaS
(a) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(b) All layers are installed, administered, and run on customer computers, so on-premise.
(c) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.
(d) The service offers virtual machines and all other layers are managed by the customers, so IaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q
A

Expected:

(a) None, On-premise
(b) 2, 5, laaS
(c) 1, 2, 5, PaaS
(d) 1, 2, 3, 5, SaaS
(a) All layers are installed, administered, and run on customer computers, so on-premise.
(b) The service offers virtual machines and all other layers are managed by the customers, so IaaS.
(c) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.
(d) The service offers an application to customers. Customers access the application via a web browser, so SaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q
A

Expected:

(a) 2, 5, PaaS
(b) 1, 2, 3, 4, 5, On-premise
(c) 2, SaaS
(d) 2, 3, 5, laaS
(a) The service offers computer services and tools running within virtual machines. Any application run on the service is managed by the customers and accessed via a web browser, so PaaS.
(b) All layers are installed, administered, and run on customer computers, so on-premise.
(c) The service offers an application to customers. Customers access the application via a web browser, so SaaS.
(d) The service offers virtual machines and all other layers are managed by the customers, so IaaS.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

A _____ consists of multiple processors managed by a single operating system instance.

A

parallel computer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

In a _____, processors share the same memory and storage media.

A

shared memory computer

Shared memory is optimal for parallel processing against a common data set in a single memory space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

In a _____, processors share storage media only. Each processor has private memory.

A

shared storage computer

shared storage and shared nothing scale to more processors, since processors do not contend for the same memory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

In a _____, processors share neither memory nor storage media.

A

shared nothing computer

shared storage and shared nothing scale to more processors, since processors do not contend for the same memory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

A _____ consists of cables extending over a small area, typically within one facility.

A

local area network

Local area networks usually use the Ethernet communication protocol.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

A _____ spans multiple facilities in different geographic locations, separated by many miles.

A

wide area network

Wide area networks may communicate via cables, satellite, or telephone lines, often using internet communication protocols.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

one of a group of computers connected by either a local or wide area network.

A

node

A node is a single processor with associated memory and storage.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

a group of nodes connected by a local area network, managed by separate operating system instances, and coordinated by specialized _____ management software.

A

cluster

A cluster is similar to a parallel computer. Both can execute program instructions in parallel on multiple processors. Both can share storage or share nothing. Computers in a cluster cannot share memory, however, since local area networks are too slow to support memory access.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
55
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
56
Q

A _____ runs on a parallel computer or cluster.

A

parallel database

57
Q

A _____ runs on multiple computers connected by a wide area network.

A

distributed database

  1. A node is a single processor with associated memory and storage.
  2. A distributed database consists of nodes connected by a wide area network.
  3. From the perspective of database users and programmers, individual nodes of a distributed database are not visible.
58
Q
A
59
Q
A
60
Q

A _____ updates data on multiple nodes of a distributed database.

A

distributed transaction

In a distributed transaction, either all nodes or no nodes must be successfully updated.

Updates in a distributed transaction are synchronous, since the updates occur at the same time from the perspective of the database user.

61
Q

Databases commonly implement distributed transactions with a technique called _____.

A

two-phase commit

The two-phase commit has four steps:

In phase 1, a central transaction coordinator notifies all participating nodes of the required updates.

Participating nodes receive the notification, store the update in a local log, and send a confirmation message to the transaction coordinator. Participating nodes do not yet commit the update to the database.

Phase 2 begins when the transaction coordinator receives confirmation from all participating nodes. The transaction coordinator now instructs all nodes to commit.

Participating nodes receive the commit message, commit the update to the database, and notify the transaction coordinator of success.

The two-phase commit must account for the following failure scenarios:

In step 2, if the transaction coordinator does not receive confirmation from all nodes within a fixed time period, the transaction coordinator instructs participating nodes to roll back the update.

In step 4, if a node becomes unavailable and fails to notify the transaction coordinator of success, the transaction coordinator resends the commit message until the node responds.

The two-phase commit ensures updates are applied to either all nodes or no nodes. In the first failure scenario, the transaction rolls back, and no updates are applied. In the second failure scenario, the transaction commits, and all updates are applied.

62
Q

A _____ updates data on a single node of a distributed database.

A

local transaction

Updates in separate local transactions are asynchronous.

Databases that use local rather than distributed transactions are called eventually consistent.

63
Q
A
64
Q
A
65
Q

A _____ database conforms to all rules at all times.

A

consistent database

In a distributed database, a rule may govern data on multiple nodes. Ex: Foreign key values on one node must match primary key values on another node. Ex: Copies of data on multiple nodes must be identical.

66
Q

In an _____ database, ‘live’ nodes must respond to queries at all times.

A

available database

A ‘dead’ node may be unresponsive, but ‘live’ nodes must respond regardless of the state of other nodes.

67
Q

A _____ forms when a network error prevents nodes from communicating.

A

network partition

A distributed database occasionally experiences network partitions since nodes are connected by wide area networks that occasionally fail.

68
Q

A _____ database continues to function when a network partition occurs.

A

partition-tolerant database

69
Q

The _____ states that a distributed database cannot simultaneously be Consistent, Available, and Partition-tolerant.

A

CAP theorem

A distributed database can guarantee any two, but not all three, of these properties.

  1. A distributed database has four nodes. Each node contains a copy of the Employee table.
  2. The wide area network fails, creating a partition. Nodes A and B cannot communicate with nodes C and D.
  3. Nodes A and B are updated with local transactions. The partition prevents updates to nodes C and D.
  4. Node D receives an Employee query.

If node responds, the result is inconsistent. If node does not respond, the database is unavailable.

70
Q
A
71
Q
A

Expected:

(a) Shared storage
(b) Shared memory
(c) Shared nothing
(a) describes a shared storage configuration because processors share storage media but not memory.
(b) describes a shared memory configuration because processors share the same memory and storage media.
(c) describes a shared nothing configuration because processors share neither memory nor storage media.

72
Q
A

Expected:

(a) Shared storage
(b) Shared nothing
(c) Shared memory
(a) describes a shared storage configuration because processors share storage media but not memory.
(b) describes a shared nothing configuration because processors share neither memory nor storage media.
(c) describes a shared memory configuration because processors share the same memory and storage media.

73
Q
A

Expected:

(a) Distributed database
(b) Distributed database In both scenarios
(a) and
(b) , data is stored in multiple computers connected by a wide area network, so a distributed database is required.

74
Q
A

Expected:

(a) Node C sends commit messages
(b) Update is made successfully In scenario
(a) , phase 1 completes successfully, so node C sends commit messages. In scenario
(b) , the transaction was committed successfully, so update is made.

75
Q
A

Expected:

(a) Node A sends commit messages
(b) Node A sends commit messages In scenario
(a) , node C fails in phase 2, so node A continues to send commit messages until node C is available and responds. In scenario
(b) , phase 1 completes successfully, so node A sends commit messages.

76
Q
A

Expected:

(a) Node B sends commit messages
(b) Node B sends commit messages In scenario
(a) , phase 1 completes successfully, so node B sends commit messages. In scenario
(b) , node A fails in phase 2, so node B continues to send commit messages until node A is available and responds.

77
Q
A

Expected:

(a) Node D cancels the transaction
(b) Node D sends commit messages In scenario
(a) , node B fails during phase 1, so node D cancels the transaction, and no participating nodes commit the result. In scenario
(b) , phase 1 completes successfully, so node D sends commit messages.

78
Q
A

Expected:

(a) Node D sends commit messages
(b) Node D sends commit messages In scenario
(a) , phase 1 completes successfully, so node D sends commit messages. In scenario
(b) , node A fails in phase 2, so node D continues to send commit messages until node A is available and responds.

79
Q
A

Expected:

(a) Partition-tolerant
(b) Available In scenario
(a) , the database continues to function when a network partition occurs, so is partition-tolerant. In scenario
(b) , the “live” nodes of the database respond regardless of the state of other nodes, so is available.

80
Q

a copy of an entire database, a table, or a subset of table data.

A

replica

81
Q

A _____ maintains two or more replicas on separate storage devices.

A

replicated database

Replicated databases have several major advantages:

High availability. If one storage device fails, the database routes queries to a replica on another storage device. In general, if a database maintains N replicas, the database can survive simultaneous failure of N-1 storage devices.

Fast concurrent reads. Concurrent queries can read separate replicas without interfering with each other. One large query can be decomposed into smaller queries that read separate replicas in parallel.

Local reads. In a distributed database, reads can be executed locally, eliminating network delays and outages.

  1. The Employee table is stored on node A of a distributed database and not replicated.
  2. An Employee query at node C generates a round-trip over the wide area network. The query is relatively slow.
  3. If a network partition occurs, the query cannot be processed.
  4. If the Employee table is replicated, the query is local, relatively fast, and immune to a network partition.
82
Q
A
83
Q

Some storage devices, called _____ , manage replicas internally, without database intervention.

A

storage arrays

84
Q

The _____ or _____ replica update technique designates one node as primary.

A

primary/secondary or master/slave technique

All updates are first applied to the primary node in local transactions. Secondary nodes are updated after the primary node commits, with independent local transactions. If the primary node fails, the database automatically designates a new primary node to ensure continued availability.

  1. The Employee table is replicated on three nodes.
  2. Node A is primary.
  3. Updates are applied to the primary node only, so the database is temporarily inconsistent.
  4. After update commits on primary node, secondary nodes are updated in local transactions.
85
Q

The _____ technique applies updates to any node in a group.

A

group replication technique

Prior to committing, a node broadcasts transaction information to other nodes, which look for conflicts with concurrent transactions. If any node detects a conflict, an algorithm determines which transaction commits and which rolls back. This algorithm may be simple, such as the transaction that commits first wins, or complex. If a network partition occurs and nodes cannot communicate, processing is temporarily suspended.

  1. The Employee table is replicated on three nodes.
  2. Any node in the group can be updated.
  3. Prior to commit, the updated node sends transaction information to all other nodes.
  4. All nodes detect a conflict. Since Zoe update committed prior to Mali update, Zoe wins.
86
Q
A
87
Q

a directory of information describing database objects such as tables, columns, keys, and indexes.

A

catalog

Catalog information is necessary to process queries and access data. Each node in a distributed database can process queries and therefore requires access to the catalog.

88
Q

In a _____, the entire catalog resides on a single node.

A

central catalog

Storing the catalog on a single node is relatively easy to manage. However, query processing at remote nodes must access the catalog via a wide area network, which may be slow or unreliable. Furthermore, query processing at all nodes interact with the central catalog, which may become a bottleneck.

89
Q

In a _____, a copy of the catalog resides on each node.

A

replicated catalog

Most queries are fast and reliable since all catalog data is available locally. However, statements that update the catalog, such as CREATE, ALTER, and DROP, must update all replicas. Updating replicas generates increased network traffic and, if executed in a distributed transaction, fails when any replica is unavailable.

  1. A replicated catalog is stored in each node of a distributed database.
  2. The catalog contains a table called ‘Tables’, with one row describing each table in the database.
  3. The administrator creates a table on node A. The syntax for assigning tables to nodes varies and is not shown.
  4. A transaction on node A creates the table and inserts a row into Tables.
  5. After node A transaction commits, separate transactions update node B and C catalogs.
90
Q
A
91
Q

Organizations use _____ to conduct daily business functions.

A

operational data

Ex: Sales invoices, student test scores, and driving violation records are operational data.

Operational data changes in real time as business functions are executed.

Most operational data is detailed, reflecting individual transactions.

Most operational databases are designed for a specific business function. Consequently, operational databases supporting different business functions are often incompatible.

Many operational databases are concerned primarily with current data.

Ex: Operational data may include active employees only. Analytic data may include past employees and illustrate changes in total employment by month.

92
Q

Organizations use _____ to understand, manage, and plan the business.

A

analytic data

Ex: Sales totals by region, average student grades over time, and driving violation counts by ZIP code are analytic data.

Analytic data is sometimes called reporting data or decision support data.

Analytic data is updated at fixed intervals, often daily or weekly, so that reports and summaries always refer to a known time.

Analytic data is often summarized by time period, business unit, geography, and other business dimensions.

Analytic databases combine data from many business functions in an integrated, enterprise-wide view of data, with standard formats, data types, and keys across all tables.

Analytic databases often track trends over time and therefore usually contain current and historic data. Ex: Operational data may include active employees only. Analytic data may include past employees and illustrate changes in total employment by month.

Because of the above differences, operational and analytic data are often maintained in separate databases with different designs.

93
Q
A
94
Q

a separate database optimized for analytics rather than operations.

A

data warehouse

A data warehouse consists of data extracted from operational databases and restructured to support analytic queries. Data is usually extracted periodically, at a fixed time, so that data in the warehouse has a known reference time. Data is extracted during times of low database use to minimize impact on operational queries.

95
Q

a data warehouse designed for a specific business area, such as sales, human resources, or product development.

A

data mart

Since data marts have smaller scope than a data warehouse, data marts are easier to build and maintain. A data mart can be derived directly from operational databases or indirectly from a data warehouse.

96
Q
  1. Different business functions use separate operational databases with incompatible keys and data formats.
  2. Periodically, data is extracted from operational databases, restructured, and loaded into a data warehouse.
  3. Data in the warehouse is integrated, with compatible keys and a standard format.
A
  1. Data marts support specific areas of data and can be derived from the data warehouse.
  2. Alternatively, data marts can be derived directly from operational databases.
97
Q
A
98
Q

This five-step process is commonly referred to as the _____, or ____, process.

  1. Extract data from operational databases into a temporary database, called a ‘staging area’. Since the data warehouse already contains data from the prior period, only data that has changed since the prior period is extracted.
  2. Cleanse data to eliminate errors, unusual spellings and incorrect data. Ex: Apply standard abbreviations to addresses, such as RD for road and AVE for avenue. Ex: Abbreviate middle name to the first initial.
  3. Integrate data into a uniform structure. Ex: Convert all length data to the metric system. Ex: Replace incompatible primary and foreign keys with consistent values.
  4. Restructure data into a design optimized for analytic queries.
  5. Load data to the data warehouse.
A

extract-transform-load, or ETL

Since the ETL process is time-consuming and difficult to automate, many organizations use special software products, called ETL tools, to minimize programming.

99
Q
  1. Data is extracted from an operational database containing mailing addresses consisting of five lines of text. LINE5 contains an error - the state should be Texas.
  2. Data is unpacked into separate fields.
  3. When cleansing data, standard abbreviations replace source data.
A
  1. Postal code 75001 and city Dallas are in Texas, not Tennessee.
  2. Cleansed data is loaded to data warehouse tables. If necessary, new primary and foreign keys values are generated.
100
Q
A
101
Q
A

Expected:
(a) Detailed
(b) Enterprise-wide
Analytic

(a) Data records individual transactions, so is detailed.
(b) Database integrates data from several operational databases, so is enterprise-wide. Data is detailed and enterprise-wide, so is analytic.

102
Q
A

Expected:
(a) Detailed
(b) Volatile
Operational

(a) Data records individual transactions, so is detailed.
(b) Data is updated real-time and not at fixed intervals, so is volatile. Data is detailed and volatile, so is operational.

103
Q
A
104
Q
A

Expected:
(a) Volatile
(b) Detailed
Operational

(a) Data is updated real-time and not at fixed intervals, so is volatile.
(b) Data records individual transactions, so is detailed. Data is volatile and detailed, so is operational.

105
Q
A

Expected:
(a) Summary
(b) Historical
Analytic

(a) Data summarizes many database rows in a few data points, so is summary.
(b) Data captures both current and prior information, so is historical. Data is summary and historical, so is analytic.

106
Q
A

Expected:
(a) Summary
(b) Enterprise-wide
Analytic

(a) Data summarizes many database rows in a few data points, so is summary.
(b) Database integrates data from several operational databases, so is enterprise-wide. Data is summary and enterprise-wide, so is analytic.

107
Q
A

Expected:

(a) Database design
(b) Interference
(c) Reference time

(a) The report runs slowly at all times, demonstrating query performance problems regardless of the database load. Thus, the problem is caused by the database design.
(b) Data is generated at different time zones, so heavy loads on the call database may interfere with the report. Thus, the problem is caused by interference.
(c) The operational database changed between the two times, causing the same query to generate different results. Thus, the problem is caused by reference time.

108
Q
A

Expected:

(a) Database design
(b) Interference
(c) Reference time

(a) The report runs slowly at all times, demonstrating query performance problems regardless of the database load. Thus, the problem is caused by the database design.
(b) Data is generated at different time zones, so heavy loads on the call database may interfere with the report. Thus, the problem is caused by interference.
(c) The operational database changed between the two times, causing the same query to generate different results. Thus, the problem is caused by reference time.

109
Q
A

Expected:

(a) Database design
(b) Reference time
(c) Interference

(a) The report runs slowly at all times, demonstrating query performance problems regardless of the database load. Thus, the problem is caused by the database design.
(b) The operational database changed between the two times, causing the same query to generate different results. Thus, the problem is caused by reference time.
(c) Data is generated at different time zones, so heavy loads on the call database may interfere with the report. Thus, the problem is caused by interference.

110
Q
A

Expected:

(a) Reference time
(b) Interference
(c) Database design

(a) The operational database changed between the two times, causing the same query to generate different results. Thus, the problem is caused by reference time.
(b) Data is generated at different time zones, so heavy loads on the call database may interfere with the report. Thus, the problem is caused by interference.
(c) The report runs slowly at all times, demonstrating query performance problems regardless of the database load. Thus, the problem is caused by the database design.

111
Q
A

Expected:

(a) Integrate
(b) Extract
(c) Cleanse
(d) Load
(e) Restructure

(a) Data has incompatible or missing keys and creating uniform primary and foreign keys is needed, so step is “Integrate”.
(b) Data is read from an operational system and written to a temporary database, so step is “Extract”.
(c) Errors in operational data are corrected and converted to a standard format, so step is “Cleanse”.
(d) Data warehouse is loaded with cleansed, integrated, and restructured data from the temporary database, so step is “Load”.
(e) Data is converted from a design optimized for operations to a design optimized for analytics, so step is “Restructure”.

112
Q

A _____, also called a _____, consists of fact and dimension tables:

A

dimensional design, also called a star schema

113
Q

A _____ contains numeric data used to measure business performance, such as sales revenue or number of employees.

A

fact table

Each row in a fact table consists of numeric fact columns and foreign keys that reference dimension tables.

The primary key of a fact table is the composite of all foreign keys referencing dimension tables.

114
Q

A _____ contains textual data that describes the fact data, such as product line, organizational unit, and geographical region.

A

dimension table

The primary key of a dimension table is a small, meaningless integer. This reduces the size of fact tables, which often contain millions of rows and many foreign keys referencing dimension tables. Since meaningless primary keys never change, the corresponding foreign keys also never change, and the fact table is easy to maintain.

115
Q
  1. The Employee dimension table includes text that describes employees. SocialNumber contains numeric data but is not quantitative.
  2. Additional dimension tables describe corporate jobs, office locations, and dates.
  3. Primary keys of dimension tables are meaningless integers.
  4. The EmployeeFact table contains numeric data about employees.
A
  1. Foreign keys in EmployeeFact reference dimension tables.
  2. A dimension foreign key can appear multiple times in a fact table, with different meanings.
  3. The EmployeeFact primary key is the composite of all foreign keys.
116
Q
A
117
Q

a sequence of columns in which each column has a one-many relationship to the next column.

A

dimension hierarchy

A dimension table usually contains one or more column hierarchies. Ex: The Location table contains CountryCode, State, and PostalCode columns. Each country contains many states, and each state contains many postal codes, so these columns form a hierarchy.

118
Q
A
119
Q
  1. The Date dimension has one row for each day between January 1, 1950 and December 31, 2200.
  2. EmployeeFact contains historical data. StartDateID and EndDateID indicate effective dates of each row.
A
  1. Between July 28, 2009 and November 14, 2012, employee 234 had a salary of $40,000.
  2. On November 14, 2012, the employee received a raise to $59,000.
  3. The salary of 59,000 is current, so EndDateID refers to a distant future date.
120
Q
A
121
Q
A

Expected:
(A) EducationYears
(B) EmployeeFact
(C) Job.JobID
(D) LevelCode
(E) GROUP
(F) State

MAX(EducationYears) computes highest number of years of education.

EmployeeFact table is read, so must be in FROM clause.

EmployeeFact.JobID = Job.JobID connects the EmployeeFact and Job tables.

LevelCode = 3 computes only employees with level code 3.

GROUP BY State computes per state.

122
Q
A

Expected:
(A) BonusAmount
(B) EmployeeFact
(C) Location.LocationID
(D) JobCode
(E) GROUP
(F) State

MAX(BonusAmount) computes highest bonus amount.

EmployeeFact table is read, so must be in FROM clause.

EmployeeFact.LocationID = Location.LocationID connects the EmployeeFact and Location tables.

JobCode = 4 computes only employees with job code 4.

GROUP BY State computes per state.

123
Q
A

Expected:
(A) Date.Year
(B) EducationYears
(C) Date.Year
(D) Year, JobCode

Date.Year in the SELECT clause reports the year.

MAX(EducationYears) computes the highest number of years of education.

Date.Year BETWEEN StartDate.Year AND EndDate.Year specifies the year range.

GROUP BY Year, JobCode makes the query compute the highest number of years of education by year and job code.

124
Q
A

Expected:
(A) Date.Year
(B) PerformanceRating
(C) StartDate.Year
(D) Year, LevelCode

Date.Year in the SELECT clause reports the year.

MAX(PerformanceRating) computes the highest performance rating.

Date.Year BETWEEN StartDate.Year AND EndDate.Year specifies the year range.

GROUP BY Year, LevelCode makes the query compute the highest performance rating by year and level code.

125
Q
A

Expected:
(A) LevelCode
(B) PerformanceRating
(C) Date.Year
(D) Year, LevelCode

LevelCode in the SELECT clause reports the level code.

MAX(PerformanceRating) computes the highest performance rating.

Date.Year BETWEEN StartDate.Year AND EndDate.Year specifies the year range.

GROUP BY Year, LevelCode makes the query compute the highest performance rating by year and level code.

126
Q
A

Expected:
(A) Date.Year
(B) SalaryAmount
(C) StartDate.Year
(D) Year, LevelCode

Date.Year in the SELECT clause reports the year.

MAX(SalaryAmount) computes the highest salary amount.

Date.Year BETWEEN StartDate.Year AND EndDate.Year specifies the year range.

GROUP BY Year, LevelCode makes the query compute the highest salary amount by year and level code.

127
Q

a database that stores data in main memory, instead of or in addition to storage media.

A

in-memory database

Main memory is much faster than storage media, such as flash memory and disk drives.

  1. Oracle database organizes tables on storage media by row.
  2. Oracle Database In-Memory creates a copy of a table in memory.
  3. In-memory data is organized by column to optimize for analytic queries.
  4. Copies of tables are synchronized.
128
Q
A
129
Q

a database that is packaged with a programming language.

A

embedded database sometimes called an in-process database

An embedded database and application program execute together in a single software process. Embedded databases are used in single-user applications that require no database administration, such as applications designed for mobile devices.

An embedded database is not the same as embedded SQL. Embedded database is a database architecture, while embedded SQL, described elsewhere in this material, is a database programming technique.

  1. Students take and edit notes on laptop computers.
  2. A Java application stores notes in SQLite on the laptop.
  3. Students can upload notes to a server and share with others.
  4. Thousands of students may access the server concurrently. Oracle Database scales better than SQLite and is installed on the server.
  5. The application has a two-tier architecture.
130
Q
A
131
Q

a collection of two or more participating databases underneath a coordinating software layer.

A

federated database

The participating databases are autonomous and heterogeneous:

132
Q

An _____ operates independently of other participating databases.

A

autonomous database

An autonomous database is administered and can be queried as if the database were not part of a federated database.

133
Q

_____ either run under different database systems or have incompatible schema.

A

Heterogeneous databases

Databases with incompatible schema might have inconsistent primary and foreign keys keys, similar tables with different designs, or similar columns with different names and data types

134
Q

The coordinating software layer (of federated databases) is called _____, since the software lies between application programs and database software.

A

middleware

Although product capabilities vary, most middleware products have the following components:

A global catalog is a directory of participating database objects, such as tables, columns, and indexes.

A global query processor decomposes a federated query into queries for each participating database.

A database wrapper converts the decomposed queries to the appropriate syntax for each participating database.

135
Q
A
136
Q

an analytic database of raw, unprocessed data copied from multiple data sources.

A

data lake

Data lakes share some characteristics of data warehouses and some characteristics of federated databases:

Like a data warehouse, a data lake is a separate database designed for analytic queries and consisting of data extracted from multiple source systems.

Like a federated database, data in a data lake is not cleansed, integrated, or restructured. Data is stored in the original format and structure. Depending on the data source, data may be loaded continuously rather than at fixed intervals.

137
Q
A
138
Q
A

Expected:

(a) Embedded database
(b) Data warehouse
(c) Federated database
(d) Data lake

(a) The configuration and preference data is local to the user’s computer and temporary, so can easily be handled with an embedded database.
(b) Material costs and hours of labor are usually highly structured and suitable for fact tables, so a data warehouse is a simple solution.
(c) The organization wants to make three databases appear as one. A federated database is a simple, low-cost solution.
(d) Social media data is usually collected rapidly and at high volume. Phrases on protest signs are usually unstructured. Often, a data lake supported by data scientists is appropriate for analysis of high-volume, unstructured data.