CIS275 - Chapter 5: Data Storage Flashcards

1
Q

the time required to access the first byte in a read or write operation.

A

Access time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

the speed at which data is read or written, following initial access.

A

Transfer rate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

memory that is lost when disconnected from power.

A

Volatile memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

memory that is retained without power.

A

Non-volatile memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

the primary memory used when computer programs execute.

A

Main memory,

also called random-access memory (RAM),

Main memory is fast, expensive, and has limited capacity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

_____ is less expensive and higher capacity than main memory.

A

Flash memory,

also called solid-state drive (SSD)

Writes to flash memory are much slower than reads, and both are much slower than main memory writes and reads.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Memory used to store large amounts of data.

A

Magnetic disk,

also called hard-disk drive (HDD)

Magnetic disk is slower, less expensive, and higher capacity than flash memory.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Magnetic disk groups data in _____.

A

sectors

traditionally 512 bytes per sector but 4 kilobytes with newer disk formats.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Flash memory groups data in _____

A

pages

usually between 2 kilobytes and 16 kilobytes per page.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Databases and file systems use a uniform size, called a _____, when transferring data between main memory and storage media.

A

block

Block size is independent of storage media.

Database systems typically support block sizes ranging from 2 kilobytes to 64 kilobytes. Smaller block sizes are usually better for transactional applications, which access a few rows per query. Larger block sizes are usually better for analytic applications, which access many rows per query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

To minimize block transfers, relational databases usually store an entire row within one block, which is called _____.

A

row-oriented storage

Row-oriented storage performs best when row size is small relative to block size, for two reasons:

Improved query performance. When row size is small relative to block size, each block contains many rows. Queries that read and write multiple rows transfer fewer blocks, resulting in better performance.

Less wasted storage. Row-oriented storage wastes a few bytes per block, since rows do not usually fit evenly into the available space. The wasted space is less than the row size. If row size is small relative to block size, this wasted space is insignificant.

  1. The row size is small relative to the block size, so many rows are transferred per block.
  2. Only 16 bytes go unused, so wasted space is insignificant.
  3. The row size is large relative to block size, so fewer rows are transferred per block.
  4. One kilobyte goes unused, so wasted space is significant.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q
A
  1. The large column allows only two rows to be transferred per block and wastes significant space.
  2. Large columns are replaced by a link and are stored in a separate area.
  3. More rows fit per block, and less space is wasted. Queries that do not access the large column are faster.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

In column-oriented storage, also called columnar storage, each block stores values for a single column only.

A

column-oriented storage,

also called columnar storage

Column-oriented storage benefits analytic applications in several ways:

Faster data access. More column values are transferred per block, reducing time to access storage media.

Better data compression. Databases often apply data compression algorithms when storing data. Data compression is usually more effective when all values have the same data type. As a result, more values are stored per block, which reduces storage and access time.

  1. With column-oriented storage, a 4 kilobyte block contains roughly 500 8-byte column values.
  2. Computing the average of all incomes reads fewer blocks than row-oriented storage.
  3. Different columns occupy separate blocks.
  4. Selecting multiple columns reads multiple blocks and is slower than row-oriented storage.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

a scheme for organizing rows in blocks on storage media.

A

table structure

Databases commonly support four alternative table structures:

Heap table

Sorted table

Hash table

Table cluster

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

In a _____, no order is imposed on rows.

A

heap table

  1. In a heap table, the database maintains a pointer to free space, which indicates the location of the next insert.
  2. When a row is inserted, the pointer moves to the next available space.
  3. When a row is deleted, the pointer moves to the deleted space. The deleted space is linked to free space at the end of the table.
  4. As more rows are deleted, free space is linked together in a list.
  5. Inserts go to the first available space in the list.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

In a _____, the database designer identifies a _____ that determines physical row order.

A

sorted table

sort column

  1. In a sorted table, rows are sorted on a sort column.
  2. Inserting a new row requires moving all subsequent rows, which is inefficient.
  3. To avoid inefficient inserts, the sort column order is maintained with links, producing a linked list.
  4. The sort order is maintained during an insert by changing two links rather than moving rows.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

In a _____, rows are assigned to buckets.

A

hash table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

A _____ is a block or group of blocks containing rows.

A

bucket

Initially, each bucket has one block. As a table grows, some buckets eventually fill up with rows, and the database allocates additional blocks. New blocks are linked to the initial block, and the bucket becomes a chain of linked blocks.

The bucket containing each row is determined by a hash function and a hash key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

The _____ is a column or group of columns, usually the primary key.

A

hash key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

The _____ computes the bucket containing the row from the hash key.

A

hash function

Hash functions are designed to scramble row locations and evenly distribute rows across blocks. The modulo function is a simple hash function with four steps:

Convert the hash key by interpreting the key’s bits as an integer value.

Divide the integer by the number of buckets.

Interpret the division remainder as the bucket number.

Convert the bucket number to the physical address of the block containing the row.

  1. FlightNumber is the hash key for the Flight table.
  2. The hash function is modulo 10, resulting in 10 buckets. Rows are assigned to buckets based on the last digit of the hash key.
  3. If a bucket fills with rows, additional blocks are allocated and linked to the first block.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

A _____ automatically allocates more blocks to the table, creates additional buckets, and distributes rows across all buckets.

A

dynamic hash function

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q
A

Expected:

free space A

free space B → free space A

In a heap table, the database maintains a pointer to the first free space in the table. In the above table, free space A is the only free space, so the pointer points to free space A.

Row 659 is deleted, so the free space pointer moves to free space B. Free space B is linked to free space A, the next free space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q
A

Expected:

free space B,

free space A,

free space A,

new block

When row 1961 is deleted, the free space pointer moves to free space B. Free space B is linked to free space A, the next free space.

Because the free space pointer now points to free space B, the first insert is stored in free space B. Then, the free space pointer is reset to point to free space A, so the second insert is stored in free space A. There is still room for another row in free space A, so the third insert is also stored in free space A. Since the block is now full, the free space pointer is set to point to the beginning of the new block, where the fourth insert is stored.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q
A

Expected:

23 → 24, 24 → 27

No sort columns

A pointer links a row to the row with the next higher key. Since 23 < 24 < 27, pointers 23 → 24 and 24 → 27 are added to table A.

No columns in table B are sorted, so table B has no sort columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q
A

Expected:

14 → 20, 20 → 21

Name, State, (Name, State)

A pointer links a row to the row with the next higher key. Since 14 < 20 < 21, pointers 14 → 20 and 20 → 21 are added to table A.

In table B, the Name and State columns are sorted, so either column is a valid sort column.

The sort column can be a group of columns. The Name and State columns are sorted, so the Name and State columns together constitute a valid sort column, regardless of which column contains the primary key values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q
A

Expected:

22 → 24, 24 → 25

No sort columns

A pointer links a row to the row with the next higher key. Since 22 < 24 < 25, pointers 22 → 24 and 24 → 25 are added to table A.

No columns in table B are sorted, so table B has no sort columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Consider a hash table with 5 buckets and a modulo function. Which bucket does each hash key map to?

A

Expected:

0, 3, 1, 4, 4

5 buckets exist, so the hash function is modulo 5.

755 modulo 5 is 0, so bucket 0.

1098 modulo 5 is 3, so bucket 3.

446 modulo 5 is 1, so bucket 1.

1204 modulo 5 is 4, so bucket 4.

1499 modulo 5 is 4, so bucket 4.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

Consider a hash table with 5 buckets and a modulo function. Which bucket does each hash key map to?

A

Expected:

0, 2, 1, 4, 2

5 buckets exist, so the hash function is modulo 5.

1425 modulo 5 is 0, so bucket 0.

302 modulo 5 is 2, so bucket 2.

1321 modulo 5 is 1, so bucket 1.

914 modulo 5 is 4, so bucket 4.

177 modulo 5 is 2, so bucket 2.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Consider a hash table with 5 buckets and a modulo function. Which bucket does each hash key map to?

A

Expected: 4, 2, 0, 3, 1

5 buckets exist, so the hash function is modulo 5.

184 modulo 5 is 4, so bucket 4.

582 modulo 5 is 2, so bucket 2.

355 modulo 5 is 0, so bucket 0.

433 modulo 5 is 3, so bucket 3.

256 modulo 5 is 1, so bucket 1.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q
A

Expected: 231 blocks, 640 rows

(5,600 bytes/block) / (70 bytes/row) = 80 rows/block

So 18,465 rows would need:

18,465 rows / (80 rows/block) = 230 blocks with remainder 65, so 231 blocks.

Each bucket has 1 block, and 1 block has 80 rows. So 8 buckets can have up to 80 × 8 = 640 rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q
A

Expected: 421 blocks, 630 rows

(6,300 bytes/block) / (70 bytes/row) = 90 rows/block

So 37,825 rows would need:

37,825 rows / (90 rows/block) = 420 blocks with remainder 25, so 421 blocks.

Each bucket has 1 block, and 1 block has 90 rows. So 7 buckets can have up to 90 × 7 = 630 rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

_____, interleave rows of two or more tables in the same storage area.

A

Table clusters, also called multi-tables

Table clusters are optimal when joining interleaved tables on the cluster key, since physical row location is the same as output order. Table clusters perform poorly for many other queries:

Join on columns other than cluster key. In a join on a column that is not the cluster key, physical row location is not the same as output order, so the join is slow.

Read multiple rows of a single table. Table clusters spread each table across more blocks than other structures. Queries that read multiple rows may access more blocks.

Update clustering key. Rows may move to different blocks when the clustering key changes.

Table clusters are not optimal for many queries and therefore are not commonly used.

  1. Airline and Flight tables both contain an AirlineName column.
  2. AirlineName is the cluster key for the table cluster.
  3. Rows of Airline and Flight are interleaved on storage media.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

Table clusters have a _____, a column that is available in all interleaved tables.

A

cluster key

The cluster key determines the order in which rows are interleaved. Rows with the same cluster key value are stored together. Usually the cluster key is the primary key of one table and the corresponding foreign key of another, as in the animation below.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

a file containing column values, along with pointers to rows containing the column value.

A

single-level index

  1. An index is on the DepartureAirport column of the Flight table.
  2. Each column value appears in the index in sorted order.
  3. Each column value has a pointer to the row containing the value.
  4. If the column values are not unique, the index may have multiple entries for some values.
  5. Alternatively, an index on a non-unique column may have multiple pointers for some values.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

In a _____, each index entry is a composite of values from all indexed columns.

A

multi-column index

In all other respects, multi-column indexes behave exactly like indexes on a single column.

  1. FlightNumber is the Flight table’s primary key. FlightNumber is sorted but is not indexed.
  2. The SELECT statement has no WHERE clause, so the hit ratio is high. The database performs a table scan and reads all table blocks.
  3. The WHERE clause does not contain an indexed column, so hit ratio is high. The database must perform a table scan.
  4. Only the first two blocks are read because FlightNumber 803 is found in the second table block.
  5. Since the hit ratio is low and the WHERE clause contains an indexed column, the database performs an index scan.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

a database operation that reads table blocks directly, without accessing an index.

A

table scan

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

a database operation that reads index blocks sequentially, in order to locate the needed table blocks.

A

index scan

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

the percentage of table rows selected by a query.

A

Hit ratio, also called filter factor or selectivity

When a SELECT query is executed, the database examines the WHERE clause and estimates hit ratio. If hit ratio is high, the database performs a table scan. If hit ratio is low, the query needs only a few table blocks, so a table scan would be inefficient. Instead, the database:

Looks for an indexed column in the WHERE clause.

Scans the index.

Finds values that match the WHERE clause.

Reads the corresponding table blocks.

If the WHERE clause does not contain an indexed column, the database must perform a table scan.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

In a _____, the database repeatedly splits the index in two until it finds the entry containing the search value:

A

binary search

  1. The database first compares the search value to an entry in the middle of the index.
  2. If the search value is less than the entry value, the search value is in the first half of the index. If not, the search value is in the second half.
  3. The database now compares the search value to the entry in the middle of the selected half, to narrow the search to one quarter of the index.
  4. The database continues in this manner until it finds the index block containing the search value.
  5. The DepartureAirport index occupies nine blocks. The SELECT query looks for the row containing DepartureAirport = ‘DUB’.
  6. A binary search first examines the middle, block 4, of the sorted index.
  7. Since ‘DUB’ precedes ‘ORD’, the binary search resets last to block 3 and middle to block 1.
  8. Since ‘DUB’ follows ‘BWI’, search resets first and middle to block 2.
  9. The binary search reads block 2 and finds ‘DUB’. The database follows DUB’s pointer to the table block containing the row.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

an index on a unique sort column.

A

primary index

52
Q

an index on a non-unique sort column.

A

clustering index

53
Q

an index that is not on the sort column.

A

secondary index

54
Q

A _____ contains an entry for every table row.

A

dense index

55
Q

A _____ contains an entry for every table block.

A

sparse index

The table is not sorted on DepartureAirport column. The index on DepartureAirport must be dense.

The table is sorted on DepartureAirport column.

The index on the sort column may be sparse.

‘LAX’ does not appear in the index, but the block containing ‘LAX’ can be determined from sort order.

56
Q
A
57
Q
A
  1. A new row is inserted into the table with a dense sorted index. A new index entry ‘PIT’ must be inserted in sort order.
  2. Since the index block is full, the block splits. Half of the existing entries are moved to the new block.
  3. The block split creates space for the new index entry.
58
Q
A
59
Q
A

Expected:

(k)

Index entry (k) contains the value SEA with a pointer to the row that refers to Delta Airlines flight 1222.

60
Q
A

Expected:

between the HKG and LAX entries

Since the single-level index is sorted, an entry for JFK goes between the HKG and LAX entries. The database must create space for the new entry.

61
Q
A

Expected:

between the LAX and ORD entries

Since the single-level index is sorted, an entry for NRT goes between the LAX and ORD entries. The database must create space for the new entry.

62
Q
A

Expected:

251, 9000

The database reads at most 250 index blocks, plus one table block containing the row for flight 1695. With no index, the database performs a table scan and reads at most all 9000 table blocks.

63
Q
A

Expected:

201, 5000

The database reads at most 200 index blocks, plus one table block containing the row for flight 1062. With no index, the database performs a table scan and reads at most all 5000 table blocks.

64
Q
A

Expected:

225 seconds, 4500000 blocks

(900,000,000 rows) × (500 bytes/row) × (1 gigabyte/1,000,000,000 bytes) / (2 gigabytes/sec) = 225 seconds

Each index block contains 8 kb × (1000 bytes/kb) / 40 bytes/entry = 200 entries. The index has (900,000,000 entries) / (200 entries/block) = 4,500,000 blocks. A binary search requires log(2)4,500,000 blocks.

65
Q
A

Expected:

640 seconds, 20000000 blocks

(800,000,000 rows) × (400 bytes/row) × (1 gigabyte/1,000,000,000 bytes) / (0.5 gigabytes/sec) = 640 seconds

Each index block contains 4 kb × (1000 bytes/kb) / 100 bytes/entry = 40 entries. The index has (800,000,000 entries) / (40 entries/block) = 20,000,000 blocks. A binary search requires log(2) 20,000,000 blocks.

66
Q
A

Expected: 2000 seconds, 5000000 blocks

(400,000,000 rows) × (500 bytes/row) × (1 gigabyte/1,000,000,000 bytes) / (0.1 gigabytes/sec) = 2000 seconds

Each index block contains 8 kb × (1000 bytes/kb) / 100 bytes/entry = 80 entries. The index has (400,000,000 entries) / (80 entries/block) = 5,000,000 blocks. A binary search requires log(2) 5,000,000 blocks.

67
Q
A

Expected: 300 seconds, 375000 blocks

(600,000,000 rows) × (500 bytes/row) × (1 gigabyte/1,000,000,000 bytes) / (1 gigabyte/sec) = 300 seconds

Each index block contains 16 kb × (1000 bytes/kb) / 10 bytes/entry = 1600 entries. The index has (600,000,000 entries) / (1600 entries/block) = 375,000 blocks. A binary search requires log(2) 375,000 blocks.

68
Q

A _____ stores column values and row pointers in a hierarchy.

A

multi-level index

The bottom level of the hierarchy is a sorted single-level index. The bottom level is sparse for primary and clustering indexes, or dense for secondary indexes.

  1. Flight table is sorted on non-unique column DepartureAirport.
  2. The bottom level is a sparse index on DepartureAirport. If the table is not sorted on the index column, the bottom level must be dense.
  3. The next higher level is a sparse index to the lower level.
69
Q
A
70
Q

The number of index entries per block is called the _____ of a multi-level index.

A

fan-out

The number of levels in a multi-level index can be computed from fan-out, number of rows, and rows per block:

For a dense index, number of levels = logfan-out (number of rows)

For a sparse index, number of levels = logfan-out (number of rows / rows per block)

71
Q
A
72
Q
A
  1. A query searches for rows containing ‘FRA’.
  2. The database reads the index’s top level. ‘FRA’ is between ‘ATL’ and ‘MIN’.
  3. The database follows the ‘ATL’ pointer and reads the bottom-level block. ‘FRA’ is between ‘FLL’ and ‘GRB’.
  4. The database follows the ‘FLL’ pointer, reads the table block, and locates the row containing ‘FRA’.
73
Q
A
74
Q

Each path from the top-level block to a bottom-level block is called a _____.

A

branch

75
Q

Multi-level indexes are called _____ when all branches are the same length.

A

balanced

Imbalanced indexes are undesirable, since processing time is unpredictable. If a query follows a long branch, the query is relatively slow. For this reason, inserts are managed to maintain balanced indexes:

  1. In a dense index, inserts always generate new bottom-level index entries. In a sparse index, inserts generate new bottom-level index entries when table blocks split.
  2. If the new index entry goes in a full index block, the block splits. Half of the rows move to the new block, creating space for the entry.
  3. The new block in the bottom level generates a new index entry the next level up. If block in the next level up is full, the block splits and the process repeats.
  4. If blocks are full at all index levels, the split propagates to the top level. In this case, the top-level block splits and a new level is created.
  5. The Flight table is sorted on DepartureAirport. A row is inserted with DepartureAirport ‘FSK’.
  6. The table block is full and splits to create space for the new row.
  7. The new table block generates a new entry in the index’s bottom level.
  8. The bottom-level index block is full and splits.
  9. The new bottom-level block generates a new entry for the index’s top level.
  10. The top-level index is full and splits. A new top level is created containing entries for two blocks in the middle level.
76
Q

Multi-level indexes are called _____ when branches are different lengths.

A

imbalanced

Imbalanced indexes are undesirable, since processing time is unpredictable. If a query follows a long branch, the query is relatively slow. For this reason, inserts are managed to maintain balanced indexes:

  1. In a dense index, inserts always generate new bottom-level index entries. In a sparse index, inserts generate new bottom-level index entries when table blocks split.
  2. If the new index entry goes in a full index block, the block splits. Half of the rows move to the new block, creating space for the entry.
  3. The new block in the bottom level generates a new index entry the next level up. If block in the next level up is full, the block splits and the process repeats.
  4. If blocks are full at all index levels, the split propagates to the top level. In this case, the top-level block splits and a new level is created.
77
Q
A
78
Q

All indexed values appear in the bottom level. Pointers to table blocks appear only in the bottom level. Since some indexed values also appear in higher levels, values are occasionally repeated in the index.

A

B+tree

The B+tree structure has two benefits:

  1. The bottom level of a B+tree is a single-level index and can be scanned or searched.
  2. In a B-tree, inserts, updates, and deletes may cause a table pointer to change levels, which is hard to implement. B+trees do not have this problem, since table pointers are always in the bottom level.

Because of the advantages above, multi-level indexes are usually implemented as B+trees.

79
Q

If an indexed value appears in a higher level, the value is not repeated at lower levels. Instead, a pointer to the corresponding table block appears in the higher level along with the value.

A

B-tree

In a B-tree, inserts, updates, and deletes may cause a table pointer to change levels, which is hard to implement. B+trees do not have this problem, since table pointers are always in the bottom level.

  1. The bottom level of a B-tree is like the bottom level of a B+tree.
  2. Higher levels of a B-tree alternate index block pointers and table block pointers.
  3. Entries that appear in higher levels are not repeated in the bottom level.
80
Q
A
81
Q

In a _____, index entries are assigned to buckets.

A

hash index

A hash index is similar to a hash table, described in another section. However, a hash index stores index entries in each bucket, while a hash table stores table rows in each bucket.

Sometimes the term hash index is used to mean a hash key, but the two terms are different. A hash index is an index that is structured using a hash function. A hash key is a column that determines the physical location of rows in a hash table.

  1. The table is clustered on DepartureAirportCode.
  2. The hash index is on FlightNumber. The hash function is modulo 10.
  3. Index entries for flight numbers ending in 0 go in bucket 0.
  4. Index entries for flight numbers ending in 1 go in bucket 1.
  5. Since the hash function is modulo 10, the hash index has 10 buckets.
82
Q

A _____ is a block or group of blocks containing index entries.

A

bucket

Initially, each bucket has one block.

As an index grows, some buckets eventually fill up, and additional blocks are allocated and linked to the initial block.

83
Q

The bucket containing each index entry is determined by a _____, which computes a bucket number from the value of the indexed column.

A

hash function

To locate a row containing a column value, the database:

  1. Applies the hash function to the column value to compute a bucket number.
  2. Reads the index blocks for the bucket number.
  3. Finds the index entry for the column value and reads the table block pointer.
  4. Reads the table block containing the row.
84
Q
A
85
Q

A _____ is a grid of bits:

A

bitmap index

  1. Each index row corresponds to a unique table row. If the table’s primary key is an integer, the index row number might be the primary key value. Alternatively, the index row number might be an internal table row number, maintained by the database.
  2. Each index column corresponds to a distinct table value. Ex: If the index is on AirportCode, each index column corresponds to a different three-letter airport code. The mapping of index column numbers to table values is computed with a function or stored in an internal ‘lookup’ table.
  3. The database maintains row numbers internally.
  4. A bitmap index is a grid of bits. Bitmap index rows correspond to internal table row numbers.
  5. Bitmap index columns correspond to table values.
  6. Table row 0 contains ATL, so the corresponding index bit is 1.
  7. Table row 1 contains DUB, so the corresponding index bit is 1.
  8. Each 1 in the bitmap index indicates the corresponding value appears in the table row.
  9. Each 0 in the bitmap index indicates the corresponding value does not appear in the table row.
86
Q
A
87
Q

A single- or multi-level index normally contains pointers to table blocks and is called a _____.

A

physical index

88
Q

A _____ is a single- or multi-level index in which pointers to table blocks are replaced with primary key values.

A

logical index

Logical indexes are always secondary indexes and require a separate primary index on the same table.

To locate a row containing a column value, the database:

  1. Looks up the column value in the logical index to find the primary key value.
  2. Looks up the primary key value in the primary index to find the table block pointer.
  3. Reads the table block containing the row.
  4. The table is sorted on FlightNumber, the primary key. This example assumes FlightNumber is unique across all airlines.
  5. The table has primary index on FlightNumber. The primary index is physical and sparse.
  6. The table has secondary index on DepartureAirportCode. The secondary index is logical and dense.
  7. To find the row containing EUG, the database finds the primary key 200 in the secondary index, then finds the block pointer in the primary index leading to the block containing 200.
89
Q
A
90
Q

In a _____, the database designer specifies a function on the column value.

A

function index

Index entries contain the result of the function applied to column values, rather than the column values.

Ex: Column values are stored as decimal numbers between 0 and 1, but users specify percentages as integers between 0 and 100 in queries. The database designer specifies a function index that multiplies column values by 100 and converts the result to an integer. The index contains integers between 0 and 100, so the database can use the function index to process queries.

  1. The table contains times in AM/PM format.
  2. Queries specify time in 24-hour format.
  3. The function index converts AM/PM format to 24-hour format.
  4. The function index format is consistent with the WHERE clause, so the database can use the index.
91
Q
A
92
Q
A

Expected:

Flight numbers 735, 905, 255, 885, and 995 The hash function is modulo 10, so index entries for flight numbers ending in 5 go in bucket 5. 735, 905, 255, 885, and 995 are the flight numbers that end in 5.

93
Q
A

5, 8

94
Q
A

1, 3, 11

95
Q
A
96
Q
A

6:00 and 7:35

97
Q

a database object that maps one or more tables to a single file.

A

tablespace

The CREATE TABLESPACE statement names a tablespace and assigns the tablespace to a file. The CREATE TABLE statement assigns a table to a tablespace. Indexes are stored in the same tablespace as the indexed table.

98
Q
A
  1. CodeTablespace contains small, read-only code tables and is stored on a flash drive for fast reads.
  2. The State table is assigned to CodeTablespace.
  3. Additional tables are assigned to the same tablespace to reduce overhead.
99
Q
A
100
Q

a subset of table data.

A

partition

One table has many partitions that do not overlap and, together, contain all table data.

101
Q

a subset of table rows.

A

horizontal partition

MySQL and most relational databases partition tables horizontally, not vertically.

102
Q

a subset of table columns.

A

vertical partition

MySQL and most relational databases partition tables horizontally, not vertically.

103
Q
A
  1. The Employee table is partitioned on hire date.
  2. Each partition corresponds to one year.
  3. Each partition is stored in a separate tablespace.
104
Q
A
105
Q

To partition a table, the database administrator specifies a _____ based on one or more partition columns.

A

partition expression

The partition expression may be simple, such as the value of a single partition column, or a complex expression based on several partition columns.

106
Q

A _____ associates each partition with a range of partition expression values.

A

range partition

The VALUES LESS THAN keywords specify the upper bound of each range. The MAXVALUE keyword represents the highest column value, and VALUES LESS THAN MAXVALUE specifies the highest range. Each partition is explicitly named by the database administrator.

  1. Order table has columns for order number, date, product code, and amount of order.
  2. Range partition assigns partitions based on OrderDate year.
  3. Rows are assigned to one of the three partitions based on year.
107
Q

A _____ associates each partition with an explicit list of partition expression values using the VALUES IN keywords.

A

list partition

Like a range partition, each partition is explicitly named.

  1. Order table has columns for order number, date, product code, and amount of order.
  2. List partition assigns partitions based on specific ProductCodes.
  3. Each partition contains rows with specific product codes.
108
Q

A _____ requires a partition expression with positive integer values.

A

hash partition

The database administrator specifies the number of partitions, N, and partitions are automatically named p0 through p(N-1). The partition number for each row is computed as: (partition expression value) modulo N.

109
Q

A _____ is similar to a hash partition, except the partition expression is determined automatically by the database.

A

key partition

  1. Order table has columns for order number, date, product code, and amount of order.
  2. Range partition assigns partitions based on OrderDate year.
  3. Rows are assigned to one of the three partitions based on year.
  4. List partition assigns partitions based on specific ProductCodes.
  5. Each partition contains rows with specific product codes.
  6. Hash partition expression is HASH(OrderNumber) with ten partitions.
  7. Each row is assigned to the partition OrderNumber modulo 10, so assigned partitions are based on the order number’s last digit.
110
Q
A
111
Q

_____ specifies tables, columns, and keys.

A

Logical design

  1. Logical design specifies tables and columns.
  2. Logical design also specifies primary and foreign keys.
112
Q

______ specifies indexes, table structures, and partitions.

A

Physical design

Physical design affects query performance but never affects query results.

  1. Physical design specifies table structure. Flight table is sorted on DepartureAirportCode.
  2. Physical design also specifies indexes and index types.
113
Q

A _____ translates instructions generated by a query processor into low-level commands that access data on storage media.

A

storage engine or storage manager

Storage engines support different index and table structures, so physical design is dependent on a specific storage engine.

114
Q
A
115
Q

The _____ statement creates an index by specifying the index name and table columns that compose the index.

A

CREATE INDEX

Most indexes specify just one column, but a composite index specifies multiple columns.

116
Q

The ______ statement deletes a table’s index.

A

DROP INDEX

117
Q

The _____ statement displays a table’s index.

A

SHOW INDEX

SHOW INDEX generates a result table with one row for each column of each index. A multi-column index has multiple rows in the result table.

118
Q
  1. The Address table has primary key AddressID and foreign key CityID.
A
  1. The Address table has primary key AddressID and foreign key CityID.
  2. The CREATE INDEX statement creates a secondary index on PostalCode.
  3. The SHOW INDEX statement displays all indexes on the Address table. Some columns of the result table have been omitted for clarity.
  4. MySQL created indexes on the primary and foreign keys automatically when Address was created. AddressID has 603 distinct index values, and CityID has 99 distinct index values.
  5. PostalCodeIndex was created with the CREATE INDEX statement. PostalCodeIndex is an index on a non-unique column with 597 distinct values.
119
Q
A
120
Q

The _____ generates a result table that describes how a statement is executed by the storage engine.

A

EXPLAIN statement

EXPLAIN syntax is simple and uniform in most databases: EXPLAIN statement; The statement can be any SELECT, INSERT, UPDATE, or DELETE statement.

Although EXPLAIN syntax is standard, the result table varies significantly by database. In MySQL with InnoDB, the result table has one row for each table in the statement. If the statement contains multiple queries, such as a main query and a subquery, the result table has one row for each table in each query.

121
Q
A
  1. Address table has 603 rows. City table has 57 rows.
  2. The SELECT statement joins Address and City tables.
  3. EXPLAIN generates a results table with one row for each table in the SELECT statement.
  4. The Address table is accessed via table scan. Of 603 rows read, 33% are selected.
  5. The City table is accessed via primary index. One City row is read and selected for each Address row.
122
Q
A
123
Q

Physical design process

A database administrator may take a simple approach to physical design for MySQL with InnoDB:

A
  1. Create initial physical design. Create a primary index on primary keys and a secondary index on foreign keys. In MySQL with InnoDB, these indexes are created automatically for all tables. In other databases, this step is necessary for tables larger than roughly 100 kilobytes, but can be omitted for smaller tables.
  2. Identify slow queries. The MySQL slow query log is a file that records all long-running queries submitted to the database. Identify slow queries by inspecting the log. Most other relational databases have similar query logs.
  3. EXPLAIN slow queries. Run EXPLAIN on each slow query to assess the effectiveness of indexes. A high value for rows and a low value for filtered indicates either a table scan or an ineffective index.
  4. Create and drop indexes based on the EXPLAIN result table. Consider creating an index when the rows value is high and the filtered value is low. Consider dropping indexes that are never used.
  5. Partition large tables. If some queries are still slow after indexes are created, consider partitions. Partition when slow queries access a small subset of rows of a large table. The partition column should appear in the WHERE clause of slow queries. Often, a range partition is best.
124
Q
A
  1. The SELECT statement appears in the MySQL slow query log.
  2. The database administrator runs EXPLAIN on the SELECT query.
  3. The EXPLAIN result shows no index was used, many rows were read, and few rows were selected.
  4. To speed up the SELECT query, the database administrator creates an index on the CityName column.
  5. Running EXPLAIN again shows the query uses CityNameIndex, reads only one row, and executes faster.
125
Q
A