2. Disk and Files Flashcards

1
Q

How databases use memory and disks? Why?

A

Whenever a database processes data, that data must exist in memory. Accessing this data is relatively fast, but once the data becomes very large, it becomes impossible to fit all of it within memory. Disks are used to cheaply store all of a database’s data, but they incur a large cost whenever data is accessed or new data is written.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Basic units of data for databases and disks

A

The basic unit of data for relational databases is a record (row). These records are organized into relations (tables) and can be modified, deleted, searched, or created in memory.

The basic unit of data for disk is a page, the smallest unit of transfer from disk to memory and vice versa

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How relational databases are represented on a disk?

A

In order to represent relational databases in a format compatible with disk, each relation is stored in its own file and its records are organized into pages in the file.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How relations are organized on disk?

4 main decisions to make?

Based on what are they made?

A

Based on the relation’s schema and access pattern, the database will determine:

(1) type of file used, (2) how pages are organized in the file, (3) how records are organized on each page, (4) and how each record is formatted.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

2 main types of database files

A

Heap Files and Sorted Files

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What’s meant by “one I/O” in databases?

A

1 I/O is equivalent to 1 page read from disk or 1 page write to disk

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a heap file?

A

A heap file is a file type with no particular ordering of pages or of the records on pages

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

2 main implementations of a heap file

A

Linked List

Page Directory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Linked List Heap File - describe the structure

A

In the linked list implementation, each data page contains records, a free space tracker, and pointers (byte offsets) to the next and previous page. There is one header page that acts as the start of the file and separates the data pages into full pages and free pages.

When free data pages become full, they are moved from the free space portion to the front of the full pages portion of the linked list.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Page Directory Heap File - describe the structure

A

Linked list for header pages.

Each header page contains a pointer (byte offset) to the next header page, and its entries contain both a pointer to a data page and the amount of free space left within that data page.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

The main advantage of page directories over linked lists for heap files?

A

Fast inserts.

Needs to read only header pages to find a page with free space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Heap files vs. sorted files - the main trade-off

A

Heap files - fast insert

Sorted files - fast search

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a sorted file?

A

A sorted file is a file type where pages are ordered and records within each page are sorted by key(s).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How sorted files are commonly implemented?

A

These files are implemented using Page Directories and enforce an ordering upon data pages based on how records are sorted.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

The average cost of insert and search for sorted files

A

Searching through sorted files takes logN I/Os where N = # of pages since binary search can be used to find the page containing the record. Meanwhile, insertion, in the average case, takes logN + N I/Os since binary search is needed to find the page to write to and that inserted record could potentially cause all later records to be pushed back by one. On average, N / 2 pages will need to be pushed back, and this involves a read and a write IO for each of those pages, which results in the N I/Os term.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

2 main record types

A

Fixed Length Records (FLR) and Variable Length Records (VLR)

FLRs only contain fixed length fields (integer, boolean, date, etc.)

VLRs contain both fixed length and variable length fields (eg. varchar),

17
Q

How Variable Length Records store their fields?

A

VLRs store all fixed length fields before variable length fields and use a record header that contains pointers to the end of the variable length fields.

18
Q

Record ID - purpose, components

A

every record can be uniquely identified by its record id -

[page #, record # on page].

19
Q

Packed pages with Fixed Length Records - describe

A

Pages containing FLRs always use page headers to store the number of records currently on the page.

If the page is packed, there are no gaps between records. This makes insertion easy as we can calculate the next available position within the page using the # of existing records and the length of each record. Once this value is calculated, we insert the record at the computed offset. Deletion is slightly trickier as it requires moving all records after the deleted record towards the top of the page by one position to keep the page packed.

20
Q

Unpacked pages with Fixed Length Records - describe

A

Pages containing FLRs always use page headers to store the number of records currently on the page.

If the page is unpacked, the page header typically stores an additional bitmap that breaks the page into slots and tracks which slots are open or taken. Using the bitmap, insertion involves finding the first open bit, setting the new record in the corresponding slot, and then setting the bit for that slot. With deletion, we clear the deleted record’s corresponding bit so that future inserts can overwrite that slot.

21
Q

Pages with Variable Length Records - describe

A

Each page uses a page footer that maintains a slot directory tracking slot count, a free space pointer, and entries. The footer starts from the bottom of the page rather than the top so that the slot directory has room to grow when records are inserted.

Each entry in the slot directory consists of a [record pointer, record length] pair.

insert - at the free space pointer, a new [pointer, length] pair is set in any available null entry (or a new one if no nulls)

delete - set both the record pointer and record length to null.

Re-grouping/packing depends on the implementation.

22
Q

Layered organization of DBMS - describe all layers

A