Week 6 Flashcards

1
Q

Where do most DB data resides on?

A

Secondary storage (HDD/SSD)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Why do most DB data resides on secondary storage?

A
  1. It is may be too large to reside entirely in main memory (e.g. H-Unique core datasets > 250GB and will grow to several TB)
  2. Secondary storage costs orders of magnitude lower than main memory, but slower to access
  3. Often not all database data required frequently - access from disk as required
  4. Secondary storage offers persistence
  5. Trade off between storage capacity, robustness and speed of access
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Do main memory DB exist?

A

Yes,
1. Entire database held in main memory (along with OS, DBMS and possibly other applications)
2. Suited for real-time applications requiring extremely fast response times (e.g. Telephone network routing, high-frequency trading)
3. Extremely expensive for large datasets

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How data is organised on disk

A

Data is organised on disk into file of records

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a Record?

A

Record is a collection of related data items
- e.g. Personnel record may contain forename, surname, DOB, NI number etc
- Each item consists of one or more bytes of data
- Each data item corresponds to a particular field of the record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a record type?

A

A collection of field names and their corresponding data-types

Data types of fields are standard data types such as integer, float, character strings, date etc

Number of bytes required to store data items of each particular type if fixed for a given computer system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a file?

A

A sequence of n records.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the two types of records?

A
  1. Fixed length records - every record in the file is the exact same size (in bytes)
  2. Variable length records - file contains records of differing lengths
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Consider a personnel record, if forename and surname have a maximum defined length of 15 characters each with a fixed number of bytes, what is the type of the record?

A

Fixed length records

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

On a fixed-length record, how does the position of records identified?

A

Starting byte position of each field can be identified relative to the start of the record, similarly, the start position of the next record can be identified relative to position of the end of the record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is the reason different records in a file may have different sizes in bytes?

A
  1. Records of the same record type but have one or more varying length fields (e.g. name fields of employee record)
  2. Records of the same type but a field may have multiple values for a record (e.g. multiple contact phone numbers)
  3. Records of same type but one or more fields may be optional
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

In variable-length records, what is the purpose of different types clustered together?

A

For performance and retrieval

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

To prevent a waste of space on disk, what do variable length records use to terminate the end of the record

A

Special separator character

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the other methods for encoding variable length fields may include?

A
  1. Fields as name/value pairs <field name, field value>
  2. Field length (in bytes) can be stored preceding field value
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Records of a file must be allocated to disk blocks, what is a block?

A

Block is data transfer unit between disk and main memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the three possibilities for block sizing?

A
  1. Block size > record size (block may contain several records)
  2. Block size < record size (Record is stored across multiple blocks)
  3. Block size = record size (Exactly one record per block)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Suppose block size is B bytes and a file contains fixed length records of size R bytes, if B > R, how many blocking factor (bfr) can we fit?

A

[B/R] records per block, [(x)] is a floor function that rounds the number x down to an integer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is a bfr

A

Blocking factor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What would happen if R does not divide B exactly?

A

There will be unused space in each block equal to B-(bfr *R) bytes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

If a block does not have enough remaining space to store the complete records, the situation may be handled in one of two ways which is?

A
  1. Spanned records
  2. Unspanned records
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Explain the Spanned records

A

Spanned records may be spread across two blocks, the first block has a pointer to the block containing the rest of the record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Explain the Unspanned records

A

In this case records are not allowed to cross block boundaries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Why do we need to allocate file blocks on disk?!

A
  1. Database files typically have an initial allocation of blocks on disk
  2. As data growth occurs (and it nearly always does!) files need to be able to grow to accommodate the enlarged data.
  3. Block allocation routines need to balance performance, flexibility and space efficiency
  4. Disk space is shared with any OS and other application files that are using the same disk
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What are the 4 ways of allocating file blocks to disk blocks

A
  1. Continuous allocation
  2. Linked allocation
  3. Cluster allocation
  4. Indexed allocation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is the Continuous allocation?

A

File blocks allocated to consecutive disk blocks
+ve: Very fast reading of the whole file as blocks are contiguous
-ve: File expansion difficult due to used block on disk

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

What is the Linked allocation?

A

Each file block contains a pointer to the next file block
+ve: Very easy to expand the file, just allocate the next free block and add a pointer
-ve: File reads are slower, especially with magnetic disks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is the Cluster allocation?

A
  1. Combines continuous and linked allocations
  2. Allocate clusters of blocks and link them with pointers
    - Achieve a balance between ease of expansion and read performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What is the Indexed allocation?

A
  1. One or more index blocks contain pointer to actual file blocks
  2. Read index blocks to find pointers to blocks containing desired data
  3. Analogous to using a library card index
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Explain a File Headers

A
  1. Also known as file descriptor
  2. Contains information needed by programs accessing records in the file

2.1. Information to determine disk address of file block

2.2. Record format description
2.2.1. For both fixed and variable length records
2.2.2. Order of fields
2.2.3. Type codes
2.2.4. Field & record separators

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What are the two groups of operations on a file?

A
  1. Retrieval Operations
    1.1. Do not change data in the file
    1.1.1 Locate records so field values can be read and processed
  2. Update Operations
    2.1. Modify the data file
    2.1.1 Insertion or deletion of records
    2.1.2 Alteration of field values within a record or records

These are the underlying file operations for which SQL SELECT/INSERT/UPDATE/DELETE statements provide an abstraction

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Explain the Open File Operations

A
  1. Prepares file for reading/writing
  2. Allocates buffers (usually minimum of 2) to hold file blocks from disk
  3. Retrieves file header
  4. Set file pointer to beginning of file
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Explain the Reset File Operation

A
  1. Set file pointer back to beginning of file
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Explain the Find/Locate file operation

A
  1. Searches for first record satisfying search condition
  2. Transfers block containing matched records into main memory buffer (if not already in memory)
  3. Points file pointer to fetched record in file
34
Q

Explain the FindNext file operation

A

Same as find, but gets the NEXT matching record in file

35
Q

Explain the Read/Get file operation

A
  1. Copies current record from buffer to program variable
  2. May also advance file pointer to next record in file
    a. May cause next block to be read from disk
36
Q

Explain the Delete file operation

A
  1. Deletes the current record
  2. Updates file on disk to reflect deletion
37
Q

Explain the Modify file operation

A
  1. Modifies some fields for the current record
  2. Updates file on disk to reflect changes
38
Q

Explain the Insert file operation

A
  1. Locates block where record to be inserted
  2. Transfers block to main memory buffer
  3. Writes record into buffer
  4. Writes buffer to disk
39
Q

Explain the Close file operation

A
  1. Releases all buffers
  2. Performs any other cleanup operations required
40
Q

What are the 3 Set-at-a-time operations

A

*FindAll
* Locate all records that satisfy a search condition

  • FindOrdered
    * Retrieves all records in a file in a specified order
  • Reorganise
    * E.g. Reorder file records based on a specified field value
41
Q

What are the 2 file organisation?

A
  1. Primary file organisation
  2. Secondary file organisation
42
Q

What are the two types of Primary organisation

A
  1. unordered records
  2. ordered records
43
Q

Explain briefly primary organisation: unordered records

A
  1. Also known as heap files
  2. Simples most basic type of organisation
  3. Records placed in file in insertion order (new records placed at end of file)
44
Q

What are the characteristics of primary organisation: unordered records

A
  1. Inserting new record very efficient
  2. Searching for record very expensive
  3. Deletion of record expensive and inefficient
45
Q

Why do inserting in the primary organisation: unordered records very efficient?

A
  1. Address of last block kept in file header
  2. Last file block copied to buffer
  3. New record added and written to disk
46
Q

Why do searching for record in the primary organisation: unordered record very expensive?

A

Linear search required

47
Q

Why do deletion of record in the primary organisation: unordered record expensive and inefficient

A
  1. Linear search to find block with record to be deleted
  2. Copy block to buffer
  3. Delete record from buffer
  4. Rewrite block to disk
  5. Leaves unused space on disk
  6. Large no. of deletes leads to much wasted space
48
Q

What are the alternative deletion approach in the unordered records

A
  1. Use deletion marker
  2. Extra bit set to mark records as deleted
  3. Search operations ignore deleted records
49
Q

Explain briefly Primary organisation: ordered records

A

Physically order records in a file based on one of their fields (called ordering field)
- Ordering field called ordering key if it is a key field of file
– i.e. unique value for each record

50
Q

What are the advantage of Primary organisation: ordered records?

A
  1. Reading records in order defined by key extremely efficient (no sorting required)
  2. Finding next record in order requires no additional block access unless current record is last in block
  3. Using search conditions based on ordering key results in faster access
51
Q

What are the disadvantages of Primary Organisation: ordered records

A
  1. no advantage for acess based on values of non-ordering fields
  2. Insertion and deletion of records very expensive (Records must remain physically ordered)
  3. E.g. Insert:
    - Find correct position in file
    - Make space and insert record
    - Very expensive operation for large files
  4. Modification of a record
    - Record may change position in file > equivalent to deletion + insertion
52
Q

What is a secondary file?

A

It describes supplemental access structures which are used to speed up the retrieval of records when specific search conditions are met.

53
Q

What are the types of secondary files?

A
  1. Single Level
  2. Multi-level
54
Q

What are the 3 type of indexes there are in a single level secondary file?

A
  1. Primary indexes
  2. Clustering indexes
  3. Secondary indexes
55
Q

What are the characteristics of secondary files?

A
  1. Exist in addition to primary file organisations (similar to indexes in textbooks)
  2. Provide an alternative way of locating record blocks on disk without affecting the physical placement of records on disk
  3. Any field of a record in a file can be used to create an index (Multiple indexes on different fields can be constructed for the file)
56
Q

What do you called fields used to create index

A

Indexing fields

57
Q

Why do values in an index are ordered?

A
  1. Allows binary (chop) search to be carried out on the index
  2. Binary search is highly efficient with ordered lists
  3. Once located, index item points to one or more blocks in file where required records are located
58
Q

Explain the Single level indexes

A
  1. Based on ordered files
  2. Usually defined on a single field of a file called an indexing field
  3. Index typically stores (each value of the field, list of pointers to all disk blocks containing records with that field value)
  4. If both data file and index file are ordered (Index file will usually be much smaller than data file, thus searching index is more efficient)
59
Q

How do primary indexes work

A
  1. Specified the ordering key of an ordered file (the ordering key has a unique value for each record)
  2. The ordering key field is used to physically order file records on the disk
  3. Index is an ordered file of fixed-length records (each record has two fields, first field is of same datatype
60
Q

How many index entry in the index file for each block in data file?

A

One index
1. Not one entry for each record in data file
2. index entry contains primary key for the first record in a block along with pointer to that block

61
Q

What is the anchor record of the block or the block anchor

A

First record in each block in the data file

62
Q

PRIMARY INDEX EXAMPLE

A
63
Q

What are the advantages of Primary indexes?

A
  1. Search for location of relevant record blocks very efficient
  2. Index is much smaller than main file with fewer entries
64
Q

What are the disadvantages of Primary indexes?

A
  1. Same as with ordered files
  2. Insertion and deletion of records with some extra complexities
65
Q

Example: Inserting a record in a file

A
  1. To insert at correct position we need to create space for new record, which will most likely involve moving existing records
  2. Moving records will change anchor records of some blocks
  3. If anchor records change or new blocks created, we will have to change some index entries
    a. As index is ordered file, has similar overheads to modification of records in ordered data file
66
Q

Explain briefly Clustering index

A
  1. Used if the ordering key in the data file is a non-key field
    a. Multiple records in the file can have the same value for the ordering field
  2. Index is an ordered file of fixed length records
    a. Each record has two fields
67
Q

What is the situation the Ordering field is known as the Clustering field?

A

In the situation where multiple records in the file can have the same value for the ordering field

68
Q

What is the characteristic of Clustering index

A

There is one index entry in the index file for each distinct value of the clustering field
a. Contains the value for the clustering field and a pointer to the first block in the data file with that value for it’s clustering field

69
Q

Clustering index example

A
70
Q

What is the condition on insertion and deletion in the Clustering index

A

Insertion and deletion is still a problem
a. Data records are physically ordered
b. Can be simplified by reserving the whole block or cluster of contiguous blocks for each value of clustering field
c. All records with that value are placed in block or block cluster reserved for it.

71
Q

Clustering index example 2

A
72
Q

Explain briefly the Secondary indexes

A
  1. A file can have at most one physical ordering field
    a. Thus, there can be at most one primary index or one clustering index, but not both
  2. A secondary index can be specified on any non-ordering field of a file
    b. A file can have multiple secondary indexes in addition to it’s primary access method (primary or clustering)
73
Q

What are the characteristics of the Secondary indexes?

A
  1. Such an index is an ordered file whose records are of fixed length
  2. Each record has two fields
    a. First field is the same datatype as the non-ordering field which is to be the indexing field
    b. Second field is either a block pointer or a record pointer
74
Q

Secondary indexes (key)

A
  1. Consider a secondary index based on a secondary key
  2. One index entry for each record data file which contains:
    a. Value of secondary key for the record
    b. Pointer to either the block in which the record is stored to the record itself
  3. Records not physically by values of secondary key ∴
    a. One index entry created for each record in data file
  4. Secondary index is ordered
75
Q

Secondary index (key) example

A
76
Q

What are the characteristics of Secondary indexes (key)

A
  1. Because search field is a key, there will be one index entry per record
  2. Usually needs more storage and longer search time than primary indexes because of it’s large number of entries
  3. Offers significant improvement in search time for an arbitrary record
    a. We would need to do a linear search on data file if secondary index did not exist
77
Q

What are the solutions on creating a secondary index on a non-key field

A

Solution one:
a. Include several entries with same value for indexing field
i. Same as implementation for secondary index based on secondary key
ii. One entry per record in file

Solution two:
a. Have variable length records for index entries
i. Keep list of pointers in index entry for each value of indexing field
ii. One pointer to each block that contains a record to each block that contains a record with indexing field value
iii. Can have an extra level of indirection to handle multiple pointers and keep index file records fixed-length

78
Q

Secondary index (none-key) example - Solution one

A
79
Q

Secondary index (none-key) example - solution two

A
80
Q

Multi-level indexes

A

Indexes can have indexes

81
Q

What are the characteristics on Multi-level indexes

A
  1. Help to reduce search space
    a. Create indexes on indexes
    i. Search a smaller index to find where to
    look in a larger index
    b. Useful if first level needs > 1 storage block
    c. We can have 3rd, 4th, … nth level
  2. Multi-level scheme can be used on any type of index, primary, clustering or secondary
    a. First level index must have fixed length entries and distinct values for indexing fields value