CSC 675 Review 3 - Introduction to System Implementation Techniques Flashcards

File Systems & Storage Structures - Ramakrishnan Chapter 8, 9, 10, 11 - E & N Ch. 13, 14

1
Q

Explain in your own words how data (bits) are stored on standard magnetic disk hardware.

A

The most basic unit of data on the disk is a single bit of information. A magnetic orientation in one direction on the disk could represent a “1”, an orientation in the opposite direction could represent a “0”.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the minimum unit of data transfer to/from the disk?

A

The sector is the smallest unit that can be transferred to/from the disk, disk blocks are the smallest unit of transfer requested by the operating system.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why is it usually more efficient to transfer a larger amount of data to/from the disk?

A

One large request requires less processing requests. Less disk I/O.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the major components of the time required to access a sector for data stored on disk?

A
  • seek time, the r/w heads are moved to the proper cylinder, mechanical movement is slow and inaccurate
  • the proper track is selected, fast electronic switch

• latency time,
- the head monitors the track until the proper sector rotates under the head, disks usually rotate 3600 rpm and about half a track will need to be scanned on average

• transfer time
- the data is transferred, typically very fast compared with seek time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Describe two ways in which the sector access time can be reduced (without modifying the disk hardware).

A
  1. Reduce the seek/latency for each I/O operation by maximizing sequentiality in data placement.
    • Hard to realize in a multi-user environment!
  2. Reduce the total number of disk I/O
    • This may or may not require less I/O time in single user mode, but will probably require less I/O time in multi-user mode (when the disk is shared and every disk I/O operation is effectively random).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why is sequential disk I/O much faster than random disk I/O?

A

Sequential disk I/O is faster than random disk I/O because it requires less processing between transfers. The parts of the hard drive have to move around less to find the needed data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a file?

A

ordered sequence of file blocks

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are two standard techniques for associating disk blocks into files?

A

(1) sequential: file blocks are stored in contiguous disk blocks.
(2) indexed: file blocks are indexed and stored in random disk blocks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Discuss the advantages and disadvantages of using sequential allocation for file blocks.

A

Fast sequential read/ write, difficult to allocate and deallocate disk space efficiently.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Discuss the advantages and disadvantages of using indexed allocation for file blocks.

A

Slow read/write access, easy to allocate and deallocate disk space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Under what circumstances will sequential file I/O correspond to sequential disk I/O?

A

Sequential access within a file may or
may not correspond to sequential
access on the disk!!
–  Most mainframe operating systems and high-end
unix systems support high performance access.!
–  Most low-end unix systems and PC based
systems do not!
–  Most special purpose file systems implemented for use with DBS support sequential access.!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Why is it important to store type information with records?

A

?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are some standard ways of storing type information.

A

Type information can be stored in a
variety of ways:

once per record set,

once per record (named record type),

once per field (named field type).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is blocking factor?

A

number of records per block

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How can blocking factor be used with the number of tuples in the file to calculate the file size?

A

records / blocking factor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the record at a time interface?

A

Records have an order defined by the underlying file organization.

Interface provides means to filter

  • current
  • find
  • findNext
  • Delete
17
Q

How is it used by the DBS?

A

Basic file organization

18
Q

What is a storage structure?

A

a way to organize records on disk

19
Q

What is an access method?

A

organize information on the disk & retrieve individual records when requested by the run-time environment

20
Q

How are storage structures and access methods related?

A

access methods use storage structures to organize information

21
Q

What are the advantages and disadvantages of using an unordered file?

A

low insertion costs, high random & key sequential access costs.

22
Q

What are the advantages and disadvantages of using an ordered file?

A

high insertion costs, moderate random, low key sequential costs!

23
Q

What are the advantages and disadvantages of using a static hash file?

A

moderate insertion costs, low random, high key sequential costs

24
Q

What are the advantages and disadvantages of using a dynamic hash file?

A

index by high order

bits, no overflow blocks, exponential directory growth

25
What are the advantages and disadvantages of using a B+ tree index?
moderate insertion, random & key sequential costs.
26
What is the difference between a primary index and a clustering index?
primary index - built on primary key | physically ordered on non-key field
27
What is a secondary index?
Secondary Index: Indexing field Key field: dense index. Index ordered, block pointers in index. Non-key field: dense index, repeating field for duplicates, indirect index level (index points to block of pointers with same values).
28
How does a secondary index differ from a primary index?
?
29
Why is it possible to have any number of secondary indexes on a file, but at most one primary index?
?
30
What is a B tree?
a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children.
31
How are random and key sequential access performed using a B-tree?
Random access: partial root-leaf traversal Sequential access: extended in-order traversal
32
what is a B+ tree?
make tree shorter by storing records in leaves and index entries only in internal nodes. This introduces some redundancy, but increases performance, especially for key sequential access.
33
How are random and key sequential access performed using a B+ tree?
Random access: root-leaf traversal | Sequential access: scan data blocks sequence set pointers
34
Describe the structure of an index node in a B+ tree. How is the number of entries per B+ tree index node determined?
2d key + record value slots, 2d+1 pointer slots (order = 2d+1, or d)
35
How is the number of entries per B+ tree index node determined?
?
36
What are some of the different strategies for implementing record pointers( i.e. for secondary index construction)?
Offset in file RID: record ID (block number + slot number) TID: tuple ID (unique system generated value stored with each tuple) Primary Key (unique attribute data value stored with each tuple)
37
What are the advantages for each?
access time vs. Data Independence