Data Storage & Organization Flashcards
What is transfer size?
the unit of memory that can be individually accessed, read and written
What is latency?
the time it takes for info to be delivered after the initial request is made
What is bandwidth?
The rate at which info can be delivered.
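Latency and bandwidth combine when a request is served. The sketch below uses a simple first-order model (an assumption, not something the cards state): total time is the latency plus the size divided by the bandwidth. The device numbers are hypothetical.

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """First-order model: wait `latency_s` for the first byte, then stream at the bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical device: 10 ms latency, 100 MB/s bandwidth
small = transfer_time(4_096, 0.010, 100e6)   # 4 KB request: dominated by latency
large = transfer_time(100e6, 0.010, 100e6)   # 100 MB request: dominated by bandwidth
print(f"{small * 1000:.3f} ms, {large:.3f} s")
```

For small transfers latency dominates, and for large ones bandwidth does.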
What is a processor cache?
Faster memory storing recently used data that reduces the average memory access time.
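The "average memory access time" mentioned above is commonly computed as hit time + miss rate × miss penalty. A minimal sketch with hypothetical timings:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: 1 ns cache hit, 100 ns extra to go to main memory on a miss
with_cache = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
print(f"average access with cache: {with_cache:.1f} ns (vs ~100 ns if every access went to memory)")
```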
What is a Solid State Drive?
- uses flash memory for storage
What is RAID (Redundant Array of Independent Disks)?
a disk organization technique that uses a large number of inexpensive, mass-market disks to provide increased reliability, performance, and storage capacity.
What do RAIDs do?
store extra (redundant) data so nothing is lost in case of a disk failure.
- Mirroring (shadowing)
- duplicates each entire disk onto another disk
What is mean time to failure (MTTF)?
the average time the device is expected to run continuously without any failure.
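MTTF explains why large disk arrays need redundancy. Assuming independent failures at a constant rate (exponential lifetimes, an assumption of this sketch), the expected time until the first failure among N disks is MTTF / N:

```python
def array_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """Expected time until *some* disk in the array fails,
    assuming independent, exponentially distributed disk lifetimes."""
    return disk_mttf_hours / num_disks

# 100 disks, each with a hypothetical MTTF of 100,000 hours
print(array_mttf(100_000, 100))  # 1000.0 hours, roughly 42 days
```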
Explain RAID Level 0 and what is its capacity?
Striping at the block level (non-redundant)
- used for high performance (parallel access to all disks) where data loss is not critical
Capacity: N
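Block-level striping can be sketched as a round-robin mapping from a logical block number to a (disk, offset) pair. This assumes the simplest layout, one block per stripe unit; real controllers may use larger stripe units.

```python
def raid0_location(logical_block: int, num_disks: int) -> tuple:
    """Map a logical block to (disk index, block offset on that disk)
    under round-robin block-level striping."""
    return logical_block % num_disks, logical_block // num_disks

for b in range(6):
    print(b, raid0_location(b, 3))
```

Consecutive blocks land on different disks, so a large sequential read keeps all N disks busy at once, and every disk holds data (capacity N).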
Explain RAID Level 1 and what is its capacity?
Mirrored disks (redundancy)
- for apps that require redundancy (protection from disk failure)
Capacity: N/2
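A toy model of mirroring (the class and its methods are hypothetical, for illustration only): each write goes to both disks, and a read can be served by any surviving copy. That is why two disks give only one disk's worth of capacity.

```python
class MirroredPair:
    """Toy RAID 1: every write goes to both disks; reads use any disk that still works."""

    def __init__(self, num_blocks: int):
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise IOError("both mirrors have failed")

pair = MirroredPair(num_blocks=4)
pair.write(2, b"hello")
pair.failed[0] = True      # simulate losing disk 0
print(pair.read(2))        # still readable from disk 1
```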
Explain RAID Level 2
Memory-style error-correcting codes (ECC) with bit striping
Explain RAID Level 5 and what is its capacity?
Block-interleaved distributed parity (parity blocks are spread across all disks)
- offers both reliability & increased performance
Capacity: N-1
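RAID 5's single parity block per stripe is the byte-wise XOR of the stripe's data blocks, so any one lost block can be rebuilt by XOR-ing the survivors. This sketch shows a single stripe on a hypothetical 4-disk array. In real RAID 5 the parity block's position rotates from stripe to stripe, which is not modeled here.

```python
from functools import reduce

def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks + one parity block (4 disks, capacity N - 1 = 3)
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(data)

# The disk holding data[1] fails: XOR of the surviving blocks (including parity) rebuilds it
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```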
Explain RAID Level 6 and what is its capacity?
- offers extra redundancy compared to Level 5 (two independent parity blocks per stripe)
- used to survive multiple (up to two) simultaneous drive failures
Capacity: N - X
where X = number of parity drives (2 for RAID 6)
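The capacity rules on these cards can be collected into one helper, assuming N disks of equal size (the function name and example sizes are hypothetical):

```python
def usable_capacity(level: int, num_disks: int, disk_size: float) -> float:
    """Usable capacity for the RAID levels on these cards, with N equal-size disks."""
    if level == 0:
        disks = num_disks           # striping only, no redundancy
    elif level == 1:
        disks = num_disks / 2       # every block is stored twice
    elif level == 5:
        disks = num_disks - 1       # one disk's worth of parity
    elif level == 6:
        disks = num_disks - 2       # two disks' worth of parity (X = 2)
    else:
        raise ValueError(f"RAID level {level} not covered here")
    return disks * disk_size

for level in (0, 1, 5, 6):
    print(level, usable_capacity(level, num_disks=6, disk_size=4.0), "TB")
```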
What is a block-level interface?
Allows the program to read & write a chunk of memory called a block (or page) from the device
What is a byte-level interface?
allows the program to read & write individually addressable bytes from the device
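The difference shows up when byte-level access is emulated on top of a block-level device. `BlockDevice`, `read_byte`, and `write_byte` are hypothetical names, and the 4096-byte block size is an assumption. Reading one byte fetches a whole block, and changing one byte costs a full block read and write.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

class BlockDevice:
    """Hypothetical device that only supports whole-block reads and writes."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n: int) -> bytes:
        return bytes(self.blocks[n])

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("block-level writes must cover exactly one block")
        self.blocks[n][:] = data

def read_byte(dev: BlockDevice, addr: int) -> int:
    """Byte read over a block interface: fetch the containing block, then index into it."""
    block, offset = divmod(addr, BLOCK_SIZE)
    return dev.read_block(block)[offset]

def write_byte(dev: BlockDevice, addr: int, value: int) -> None:
    """Read-modify-write of the whole block that contains `addr`."""
    block, offset = divmod(addr, BLOCK_SIZE)
    buf = bytearray(dev.read_block(block))
    buf[offset] = value
    dev.write_block(block, bytes(buf))

dev = BlockDevice(num_blocks=4)
write_byte(dev, 5000, 0x7F)   # lands in block 1, offset 904
print(read_byte(dev, 5000))
```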
What is a file-level interface?
abstracts away the device's addressing characteristics & provides a standard byte-level interface for files to programs running on the OS
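Python's ordinary file API is one example of a file-level interface: the program seeks to a byte offset and reads, and the OS works out which device blocks are involved. A short sketch using a temporary file:

```python
import os
import tempfile

# Write a small file; the program never sees the blocks it is stored in.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"record-0|record-1|record-2")
    path = f.name

with open(path, "rb") as f:
    f.seek(9)           # jump straight to byte 9
    chunk = f.read(8)   # read 8 bytes from there

os.remove(path)
print(chunk)
```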
Hierarchy of a database
Database is made up of files
- each file contains blocks
- each block contains records
- each record contains fields
- each field is a representation of a data item in a record
What does a record consist of?
One or more fields grouped together
What are the two main types of records, and how do they differ?
- Variable-length records: the size of the record varies
- Fixed-length records: all records have the same size
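A sketch of both record types using Python's `struct` module. The layouts are hypothetical: a fixed-length record of a 4-byte id plus a 10-byte name, and a variable-length record in which the name carries a 2-byte length prefix.

```python
import struct

# Fixed-length: every record uses the same 14-byte layout (4-byte int + 10-byte name).
FIXED = struct.Struct("<i10s")

def pack_fixed(rec_id: int, name: str) -> bytes:
    # Shorter names are padded with zero bytes; longer names are truncated to 10 bytes.
    return FIXED.pack(rec_id, name.encode())

# Variable-length: the name is stored with a 2-byte length prefix, so record size varies.
def pack_variable(rec_id: int, name: str) -> bytes:
    raw = name.encode()
    return struct.pack("<iH", rec_id, len(raw)) + raw

print(len(pack_fixed(1, "Al")), len(pack_fixed(2, "Bartholomew")))
print(len(pack_variable(1, "Al")), len(pack_variable(2, "Bartholomew")))
```

The fixed layout is simple to allocate and index but pads short values and truncates long ones; the variable layout stores exactly what is needed at the cost of per-record size bookkeeping.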
4 situations where variable formats are useful
- The data doesn’t have a regular structure in most cases
- The data values are sparse in the records
- There are repeating fields in the records
- The data evolves quickly, so schema evolution is challenging
3 disadvantages of variable formats
- Wastes space by repeating schema info for every record
- Allocating variable-sized records efficiently is challenging
- Query processing is more difficult and less efficient when the structure of the data varies
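The first disadvantage can be measured directly. This sketch uses JSON as a stand-in for a self-describing variable format (an assumption; the cards do not name a format). It compares records that repeat their field names with records that store only values plus one shared copy of the schema.

```python
import json

rows = [{"id": i, "name": f"user{i}", "age": 20 + i} for i in range(3)]

# Self-describing format: every record repeats its field names.
self_describing = [json.dumps(r).encode() for r in rows]

# Schema stored once; each record carries only its values.
schema = json.dumps(["id", "name", "age"]).encode()
values_only = [json.dumps([r["id"], r["name"], r["age"]]).encode() for r in rows]

print(sum(map(len, self_describing)), "bytes vs", len(schema) + sum(map(len, values_only)), "bytes")
```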
What are the 6 issues related to storing records in blocks? Describe each one.
- Separation: how do we separate adjacent records?
- Spanning: can a record cross a block boundary?
- Clustering: can a block contain records of different types?
- Splitting: are parts of a single record allocated in multiple blocks?
- Ordering: are the records sorted in any way?
- Addressing: how do we reference a given record?
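One possible answer to separation and addressing, sketched under assumptions the cards do not prescribe: records are separated by a 2-byte length prefix, and a record is referenced by a (block id, slot number) pair. The `Block` class, its 64-byte size, and the slot list are hypothetical. For simplicity, the slot list is not counted against the block's byte budget.

```python
import struct
from typing import List, Optional, Tuple

class Block:
    """Hypothetical block: length-prefixed records plus a list of record start offsets."""

    def __init__(self, block_id: int, size: int = 64):
        self.block_id = block_id
        self.size = size
        self.data = bytearray()
        self.slots: List[int] = []   # byte offset where each record starts

    def insert(self, record: bytes) -> Optional[Tuple[int, int]]:
        needed = 2 + len(record)                 # 2-byte length prefix + payload
        if len(self.data) + needed > self.size:
            return None                          # does not fit (unspanned)
        self.slots.append(len(self.data))
        self.data += struct.pack("<H", len(record)) + record
        return (self.block_id, len(self.slots) - 1)   # record id = (block, slot)

    def get(self, slot: int) -> bytes:
        start = self.slots[slot]
        (length,) = struct.unpack_from("<H", self.data, start)
        return bytes(self.data[start + 2 : start + 2 + length])

blk = Block(block_id=7)
rid = blk.insert(b"alice")
blk.insert(b"bob")
print(rid, blk.get(rid[1]))
```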
What are the 2 options when records do not fit in a block? Describe them.
Unspanned
- records may not cross a block boundary, so leftover space at the end of the block is wasted
Spanned
- a record starts at the end of one block and continues on the next
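The space trade-off can be worked out numerically. The sizes are hypothetical, and the sketch ignores the extra header bytes a spanned record would need to link its pieces.

```python
import math

def unspanned(num_records: int, record_size: int, block_size: int) -> tuple:
    """Records never cross block boundaries; leftover bytes in each block are wasted."""
    per_block = block_size // record_size        # records that fit whole in one block
    if per_block == 0:
        raise ValueError("a record larger than a block requires spanning")
    blocks = math.ceil(num_records / per_block)
    wasted_per_block = block_size - per_block * record_size
    return blocks, wasted_per_block

def spanned(num_records: int, record_size: int, block_size: int) -> int:
    """Records may continue in the next block, so only the last block has slack."""
    return math.ceil(num_records * record_size / block_size)

print(unspanned(1000, 300, 4096))   # 13 records per block
print(spanned(1000, 300, 4096))
```

Here unspanned storage needs 77 blocks and wastes 196 bytes per block, while spanned storage needs 74 blocks, at the cost of some records needing two block reads.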
What is clustering?
allocating records of different types together in the same block (or the same file) because they are frequently accessed together.