Physical Storage (2020) Flashcards

(42 cards)

1
Q

Data Storage:

Major Topics

A
  • Levels of Storage
  • Evaluating Storage
  • Magnetic Disk Physical Components
  • Data Organization
  • RAID
  • Techniques
  • Issues
2
Q

Physical Storage:

Storage Levels

A
  • Primary
    • Cache
    • Main Memory
  • Secondary
    • Flash Memory
    • Magnetic Disk
  • Tertiary
    • Optical Disk
    • Magnetic Tapes
3
Q

Primary Storage

Devices

A
  • Cache
  • Main Memory
4
Q

Secondary Storage

Devices

A
  • Flash Memory (SSD)
  • Magnetic Disk (Hard Drive)
5
Q

Tertiary Storage

Devices

A
  • Optical Disk
  • Magnetic Tapes

(Basically any external, sturdy storage)

6
Q

Storage Devices:

Cache Overview

A
  • Primary Storage Level
  • Fastest form of storage
  • Volatile - only used temporarily
  • Managed by the computer system hardware
  • Typically multiple levels of cache
7
Q

Storage Devices:

Main Memory Overview

A
  • Primary Storage Level
  • Fast Access
    • 10s to 100s of nanoseconds
  • Generally too small/expensive to store entire databases
  • Typically RAM
  • Volatile
    • Usually lost if power is lost
8
Q

Storage Devices:

Flash Memory Overview

A
  • Secondary Storage Level
  • Reads are roughly as fast as main memory
  • Non-volatile
  • Limited number of write/erase cycles (10K - 1M)
    • When erasing, has to wipe an entire block of memory
  • Writes are slow (microseconds)
    • Erase is slower
  • USB sticks, cameras, phones, etc
9
Q

Storage Devices:

Magnetic Disk Overview

A
  • Secondary Storage Level
  • Non-volatile
    • But disk failure can still destroy data
  • Stored on spinning disk
  • Read/writes magnetically
  • Primary means of long term storage for databases
  • Data must be moved to main memory for read/write (VERY SLOW)
  • Direct access: data can be read in any order
  • Rather cheap and large amounts of storage
10
Q

Storage Devices:

Optical Storage Overview

A
  • Tertiary Storage Level
  • Non-volatile
  • Read from physical disk using a laser
  • CD, DVD and Blu-Ray most popular forms
    • CD has the smallest capacity
  • Some are write once, read many - (CD-R)
  • Some are many writes, many reads - (CD-RW)
  • Slower than magnetic disk
  • “Juke Box” systems were used to store disks
11
Q

Storage Devices:

Tape Storage Overview

A
  • Tertiary Storage Level
  • Non-volatile
    • Backup and archival data
  • Sequential access
    • Extremely slow
  • Very High capacity
  • Tape jukeboxes can store petabytes of data
12
Q

Magnetic Disk:

Components

A
  • Platter (disks)
    • Divided into circular “Tracks”
    • Tracks broken into “Sectors”
  • Spindle
  • Read-Write Head
  • Arm Assembly
13
Q

Magnetic Disks:

Read/Write Head

A
  • Very close to the platter, almost touching
  • Reads and writes data
14
Q

Magnetic Disk:

Platter

A
  • Disk is made up of multiple “Platters”
  • Each platter is divided into circular Tracks, like lanes
    • Over 50-100K Tracks per Platter
  • Tracks are broken into Sectors, chunks of lanes
    • Smallest unit of data that can be read or written
    • Typically 512 bytes
    • More sectors on the outer tracks of the platter
15
Q

Magnetic Disk:

Reading and Writing

A
  • Reads/Writes accomplished via the Read/Write Head
  • When a sector is written, a checksum is stored with it
  • The sector is then read back and checked against the checksum to verify the write
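
The write-then-verify idea above can be sketched in Python. This is only an illustration: the dictionary standing in for a disk, the function names, and the choice of CRC-32 (from the standard `zlib` module) are all assumptions, and real drives use their own error-detecting and error-correcting codes in the controller.

```python
import zlib


def write_sector(disk: dict, sector_no: int, data: bytes) -> None:
    """Store the data together with a checksum computed at write time."""
    disk[sector_no] = (data, zlib.crc32(data))


def read_sector(disk: dict, sector_no: int) -> bytes:
    """Recompute the checksum on read and fail loudly if it does not match."""
    data, stored_checksum = disk[sector_no]
    if zlib.crc32(data) != stored_checksum:
        raise IOError(f"checksum mismatch in sector {sector_no}")
    return data


def write_and_verify(disk: dict, sector_no: int, data: bytes) -> bool:
    """Write a sector, then read it back to confirm the write succeeded."""
    write_sector(disk, sector_no, data)
    return read_sector(disk, sector_no) == data
```

A caller would treat an `IOError` from `read_sector` as a signal to retry the read or fall back to a redundant copy.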
16
Q

Magnetic Disk:

Disk Subsystem Overview

A
  • Multiple Disks are connected to a computer through a main controller
    • Controller manages the “big picture”
    • Individual disks usually handle checksums, etc
17
Q

Magnetic Disk:

Types of Disk Subsystems

A
  • SAN - Storage Area Networks
    • Connected via high speed network to servers
  • NAS - Network Attached Storage
    • Uses network file system protocol
    • Allows for storage like a file system
18
Q

Evaluating Storage:

Concerns/Factors

A
  • Access Time
  • Data Transfer Rate
  • Mean Time To Failure (MTTF)
19
Q

Storage Evaluation:

Access Time

A

The time it takes from issuing read/write command to when data transfer actually begins.

Factors:

  • Seek Time
    • Time to position arm over correct track
    • ~4-10 ms
  • Rotational Latency
    • Time it takes for sector to appear under head
    • ~4-11 ms, depending on how fast disk spins
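
As a rough sketch of how the two factors combine, the Python below derives rotation time from spindle speed and adds it to seek time. It assumes the usual convention that average rotational latency is half of one full rotation; the function names and the sample figures (8 ms seek, 7200 RPM) are illustrative assumptions, not values from the card.

```python
def full_rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Average seek time plus average rotational latency (half a rotation)."""
    return avg_seek_ms + full_rotation_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (5400, 7200, 15000):
        print(rpm, "RPM:", round(full_rotation_ms(rpm), 2), "ms per rotation")
    print(round(average_access_time_ms(8.0, 7200), 2), "ms average access time")
```

Under these assumptions, one full rotation takes about 11 ms at 5400 RPM and 4 ms at 15000 RPM, which lines up with the range quoted above.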
20
Q

Storage Evaluation:

Data Transfer Rate

A

The rate at which data can be stored or retrieved

  • Depends on the controller rate
    • The connection type (e.g. SATA vs Fibre Channel) may be the limit if multiple disks share a controller
21
Q

Storage Evaluation:

MTTF

A

Mean Time To Failure

  • Average time that a disk is expected to run continuously without any failure
  • Typically 3-5 years
  • Decreases as the disk ages
  • If the MTTF is 1,200,000 hours:
    • Given 1000 new disks, on average one will fail every 1200 hours
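
The arithmetic behind that example can be checked with a short Python sketch. It assumes the disks fail independently and that the MTTF figure applies equally to every disk; the function name is a placeholder for illustration.

```python
def hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time between failures across a pool of independent disks."""
    return mttf_hours / num_disks


if __name__ == "__main__":
    hours = hours_between_failures(1_200_000, 1000)
    print(hours, "hours between failures, i.e. about", round(hours / 24), "days")
```

So a single disk with a very large MTTF still means regular failures once a system holds many disks.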
22
Q

Block:

Definition and overview

A

A Block is a contiguous sequence of sectors from a single track

  • Data is transferred from disk to memory in blocks
  • Smaller blocks = more reads from disk
  • Larger blocks = more wasted space
  • The Elevator Algorithm is used to schedule reads and writes
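
A minimal sketch of elevator scheduling in Python, assuming each pending request is identified only by its track number. The arm keeps moving in one direction, services every request it passes, then reverses. The function names and the sample request queue are illustrative assumptions.

```python
def elevator_order(requests, head, moving_up=True):
    """Order in which pending track requests are serviced by one sweep and back."""
    if moving_up:
        first = sorted(t for t in requests if t >= head)
        second = sorted((t for t in requests if t < head), reverse=True)
    else:
        first = sorted((t for t in requests if t <= head), reverse=True)
        second = sorted(t for t in requests if t > head)
    return first + second


def total_arm_movement(order, head):
    """Total number of tracks the arm crosses when servicing requests in order."""
    moved = 0
    for track in order:
        moved += abs(track - head)
        head = track
    return moved


if __name__ == "__main__":
    pending = [98, 183, 37, 122, 14, 124, 65, 67]
    print(elevator_order(pending, head=53))
    print(total_arm_movement(elevator_order(pending, 53), 53), "tracks (elevator)")
    print(total_arm_movement(pending, 53), "tracks (arrival order)")
```

Comparing the two totals shows why reordering helps: servicing requests in arrival order makes the arm cross the platter repeatedly.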
23
Q

File Organization

Overview

A
  • Related information is stored nearby
  • Files may get fragmented over time:
    • Parts of file deleted
    • Free blocks are scattered, so a newly created file is scattered too
    • Increases seek time
  • Defragmenting a hard drive can improve speeds
24
Q

Non-volatile Write Buffers

A
  • Basic Idea:
    • Write blocks to battery backed up RAM or flash memory BEFORE writing to disk
  • Controller can write to disk when it has nothing else to do, or when a block has been waiting in the buffer for a while
  • Database operations can continue without waiting for data to be written to disk
  • Write orders can be optimized before going to disk
25
Q

Storage Strategies

A
  • RAID
    • Redundant Arrays of Independent Disks
  • Mirroring
    • Duplicate every disk
  • Parallelism
    • Improve transfer rate by "striping" data across multiple disks
26
Q

RAID Overview

A
  • Redundant Arrays of Independent Disks
  • Manages a large number of disks, but provides the view of a single disk
  • High speed and capacity by using multiple disks in parallel
  • High reliability by storing data redundantly
  • Originally a cost-effective alternative to large, expensive disks
    • The "I" originally stood for "Inexpensive"
  • Today, used for higher reliability and bandwidth
27
Q

Storage Strategies:

Mirroring

A
  • Duplicate every disk
  • Writes occur to both disks
  • Reads can use either disk
  • If one disk fails, the system is still operational
    • Only considered to fail if both go down simultaneously
    • Can repair or replace the one that failed
  • Probability of both going offline at the same time is very low
    • This assumes the failures are independent; outside factors can take out both disks at once:
      • Fires
      • Collapsed buildings
      • Disasters
28
Q

Storage Strategies:

Parallelism

A
  • Load balance multiple small inputs to increase throughput
  • Parallelize large inputs to reduce response time
  • We can improve the transfer rate by striping data across multiple disks
  • Types of Striping:
    • Bit Striping
    • Block Striping
29
Q

Storage Strategies: Parallelism:

Bit Striping

A
  • Idea:
    • Split the bits of each byte across multiple disks
  • Suppose 8 disks:
    • Write bit i of each byte to disk i
    • Can transfer data up to 8X faster than a single disk
    • Seek time is worse than for a single disk
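
A small Python sketch of the 8-disk case, with each "disk" modelled as a plain list of bits. The function names are made up for illustration, and numbering bits from the least significant end is an assumption; the card does not say which end bit 0 is.

```python
NUM_DISKS = 8  # one disk per bit of a byte


def stripe_bits(data: bytes):
    """Write bit i of every byte to disk i (bit 0 = least significant bit)."""
    disks = [[] for _ in range(NUM_DISKS)]
    for byte in data:
        for i in range(NUM_DISKS):
            disks[i].append((byte >> i) & 1)
    return disks


def unstripe_bits(disks) -> bytes:
    """Rebuild the original bytes by reading one bit from every disk."""
    out = bytearray()
    for pos in range(len(disks[0])):
        byte = 0
        for i, disk in enumerate(disks):
            byte |= disk[pos] << i
        out.append(byte)
    return bytes(out)
```

Every byte read touches all 8 disks, which is why the transfer rate goes up while a single access is no faster than on one disk.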
30
Q

Storage Strategies: Parallelism:

Block Striping

A
  • Idea:
    • Write blocks to individual disks, instead of bits
  • Requests for different blocks can run in parallel
  • A request for a long sequence of blocks can be run in parallel
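
One common way to place blocks is round-robin, sketched below in Python. The round-robin rule, the 0-based disk numbering, and the function names are assumptions for illustration; the card only says that whole blocks go to individual disks.

```python
def locate_block(logical_block: int, num_disks: int):
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def disks_touched(first_block: int, count: int, num_disks: int):
    """Which disks serve a request for `count` consecutive logical blocks."""
    return sorted({locate_block(b, num_disks)[0]
                   for b in range(first_block, first_block + count)})
```

With 4 disks, a single-block request touches one disk and leaves the others free for other requests, while a run of 4 or more consecutive blocks keeps all 4 disks busy at once.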
31
Q

Storage Strategies: RAID:

RAID Levels Overview

A
  • Combines mirroring and striping
  • Adds parity
  • Each level has different performance and reliability

Levels:

  • RAID 0
    • Block Striping, no redundancy
  • RAID 1+0
    • Block Striping, Mirrored Disks
  • RAID 2
    • Bit Striping, uses parity bits as an error-correcting code
  • RAID 3
    • Bit Striping, uses a single parity bit
  • RAID 4
    • Block Striping, uses parity block
  • RAID 5
    • Block Striping, distributed parity block
  • RAID 6
    • Block Striping, distributed redundant bits
32
Q

RAID:

RAID 0

A
  • Block Striping
  • No redundancy
  • Used for high-performance applications where data loss is not critical
33
Q

RAID:

RAID 1 + 0

A
  • Block Striping
  • Uses Mirrored disks for redundancy
  • Best Write performance
  • Often used for log files in database systems
34
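Q

RAID:

RAID 2

A
  • Bit Striping
  • Uses parity bits as an error-correcting code
    • A parity bit counts the number of set bits: Parity = 0 if even
  • Aimed at systems where you need to ensure the data is not corrupt
  • Needs 3 extra disks (for 4 data disks)
  • Faster transfer rate, but every disk takes part in every I/O, so fewer I/Os per second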
35
Q

RAID:

RAID 3

A
  • Bit Striping
  • Uses Parity Bit
  • Unlike RAID 2, only needs a single disk for parity
    • Cheaper than RAID 2
  • If a bit is lost, then the parity of the remaining bits is calculated
    • If it matches the parity bit, the lost bit is 0
    • Otherwise it is 1
  • Like RAID 2, faster transfer rate, but every disk takes part in every I/O, so fewer I/Os per second
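
The recovery rule above can be sketched in a few lines of Python. It assumes even parity (parity is 0 when the number of set bits is even) and that the position of the failed disk is known; the function names are illustrative.

```python
def parity_bit(bits) -> int:
    """Even parity: 0 if the number of set bits is even, otherwise 1."""
    return sum(bits) % 2


def recover_lost_bit(surviving_bits, stored_parity: int) -> int:
    """Rebuild the single missing bit from the surviving bits and the parity bit."""
    return 0 if parity_bit(surviving_bits) == stored_parity else 1


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    p = parity_bit(data)
    survivors = data[:2] + data[3:]  # pretend the disk holding bit 2 failed
    print("recovered bit:", recover_lost_bit(survivors, p))
```

The same rule works for whichever single disk is lost, which is why one parity disk is enough as long as the system knows which disk failed.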
36
Q

RAID:

RAID 4

A
  • Block Striping
  • Uses Parity Block
  • Only needs a single disk for parity block
  • Faster for large reads and writes
  • Slower for very small reads and writes
37
Q

RAID:

RAID 5

A
  • Block Striping
  • Distributed Parity Block
  • Stores the parity block for the i-th set of blocks on:
    • disk (i mod n) + 1,
    • where n is the number of disks
  • Subsumes RAID level 4
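
The placement formula can be tried out with a short Python sketch. It reads i as the index of a set of blocks (a stripe) and numbers disks from 1, as the formula implies; the function names and the "P"/"D" layout table are illustrative assumptions.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """1-based number of the disk holding the parity block for this stripe."""
    return (stripe % num_disks) + 1


def layout(num_stripes: int, num_disks: int):
    """One row per stripe: 'P' marks the parity block, 'D' marks data blocks."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D"
                     for disk in range(1, num_disks + 1)])
    return rows


if __name__ == "__main__":
    for row in layout(5, 5):
        print(" ".join(row))
```

Printing the layout for 5 disks shows the parity block moving one disk over per stripe, so no single disk carries all the parity traffic as it does in RAID 4.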
38
Q

RAID:

RAID 6

A
  • Block Striping
  • Distributed Redundant Bits
    • For every 4 bits of data, stores 2 bits of redundant data on other disks
  • Better reliability than RAID 5, but costs more due to extra space
  • Not widely used
39
Q

RAID:

Considerations when choosing a RAID Level

A
  • Monetary Costs
  • Performance
    • Number of I/O operations per second
  • Performance during failure
  • Performance during rebuild
40
Q

RAID: Choosing a RAID Level:

General Guidelines

A
  • RAID 0 is used when data safety isn't important
  • RAID 2 and 4 are never used
    • Replaced by RAID 3 and 5
  • RAID 3 is not used much, because bit striping forces all disks to operate for a single block of data
  • RAID 6 is rarely used because RAID 1 and 5 offer adequate protection against data loss
41
Q

RAID: Choosing a RAID Level:

Comparison of RAID 1 and 5

A
  • RAID 1 provides better write performance than RAID 5
    • RAID 5 requires 2 block reads and 2 block writes to write a single block
    • RAID 1 only requires 2 block writes
    • RAID 1 preferred for frequently updated environments
  • RAID 1 used to have a higher storage cost than RAID 5
    • Disk capacity is increasing rapidly
    • Once there are enough disks to satisfy the I/O rate, those disks often have spare storage capacity
  • RAID 5 preferred for low update rates and large amounts of data
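
The "2 block reads and 2 block writes" figure for RAID 5 comes from how the parity block is kept up to date on a small write. A minimal Python sketch follows, assuming XOR parity over byte blocks; the function names are illustrative and blocks are short byte strings rather than real disk blocks.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after overwriting one data block.

    Reads:  old data block, old parity block   (2 block reads)
    Writes: new data block, new parity block   (2 block writes)
    """
    return xor_blocks(old_parity, old_data, new_data)
```

The other data blocks in the stripe are never read, because XOR-ing out the old data and XOR-ing in the new data gives the same parity as recomputing it from scratch. RAID 1 skips the reads entirely and just writes the block to both mirrors.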
42
Q

Data Storage:

Latent Failures

A
  • Data that was successfully written is later found to be corrupted
  • Data Scrubbing:
    • Continuously scan for latent failures
  • Many systems keep spare disks online to swap out failed disks quickly
  • Redundant power supplies, multiple controllers, etc.