Storage and Fault Tolerance Flashcards

1
Q

What does RAID stand for?

A

Redundant Array of Independent Disks

2
Q

What is a RAID?

A
  • Several disks play the role of one
  • Each disk detects its own errors using codes in each sector
3
Q

What should a RAID do?

A
  • Provide better performance
  • Normal read/write even when we have a bad sector or a whole disk fails.
4
Q

What is RAID0?

A

Uses “striping” (track 0 on first disk, track 1 on second, etc) for increased throughput
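
A minimal sketch of the striping layout, assuming simple round-robin placement by track number. The function name `locate` is hypothetical and used only for illustration.

```python
def locate(track: int, num_disks: int) -> tuple[int, int]:
    """Map a logical track number to (disk index, track index on that disk).

    Assumes round-robin striping: track 0 on disk 0, track 1 on disk 1, and so on.
    """
    return track % num_disks, track // num_disks


# Consecutive logical tracks land on different disks, so they can be
# transferred in parallel.
for track in range(6):
    disk, local_track = locate(track, num_disks=3)
    print(f"logical track {track} -> disk {disk}, local track {local_track}")
```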

5
Q

What is RAID0’s performance?

A
  • Nx data throughput (N is number of disks)
  • less queuing delay (latency)
6
Q

MTTF?

A

Mean Time to Failure

7
Q

MTTDL

A

Mean Time to Data Loss

8
Q

Failure rate

A

f = failures per disk per second

9
Q

Single disk MTTF

A

1/f

10
Q

Single disk MTTDL

A

Single disk MTTF

11
Q

What is the MTTF of RAID0 with N disks?

A

MTTF_N = MTTDL_N = MTTF_1 / N
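
A small worked example of this formula, assuming independent disk failures at a constant rate f (so N disks together fail at rate N*f). The function name and the 100,000-hour figure are illustrative assumptions, not values from the cards.

```python
def raid0_mttdl(mttf_one_disk: float, num_disks: int) -> float:
    """RAID0 loses data on the first disk failure, so MTTDL = MTTF_1 / N."""
    return mttf_one_disk / num_disks


mttf_1 = 100_000.0  # hours; assumed value for illustration only
for n in (1, 2, 4, 8):
    print(f"N={n}: MTTDL = {raid0_mttdl(mttf_1, n):.0f} hours")
```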

12
Q

What is RAID1?

A

Uses “mirroring” (same data on multiple disks) for reliability.

13
Q

What is the throughput of a RAID0 with N disks?

A

N x the throughput of one disk

14
Q

What is throughput of RAID1?

A

Write: same as 1 disk
Read: N x throughput of one disk

15
Q

Reliability of RAID1?

A

RAID1 tolerates any faults that affect one disk. Has ECC on each disk sector, so it knows where the error is and on which disk. It can then use the mirror copy on the other disk instead.

16
Q

MTTR

A

Mean Time to Repair

17
Q

What is the MTTDL for RAID1 if we don’t repair a damaged disk?

A

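MTTDL = (MTTF_1/N) + MTTF_1 (for N = 2 mirrored disks)

The first failure takes MTTF_1/N on average; the surviving disk then lasts another MTTF_1.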

18
Q

What is the MTTDL for RAID1 if we DO repair a damaged disk?

A

MTTDL = (MTTF_1/N)*(MTTF_1/MTTR)

i.e., MUCH MUCH MUCH better than not repairing.
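
To see how large the gap is, here is a rough numeric sketch of the two RAID1 formulas for N = 2 mirrored disks. The function names and the sample MTTF and MTTR values are assumptions chosen for illustration; the repair formula also assumes MTTR is much smaller than MTTF_1.

```python
def raid1_mttdl_no_repair(mttf_1: float, n: int = 2) -> float:
    """First failure after MTTF_1/N, then the survivor lasts another MTTF_1."""
    return mttf_1 / n + mttf_1


def raid1_mttdl_with_repair(mttf_1: float, mttr: float, n: int = 2) -> float:
    """Data is lost only if the second disk fails during the repair window."""
    return (mttf_1 / n) * (mttf_1 / mttr)


mttf_1 = 100_000.0  # hours; assumed
mttr = 24.0         # hours; assumed
no_repair = raid1_mttdl_no_repair(mttf_1)
with_repair = raid1_mttdl_with_repair(mttf_1, mttr)
print(f"no repair:   {no_repair:,.0f} hours")
print(f"with repair: {with_repair:,.0f} hours")
print(f"improvement: {with_repair / no_repair:,.0f}x")
```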

19
Q

What is RAID4?

A

Uses “block-interleaved parity” to both improve performance and reliability.

For N disks, N-1 have data striped just like in RAID0 and 1 has the parity bits calculated from the other disks using bitwise XOR.
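
A minimal sketch of XOR parity, assuming three data disks and tiny two-byte blocks for readability. The helper name `xor_blocks` is hypothetical. It computes the parity block and then rebuilds a lost data block from the survivors plus parity.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bitwise XOR of equal-length byte blocks, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# N - 1 = 3 data disks, each holding one block of the same stripe (sample data).
data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0f"]
parity = xor_blocks(data)

# Suppose disk 1 fails: XOR of the surviving data blocks and the parity
# block gives back the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```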

20
Q

What is the throughput of RAID4?

A

Reads: (N-1) * throughput of one disk
Writes: 1/2 the throughput of one disk!

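When we write in RAID4, we read the old data block and the old parity block, then write the new data block and the new parity block. The data and parity accesses happen in parallel on different disks, but every write hits the single parity disk twice (one read, one write), so the parity disk limits writes to half the throughput of one disk.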

THIS is why we need RAID5.

21
Q

MTTF of RAID4

A

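MTTDL = (MTTF_1/N) * (MTTF_1/((N-1) * MTTR_1))

But what’s the N-1 about? The first failure takes MTTF_1/N on average. That leaves N-1 working disks, so the next failure would take MTTF_1/(N-1) on average. Data is lost only if that second failure arrives before the repair finishes, which happens with probability of about MTTR_1 / (MTTF_1/(N-1)). Dividing the time to first failure by that probability gives the formula.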
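
The reasoning can be written out step by step. This is a sketch under the usual assumptions (independent failures, MTTR_1 much smaller than MTTF_1); the function name and sample numbers are illustrative only.

```python
def parity_raid_mttdl(mttf_1: float, mttr_1: float, n: int) -> float:
    """Approximate MTTDL for an N-disk array that survives one disk failure."""
    time_to_first_failure = mttf_1 / n
    # After the first failure, N - 1 disks are still running.
    time_to_second_failure = mttf_1 / (n - 1)
    # Approximate chance that the second failure lands inside the repair window.
    prob_loss_per_first_failure = mttr_1 / time_to_second_failure
    return time_to_first_failure / prob_loss_per_first_failure


# Assumed values: 5 disks, 100,000-hour disk MTTF, 24-hour repair.
print(f"{parity_raid_mttdl(100_000.0, 24.0, 5):,.0f} hours")
```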

22
Q

How do we compute new parity on RAID4 write?

A

First we XOR the new data and old data for the block we’re writing to (which finds all changes). Then we XOR that with the old parity. The result is stored as the new parity.

Because we only have one parity disk, that creates a bottleneck on writes. Hence the need for RAID5.
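
The parity update can be sketched as follows, assuming one-byte blocks and three data disks for readability. The names `xor` and `updated_parity` are hypothetical. The final check confirms that the shortcut gives the same result as recomputing parity from every data block.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity from only the block being written and the old parity."""
    changed_bits = xor(old_data, new_data)  # 1 wherever the write flips a bit
    return xor(changed_bits, old_parity)    # flip the same bits in the parity


blocks = [b"\x0f", b"\x33", b"\xf0"]  # sample contents of three data disks
old_parity = xor(xor(blocks[0], blocks[1]), blocks[2])

new_block = b"\x3c"  # overwrite the block on disk 1
new_parity = updated_parity(blocks[1], new_block, old_parity)

# Same answer as recomputing parity over all data blocks.
full_recompute = xor(xor(blocks[0], new_block), blocks[2])
print(new_parity == full_recompute)
```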

23
Q

What is RAID5?

A

DISTRIBUTED block-interleaved parity. Similar to RAID4, but the parity stripes are distributed among all disks (the first might be on disk 4, the next on disk 1, the next on 2, etc).
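
One possible rotation that matches the example above (parity on the last disk first, then the first disk, then the second) is sketched here. Real implementations use several different layouts, so the formula and the function names are assumptions for illustration only. Disks are indexed from 0 in the code.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Index of the disk holding parity for a given stripe (assumed rotation)."""
    return (num_disks - 1 + stripe) % num_disks


def stripe_layout(stripe: int, num_disks: int) -> list[str]:
    """'P' marks the parity block, 'D' marks data blocks, one entry per disk."""
    p = parity_disk(stripe, num_disks)
    return ["P" if disk == p else "D" for disk in range(num_disks)]


for stripe in range(4):
    print(f"stripe {stripe}: {' '.join(stripe_layout(stripe, 4))}")
```

Because the parity block moves from stripe to stripe, parity reads and writes are spread over all disks instead of queuing on one.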

24
Q

What is the throughput of RAID5?

A

Where N is the total number of disks…

Reads: N * throughput of one disk (because now we can actually read from all N disks at once!)
Writes: N/4 * throughput of one disk! (because we still need 4 total accesses per write, but they’re distributed over all N disks)
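
For comparison, the RAID4 and RAID5 throughput figures from these cards can be put side by side. This is only a sketch of the idealized formulas; the function names and the 100 MB/s single-disk figure are assumptions.

```python
def raid4_throughput(num_disks: int, one_disk: float) -> dict[str, float]:
    """Reads use the N-1 data disks; writes are limited by the one parity disk."""
    return {"read": (num_disks - 1) * one_disk, "write": one_disk / 2}


def raid5_throughput(num_disks: int, one_disk: float) -> dict[str, float]:
    """Reads use all N disks; each write costs 4 accesses spread over N disks."""
    return {"read": num_disks * one_disk, "write": num_disks * one_disk / 4}


one_disk = 100.0  # MB/s; assumed value for illustration
for n in (4, 8):
    print(f"N={n} RAID4: {raid4_throughput(n, one_disk)}")
    print(f"N={n} RAID5: {raid5_throughput(n, one_disk)}")
```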

25
Q

What is the reliability of RAID5?

A

Same as RAID4! We can always recover from the loss of one disk. If we lose parity, we can still read/write from the other disks. If we lose one data disk, we can reconstruct data from parity.

So it’s just as reliable, without bottleneck on writes.

MTTF is the same as RAID4.

26
Q

MTTF of RAID5

A

Same as RAID4!

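MTTDL = (MTTF_1/(N*(N-1))) * (MTTF_1/MTTR_1)

But what’s the N-1 about? The first failure takes MTTF_1/N on average. That leaves N-1 working disks, so the next failure would take MTTF_1/(N-1) on average. Data is lost only if that second failure arrives before the repair finishes, which happens with probability of about MTTR_1 / (MTTF_1/(N-1)). Dividing the time to first failure by that probability gives the formula.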

27
Q

What is RAID6?

A
  • Two parity blocks per group.
  • Can work when two stripes per group have failed
  • One true parity block
  • The other “parity” block is a different kind of check block
  • When one disk fails, use parity
  • When two fail, solve some equations using both parity and “parity” blocks to recover data
28
Q

When should you use RAID6?

A
29
Q

RAID5 vs RAID6

A
  • RAID6 has 2x the storage overhead (two check blocks per group instead of one)
  • More write overhead: when we write, we need to read and write the data block plus BOTH check blocks
  • RAID6 is only useful when the chance of a second disk failing during our MTTR is actually high. This is unusual.
30
Q

Is RAID6 overkill?

A

RAID6 seems like overkill when you assume disk failures are independent. Under RAID5, the likelihood of a second independent failure happening during the repair period is very low. So why use RAID6?

Because disk failures are NOT necessarily independent. For example, say you remove the wrong disk when trying to replace the failed one: you now have a second, correlated failure. RAID6 would’ve been nice.