Module 9b - Consistency and Replication (part 2) Flashcards

1
Q

Many commercial databases use “primary-based replication” protocols. What are primary-based replication protocols?

A

Protocols in which all updates are executed by a designated primary replica and then pushed to one or more backup replicas.

OR

Primary-based protocols require that each data item have a primary copy (or home) on which all writes are performed; backups receive these updates from the primary copy.

2
Q

Protocols in primary-based replication can be classified as “remote-write” or “local-write”. What does remote-write mean?

What does the workflow look like?

A

The primary replica is stationary (it stays on one fixed server), so a write received by a backup server must be forwarded to, and performed at, the remote primary.

Workflow for remote write:

  1. Write request for item x (goes to backup)
  2. Forward request to primary, primary writes x
  3. Tell backups to update & write x
  4. Acknowledge that the update has been completed by backups
  5. Acknowledge to client that the write has been completed
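
The five steps above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not a real protocol implementation: the `Replica` and `RemoteWriteGroup` classes and their method names are hypothetical, and network calls are modelled as plain method calls that always succeed.

```python
class Replica:
    """A replica holding a simple key-value store (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value
        return True  # acknowledgement


class RemoteWriteGroup:
    """A fixed primary plus backups; every write is forwarded to the primary."""

    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = backups

    def write(self, contacted, key, value):
        # Steps 1-2: `contacted` is the backup that received the client's request.
        # It does not write locally; it forwards to the primary, which writes x.
        self.primary.apply(key, value)
        # Steps 3-4: the primary tells every backup to update and collects acks.
        acks = [backup.apply(key, value) for backup in self.backups]
        # Step 5: acknowledge to the client only after all backups confirmed.
        return all(acks)


primary = Replica("p")
backups = [Replica("b1"), Replica("b2")]
group = RemoteWriteGroup(primary, backups)
ok = group.write(contacted=backups[0], key="x", value=1)
```

Because the acknowledgement in step 5 waits for every backup, the write blocks until all replicas are up to date, which is what makes strong consistency possible in this scheme.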
3
Q

Protocols in primary-based replication can be classified as “remote-write” or “local-write”. What does local-write mean?

What does the workflow look like?

A

The primary replica migrates from server to server, allowing clients to perform updates on their local replica

Workflow for local write:

  1. Write request for item x (goes to client’s backup)
  2. Move item x to new primary (which is the client’s backup)
  3. Acknowledge write completed to client
  4. New primary tells backups to update
  5. Acknowledge to new primary that backups have updated x
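
A minimal sketch of this workflow, under simplifying assumptions: the class `LocalWriteGroup` and its methods are hypothetical names, all servers live in one process, and "moving the item" is reduced to changing which server name is recorded as primary.

```python
class LocalWriteGroup:
    """Local-write sketch: the primary role follows the writing client."""

    def __init__(self, names, primary):
        self.stores = {name: {} for name in names}
        self.primary = primary  # name of the server currently acting as primary
        self.pending = []       # updates not yet pushed to the backups

    def write(self, local, key, value):
        # Step 2: the primary role moves to the client's local replica.
        self.primary = local
        self.stores[local][key] = value
        # Step 3: acknowledge immediately; backups are updated afterwards.
        self.pending.append((key, value))
        return True

    def propagate(self):
        # Steps 4-5: push pending updates to every server (re-applying an
        # update on the primary is harmless because writes are idempotent).
        for key, value in self.pending:
            for store in self.stores.values():
                store[key] = value
        self.pending.clear()


group = LocalWriteGroup(["a", "b", "c"], primary="a")
acked = group.write("b", "x", 7)
before = dict(group.stores["c"])  # backup "c" has not seen the write yet
group.propagate()
```

The client is acknowledged before the backups are updated, so local writes are fast, at the cost of backups briefly lagging behind the new primary.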
4
Q

In Primary-based protocols, if the ______ replica fails, then one of the ______ replicas may take over as the new ______. Accurate _______ detection is necessary to prevent ______ situations

A
primary
backup
primary
failure
split-brain
5
Q

What are the benefits and drawbacks of forcing all updates through a primary replica?

A

Benefit:
Makes it possible to implement strong consistency models such as sequential consistency & linearizability

Drawbacks:

  • Can lead to performance bottlenecks
  • Temporary loss of availability when the primary fails
6
Q

______ protocols allow replicas to receive updates such that each update must be accepted by a sufficiently large ______ of replicas.

A

Quorum-based

subset

7
Q

Quorum systems improve ______ of ______ data. Every time a group of servers needs to agree on something, a ______ is involved in the decisions

A

consistency
replicated
quorum

8
Q

Read-write quorums define two parameters n_R and n_W. What do these two parameters signify?

A

n_R is the minimum number of replicas that must participate in a read operation. These are the “read-quorums”

n_W is the minimum number of replicas that must participate in a write operation. These are the “write-quorums”

9
Q

What are read-quorums and write-quorums?

A

read-quorums: The subset of all replicas which are involved in reading

write-quorums: The subset of all replicas which are involved in writing

10
Q

In distributed databases, read and write quorums must satisfy 2 rules of overlap. What are they?

A
  1. The read and write quorums must overlap: n_R + n_W > N
  2. Two write quorums must overlap: n_W + n_W > N

Rule 2 means that every write quorum must contain more than half of the replicas (n_W > N/2); this enables detection of write-write conflicts
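
The two overlap rules can be checked mechanically. The helper below is an illustrative sketch; the function name `is_strict_quorum` and its argument names are made up for this example.

```python
def is_strict_quorum(n, n_r, n_w):
    """Return True when both overlap rules hold for n replicas."""
    read_write_overlap = n_r + n_w > n   # rule 1
    write_write_overlap = 2 * n_w > n    # rule 2 (n_W + n_W > N)
    return read_write_overlap and write_write_overlap


print(is_strict_quorum(5, 3, 3))  # True: majority reads and writes
print(is_strict_quorum(5, 1, 5))  # True: read one, write all
print(is_strict_quorum(5, 2, 2))  # False: neither rule holds
```

Note that a configuration such as N = 5, n_R = 4, n_W = 2 satisfies rule 1 but not rule 2, so it is rejected as well.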

11
Q

In Quorum-based protocols, what does ensuring that read and write quorums overlap enable?

A

Enables detection of read-write conflicts.

Every read quorum shares at least one replica with every write quorum, so a read always reaches at least one replica that holds the latest write. Combined with overlapping write quorums, this leaves no opportunity for undetected read-write conflicts, and the execution is guaranteed to be sequentially consistent
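
The overlap claim can be verified by brute force for small N: enumerate every possible read quorum and write quorum and check that each pair shares a replica. This is only a sanity-check sketch; the function name `quorums_always_intersect` is hypothetical.

```python
from itertools import combinations


def quorums_always_intersect(n, size_a, size_b):
    """True if every size_a subset of n replicas intersects every size_b subset."""
    replicas = range(n)
    return all(
        set(a) & set(b)  # an empty intersection is falsy
        for a in combinations(replicas, size_a)
        for b in combinations(replicas, size_b)
    )
```

For example, with N = 5 the call `quorums_always_intersect(5, 2, 4)` succeeds because 2 + 4 > 5, while `quorums_always_intersect(5, 2, 3)` fails because the quorums {0, 1} and {2, 3, 4} are disjoint.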

12
Q

What does ensuring that two write-quorums overlap enable?

A

Enables detection of write-write conflicts

13
Q

In Quorum-based protocols, what constraint do we have on N (the number of replicas)?

(not in relation to n_R and n_W)

A

N (number of replicas) is usually chosen to be odd. This is a common convention rather than a strict requirement.

14
Q

In Quorum-based protocols, what constraints do we have on n_R, n_W and N with respect to each other?

A
  1. n_R + n_W > N
  2. n_W + n_W > N
  3. n_W > 0
  4. n_R > 0
  5. N is usually chosen to be odd
15
Q

What does ROWA stand for, and what is a ROWA scheme in quorum-based protocols?

A

ROWA - read one, write all

A scheme with n_R = 1 and n_W = N: a read may contact any single replica, while a write must update all replicas

16
Q

Partial quorums can be configured to provide various degrees of _____ by changing ____ and _____.

A

consistency
n_R
n_W

17
Q

What is the difference between strong and weak consistency in distributed systems?

A

Strong consistency: The data in all replicas is the same at any time. If key x is read from replica A and B at the same time, they should return the same value

Weak consistency: There is no guarantee that all replicas have the same data at any time.

18
Q

In partial-quorums, how can adjusting n_R and n_W provide strong or weak consistency?

A

If n_R + n_W > N, every read quorum overlaps every write quorum, so the system will have strong consistency

If n_R + n_W <= N, a read may miss the latest write, so the system will have weaker consistency; how weak depends on n_R and n_W
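
A small simulation makes the difference concrete. It is a sketch under stated assumptions: replicas are entries in a list holding `(version, value)` pairs, the version numbers are supplied by the caller, and the helper names `quorum_write` and `quorum_read` are hypothetical.

```python
def quorum_write(replicas, targets, value, version):
    """Store (version, value) on each replica index listed in targets."""
    for i in targets:
        replicas[i] = (version, value)


def quorum_read(replicas, targets):
    """Return the value with the highest version among the contacted replicas."""
    return max(replicas[i] for i in targets)[1]


N = 3
replicas = [(0, "old")] * N
quorum_write(replicas, targets=[0, 1], value="new", version=1)  # n_W = 2

overlapping = quorum_read(replicas, targets=[1, 2])  # n_R = 2, so 2 + 2 > 3
stale = quorum_read(replicas, targets=[2])           # n_R = 1, so 1 + 2 <= 3
```

Here `overlapping` is `"new"` because the read quorum must include a replica that took the write, whereas `stale` is `"old"` because the single contacted replica was not part of the write quorum.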

19
Q

In _____ consistency mode, the system cannot detect read-write conflicts, nor write-write conflicts

20
Q

What is the “last write wins” policy?

A

When two writes to the same data item arrive concurrently, their timestamps are used to decide which one is kept. The write with the later timestamp wins and the other is discarded
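
A minimal sketch of the policy, assuming each write is represented as a `(timestamp, replica_id, value)` tuple and that timestamps from different servers are comparable. Breaking timestamp ties by `replica_id` is an assumption added for this example so that the outcome is deterministic; the source does not say how ties are resolved.

```python
def last_write_wins(current, incoming):
    """Pick the write with the larger (timestamp, replica_id) pair."""
    return max(current, incoming, key=lambda write: write[:2])


winner = last_write_wins((10, "a", "x=1"), (12, "b", "x=2"))
tie = last_write_wins((10, "a", "x=1"), (10, "b", "x=3"))
```

In this example `winner` is the write stamped 12, and `tie` is the write from replica `"b"` because `"b"` sorts after `"a"`.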

21
Q

To resolve ______ conflicts, updates are tagged with ______, and a ______ policy is applied

A

write-write
timestamps
resolution

22
Q

In Quorum-based protocols, when the read and write quorums do not satisfy the 2 rules of overlap for (strict) quorums, they are referred to as _____ ______.

Note that the 2 rules of overlap are:
n_R + n_W > N
n_W + n_W > N

A

partial quorums

23
Q

Describe the difference between full replication and partial replication in databases

A

Full replication: the full database is stored in each replica (all data is duplicated)

Partial replication: only a fragment of the database is stored in a replica, just like sharding. Frequently used fragments may be duplicated.

24
Q

Suppose “n” denotes the number of replicas for one data object. If n == number of servers, then what type of replication is this scheme using?

A

Full replication. Every server has a copy of the data object.

25
Q

When the replication factor is less than the total number of servers, this is known as _____ replication

A

partial

26
Q

_____ replication allows us to increase the effective storage capacity of the system through the addition of _____ while keeping the ______ ______ constant.

A

partial
servers
replication factor

27
Q

When the number of servers/replicas is larger than the replication factor (partial replication), what does each server/replica store?

A

A fragment/subset of the data stored in the system
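
The capacity argument can be illustrated with a short sketch. Placing each item on consecutive servers of a ring is an assumption made for this example (the source does not prescribe a placement scheme), and the function names `place` and `capacity` are hypothetical.

```python
def place(key_index, num_servers, replication_factor):
    """Servers holding an item: replication_factor consecutive servers on a ring."""
    return [(key_index + i) % num_servers for i in range(replication_factor)]


def capacity(num_servers, per_server_capacity, replication_factor):
    """Number of distinct items the whole system can hold."""
    return num_servers * per_server_capacity // replication_factor


full = capacity(num_servers=3, per_server_capacity=100, replication_factor=3)
partial = capacity(num_servers=6, per_server_capacity=100, replication_factor=3)
```

With three servers and a replication factor of 3 (full replication) the system holds 100 distinct items; doubling the servers to six while keeping the replication factor at 3 doubles the effective capacity to 200.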
28
Q

What is eventually-consistent replication?

A

A read or write issued to the distributed system is handled by the nearest replica. For a write, that replica is responsible for propagating the update to the remaining replicas, so that all replicas eventually converge to the same state.

29
Q

In an ______ _______ replication system, a server that receives an update will reply with an ________ to the client first, and then propagate ________ to the remaining replicas

A

eventually-consistent
acknowledgement
lazily/asynchronously
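
The acknowledge-first, propagate-later behaviour can be sketched as follows. This is a single-process illustration, not a real system: the `LazyStore` class and its methods are hypothetical, timestamps are passed in by the caller, and "asynchronous" propagation is modelled by a queue that is drained only when `propagate()` is called.

```python
from collections import deque


class LazyStore:
    """Replicas that acknowledge writes first and propagate them later."""

    def __init__(self, num_replicas):
        self.replicas = [{} for _ in range(num_replicas)]
        self.queue = deque()  # updates waiting to be propagated

    def write(self, replica_id, key, value, timestamp):
        self.replicas[replica_id][key] = (timestamp, value)
        # Queue the update for every other replica, then acknowledge at once.
        for other in range(len(self.replicas)):
            if other != replica_id:
                self.queue.append((other, key, value, timestamp))
        return "ack"

    def read(self, replica_id, key):
        entry = self.replicas[replica_id].get(key)
        return None if entry is None else entry[1]

    def propagate(self):
        # Deliver queued updates; keep an existing entry if it is newer.
        while self.queue:
            target, key, value, timestamp = self.queue.popleft()
            current = self.replicas[target].get(key)
            if current is None or current[0] < timestamp:
                self.replicas[target][key] = (timestamp, value)


store = LazyStore(3)
reply = store.write(0, "x", "v1", timestamp=1)
early = store.read(2, "x")  # replica 2 has not received the update yet
store.propagate()
late = store.read(2, "x")
```

The read stored in `early` sees no value because propagation has not happened yet, while `late` sees `"v1"` once the queue has been drained.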
30
Q

What happens in an eventually-consistent replication system when an update is being propagated and a replica is unreachable? How do the replicas reach consistency?

A

The unreachable replica can be updated later using an anti-entropy mechanism, for example replicas periodically exchanging hashes of their data to detect discrepancies.

31
Q

What do eventually consistent systems do to ensure that data is consistent across replicas?

A

Periodically, replicas exchange hashes of their data to detect discrepancies, using Merkle/hash trees. Timestamps are used to tell which update is the latest.

32
Q

In eventually consistent systems, how do replicas determine what is the latest version of a data object?

A

Using timestamps. The version with the largest timestamp is treated as the latest one.
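
A minimal anti-entropy sketch, assuming each replica's store maps a key to a `(timestamp, value)` pair and that a single hash over the whole store is exchanged first. The function names `digest` and `anti_entropy` are hypothetical, and a real system would exchange a hash tree rather than one digest so that it does not have to scan every key.

```python
import hashlib


def digest(store):
    """Hash of the whole store; equal digests mean nothing needs exchanging."""
    payload = repr(sorted(store.items())).encode()
    return hashlib.sha256(payload).hexdigest()


def anti_entropy(a, b):
    """Reconcile two replicas in place, keeping the newest entry per key."""
    if digest(a) == digest(b):
        return  # already consistent
    for key in set(a) | set(b):
        newest = max(
            (store[key] for store in (a, b) if key in store),
            key=lambda entry: entry[0],  # compare by timestamp
        )
        a[key] = b[key] = newest


replica_a = {"x": (2, "new")}
replica_b = {"x": (1, "old"), "y": (1, "kept")}
anti_entropy(replica_a, replica_b)
```

After the call both replicas hold `x` with timestamp 2 and also the key `y`, which `replica_a` had never received.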
33
Q

What is the purpose of Merkle trees (or hash trees) in eventually-consistent systems?

A

The trees are exchanged between replicas to compare and update versions of data. A tree acts as a compact summary of the data and allows the replicas to locate which parts of the data differ.

34
Q

In an eventually-consistent system, what is a "stale" read?

A

A read served by a replica that has not yet received the latest version of a data object, so the replica returns an old version of the object.
35
Q

Merkle trees are used to allow replicas to efficiently compare values of data objects. Describe the structure of these trees.

A

Each leaf holds the hash of a data block, and each parent node holds the hash of the concatenation of its children's hashes. Replicas compare root hashes first and descend only into subtrees whose hashes differ, which makes the comparison efficient.
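
A minimal sketch of building and comparing such trees, assuming leaves store SHA-256 hashes of the data blocks, both replicas hold the same non-zero number of blocks, and the last hash is duplicated when a level has an odd number of nodes (one common convention, chosen here as an assumption). The function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return the tree as a list of levels, leaves first and root last."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels (assumption)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_blocks(blocks_a, blocks_b):
    """Indices of differing blocks, found by descending from the root."""
    la, lb = build_levels(blocks_a), build_levels(blocks_b)
    suspects = [0]  # start at the root
    for depth in range(len(la) - 1, 0, -1):
        next_suspects = []
        for i in suspects:
            if la[depth][i] != lb[depth][i]:
                # Only subtrees under a mismatching node need inspection.
                for child in (2 * i, 2 * i + 1):
                    if child < len(la[depth - 1]):
                        next_suspects.append(child)
        suspects = next_suspects
    return [i for i in suspects if la[0][i] != lb[0][i]]


blocks_a = [b"a", b"b", b"c", b"d"]
blocks_b = [b"a", b"b", b"X", b"d"]
changed = differing_blocks(blocks_a, blocks_b)
```

In this example only the right half of the tree is explored, and `changed` identifies block 2 as the one that differs; identical replicas are detected from the root hash alone.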