Chapter 6: Consistency & Replication Flashcards

1
Q

Why Replication?

A
  • Performance
    • Caching data at browsers and proxy servers
    • e.g. Content Delivery Network with several locations
  • High availability
    • Upon crashes, service offered by replica
    • Upon network partition, service available to clients in partition
  • Fault tolerance
    • Providing reliable service in face of faulty servers
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

The “cost” of replication NN

A
  • Cost to keep replicas up to date in face of updates?
    • E.g., additional bandwidth, number of messages exchanged, higher latency (i.e., service‐response time), complexity of code, etc.
  • How can the problem of stale (out‐of‐date) data current reads at replicas be overcome?
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Consistency Models

A
  • Definition (Consistency model)
    • A contract between a distributed data store and a set of processes, which specifies what the results of read/write operations are in the presence of concurrency
    • Distributed data store as synonym for
      distributed database, shared memory, shared files, etc.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Consistency Violation

A
  • With concurrent read & write operations on a distributed data store, data inconsistency may arise
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Overview: Consistency Models important - know lal

A
  • Strict consistency
  • Sequential consistency
  • Linearizable consistency
  • Causal consistency
  • FIFO consistency
  • Weak consistency
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Strict Consistency

A
  • Definition: Any read on a data item x returns a value corresponding to the result of the most recent write on x
  • Uni‐processor systems have traditionally observed strict consistency, … but what about multi‐processor systems?
    a = 1; a = 2; print(a); Output? most recent !
  • Definition assumes existence of absolute global time for unambiguous determination of “most recent“.
  • What do we know about uniquely time stamping events in distributed systems via absolute time references?
  • Under strict consistent, all writes are instantaneously visible to all processes and absolute global time order is maintained
  • Similarly, if a read is done, then it gets the most recent value, no matter how quickly the next write is done
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Strict Consistency: Thought Experiment

A
  • To satisfy strict consistency, laws of physics may have to be violated! Obviously, not an option!!!
  • To realize strict consistency in this case, R(X)a would have to travel at 10 times the speed of light!
  • Strict consistency is impossible to guarantee in distributed systems, due to reliance on precise global time in its definition
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Definition of Sequential Consistency

A
  • The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order and …
  • … the operations of each individual process appear in this sequence in the order specified by its program
  • Operations refer to read and write
  • Weaker than strict consistency
  • When processes run concurrently, any valid interleaving of read and write operation is acceptable behavior, but all processes see the same interleaving of operations
  • Nothing is said about time, i.e., no reference to “most recent”, unlike strict consistency
  • Property: if the read time is r, the write time is w, and the minimal packet transfer time between nodes is t, then r+w>=t holds (if you reduce reading time, writing time goes up)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Definition of Linearizability

A
  • Like Sequential consitency
  • If there is an overlap order is not important - if not it is
  • In addition, if tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence (t is defined as intervals)
  • tsOP(x) denotes the timestamp assigned to operation OP that is performed on data item x, and OP is either read (R) or write (W).
  • Like strict consistency, assumes global time, but not absolute global time
  • Assumes processes in the system have physical clocks synchronized to within an error bounded
  • If W(x)b was the most recent write operation and there is no other write operation overlapping with W(x)b, then any later read should return b
  • If W(x)a and W(x)b were two most‐recent overlapping write operation, then any later read should return either a or b, not something else
  • A linearizable data store is also sequentially consistent
  • I.e., linearizability is more restrictive
  • Difference is the ordering according to a set of synchronized clocks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Intuition for Causal Consistency

A
  • Weaker than sequential consistency
  • Distinguish events that are potentially causally related and those that are not (concurrent events) Rb
  • If event B is influenced (caused) by an earlier event A, causality requires that everyone first sees A then B
  • Concurrent events (i.e.,writes) may be seen in a different order on different machines
  • Reading of x and writing of y are potentially causally

related

  • Computation of y may have depended on value of x read by P2 (written by P1); e.g., y = f(x)
  • On the other hand, y may not have depended on x, yet potential causality still holds in our formalization!
  • Read followed by write in the same process, the two are potentially causally related
    • Example: R(x)a … W(y)b (e.g., it may be that y=f(x))
  • Read is potentially causally related to write that provided the value read got
    • Example: W(x)a … R(x)a
  • Independent writes by two process on a variable are not causally related (they are concurrent)
    • Example: W(x)a … W(x)b
  • Potentially causally related writes must be seen by all processes in the same order
  • Concurrent writes may be seen in a different order by different processes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Definition of FIFO Consistency

A
  • Writes by a single process are seen by all other processes in the order in which they were issued
  • Writes from different processes may be seen in a different order by different processes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Summary of Consistency Models

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Eventual Consistency Motivation

A
  • So far, goal was to maintain consistency in presence of concurrent read/write operations
  • There are use cases with no (few) updates to the same data, while most operations are reads
  • E.g., distributed database, DNS, Web, K/V‐Stores
    • Majority of operations are reads
    • Writes are mostly done by central authorities (domain owners, Web masters, etc.)
    • Updates are keyed and arrive from single point, only
  • Few concurrent updates and updates propagate lazily
  • If no updates take place for a long time, all replicas will gradually become consistent
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Eventual Consistency

A
  • Eventual consistency is desirable for large‐scale distributed systems where high availability is important
  • Tends to be cheap to implement
  • Constitutes a challenge for environments where strong consistency is important
  • How well does eventual consistency work for mobile user? For high load of requests not suitable
  • Here, we need another class of consistency, i.e., client-centric consistency models
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Client‐Centric Consistency Models

A
  • Goal: How can we avoid system‐wide consistency by concentrating on what clients want, instead of what should be maintained by servers.
  • Different client‐centric consistency models – Monotonic reads
    • Monotonic writes
    • Read‐your‐writes
    • Write‐follows‐reads
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Definition: Monotonic Reads

A
  • A data store provides monotonic‐read consistency if the following condition holds:
    • If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value
  • Monotonic‐read consistency guarantees that if a process has seen a value of x at time t, it will never see an older version of x at a later time
  • The previously seen value by the process must have reached the new site, where the reader wants to read
17
Q

Monotonic Writes

A
  • Propagation of writes in the correct order to multiple destinations (copies of data store)
  • A data store provides monotonic‐write consistency if the following condition holds:
    • A write operation by a process on a data item x is completed before any successive write operation on x by the same process
  • In other words, a write operation on a copy of data item x is performed only if that copy has been brought up to date
18
Q

Read Your Writes

A
  • A data store provides read‐your writes consistency if the following condition holds:
    • The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process
  • A write operation is always completed before a successive read operation by the same process, no matter where that read operation takes place
19
Q

Writes Follow Reads

A
  • A data store is said to provide writes‐follow‐reads consistency if the following condition holds:
    • A write operation by a process on a data item x following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x that was read
    • E.g., … R(x) … W(x) (for W, same or more recent value of x)
  • Any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process
20
Q

Replication Techniques Satisfying Sequential Consistency

A
  • • Two main techniques
    • Primary Backup
    • Active Replication
21
Q

System Model

A
  • Processes are either clients or replicas
  • Clients and replicas exchange messages through point‐to‐point links
  • Processes can crash
22
Q

Passive Replication: Primary Backup

A
  • Primary
    • Receives invocations from clients and sends back the answers (clint talks always to primary)
  • Backup
    • Interacts with the primary
    • Is used to replace the primary when it crashes
23
Q

Primary Backup Scenario

A

Guarantee sequential consistency: Order in which the primary receives clients’ invocations define the order of the operation on the data item(s).

24
Q

Primary Backup: Presence of Crash

A
  • Three scenarios
    • Primary fails after the client receives the answer -> continue
    • Primary fails before sending update messages -> details needed
    • Primary fails after sending update messages and before receiving all the ACK messages –> leader election
  • In all cases, a new primary is elected from among the backups
25
Q

Primary fails after the client receives the answer

A
  • Nothing bad happens since the client has already received the response
  • A new primary is elected
26
Q

Primary fails before sending update messages

A
  • Client does not get an answer and resends the requests after a timeout
  • The newly elected primary will handle the request as new
27
Q

Primary fails after sending update messages and before receiving all ACK messages

A
  • When a primary fails, a new primary is elected by the backup replicas
  • Client does not get an answer and resends the requests after a timeout
  • Since the new primary has already processed the update message, it immediately sends the response to the client without further action (we need a log file for processed requests)
28
Q

Active Replication

A
  • There is no coordinator, all replicas play the same role
  • Each replica is deterministic: If all replicas start from the same state and receive the same input, they will produce the same output
  • As a matter of fact, clients will receive the same response, one from each replica
29
Q

Gossip Architecture

A
  • Gossip architecture aims at achieving high availability
  • Does not offer strong consistencies guarantees such as sequential consistency
  • Updates are propagated in a lazy fashion
  • Response received by a client from a replica to its query should not be older than the last response that the client had received from any replicas
  • Client and replicas are separate processes
  • Replicas periodically exchangegossip’ messages to convey updates to other replicas
  • Front end (or client directly) communicates with an individual replica, but it may interact with other replicas, e.g., to tolerate failures
30
Q

Message Types in Gossip Architecture

A
  • Query message: A message from a client to its closet replica to determine the state of the system
  • Update message: A message from a client to the closest replica to update the state of the system
  • Gossip message: This message contains some updates that a replica thinks that some other replica might not have received (send to a number of other replicas)