Chapter 5: Coordination & Agreement Flashcards

1
Q

Synchronous distributed system model

A
  • Each message transmitted over a channel is received within a known bounded time
  • Time to execute each step of a process has known lower and upper bounds
  • Each process has a local clock whose drift rate from real time has a known bound
2
Q

Asynchronous distributed system model

A
  • Messages may need an arbitrary time to be transmitted
  • Each step of a process may take an arbitrary time to complete
  • Clock drift rates are arbitrary
3
Q

Thought experiment - Some takeaways

A
  • The asynchronous model is representative of the Internet and many practical distributed applications
  • Solutions valid for asynchronous distributed systems are also valid for synchronous ones
  • Many design problems cannot be solved in an asynchronous world (e.g., when vs. who leads the attack)
  • Apply timeouts and timing constraints to reduce uncertainty and to bring the synchronous model into the picture
4
Q

Failure models

A
  • Process and communication channel may fail, i.e., depart from correct and expected behaviour
  • Failure model offers understanding for effects of failures
  • Failure types
    • Omission failures (one or more responses fail)
    • Arbitrary failures (wrong value)
    • Timing failures (outside interval)
5
Q

Process omission failures

A
  • Fails by halting, i.e., crashes (e.g. bluescreen)
  • Fail‐stop failure
    • Process halts and remains halted
    • Others can detect this state
  • Crash failure
    • Process halts and remains halted
    • Others can (only) suspect this state
6
Q

Communication omission failure

A
  • Receiving process does not receive message
  • Typically because of
    • Buffer overflow
    • Network partition
  • Failure types
    • Send-omission
    • Receive-omission
    • Channel-omission
  • Process vs. communication failure: Generally, impossible to distinguish
7
Q

In our thought experiment

A
  • Could Army i detect whether Army j has been defeated (i.e., the process failed)?
  • Assuming the enemy could attack
  • While undefeated, send a periodic message (e.g., a heartbeat)
    • Assume the enemy cannot corrupt messages
8
Q

Arbitrary failures

A
  • Encompasses the prior failure classes
  • A process not only stops or crashes, but processes requests incorrectly, corrupts state, and produces inconsistent & incorrect output (commission failures)
  • Does not follow the agreed protocol; messages get corrupted; spurious messages are introduced
  • Result: System behaves unpredictably
  • A.k.a. Byzantine failures (process produces wrong output)
9
Q

Access to shared variables and race conditions

A
  • Imagine a globally shared variable counter in a process, accessible to multiple threads:
    • counter++
    • counter--
    • Depending on the interleaving, the resulting value could be 4, 5 or 6
  • Race condition:
    • Several threads manipulate shared data concurrently. The final value of the data depends upon which thread finishes last.
    • To prevent race conditions, concurrent processes must be synchronized
  • The statements: counter++; counter‐‐; must each be executed atomically.
  • Atomic operation means an operation that completes in its entirety without interruption.
  • This is achieved through synchronization primitives
  • Shared variable accessed in critical section, protected by synchronization primitives
  • Known as the critical section problem or as mutual exclusion
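The bullets above can be made concrete with a small sketch: several threads increment a shared counter, and a synchronization primitive (here Python's `threading.Lock`, an illustrative choice) makes each `counter += 1` effectively atomic. Thread and iteration counts are arbitrary example values.

```python
# Sketch: protecting a shared counter with a lock so that each
# increment executes atomically. Without the lock, concurrent
# read-modify-write steps can interleave and lose updates.
import threading

N_THREADS = 8
N_INCREMENTS = 10_000

counter = 0
lock = threading.Lock()

def increment_safely():
    global counter
    for _ in range(N_INCREMENTS):
        with lock:          # critical section: counter += 1 is now atomic
            counter += 1

threads = [threading.Thread(target=increment_safely) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 80000 with the lock; without it, often less
```

Dropping the `with lock:` line reproduces the race: the final value then depends on which thread finishes last, exactly as described above.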
10
Q

Distributed mutual exclusion

A
  • In distributed systems, mutual exclusion is more complex due to:
    • Lack of shared memory
    • Timing issues
    • Lack of a global clock
    • Event-ordering difficulties
  • Applications
    • Accessing a shared resource in distributed systems
    • Communicating over a wireless channel 802.11
11
Q

Critical section (CS) problem: No shared memory

A
  • System with n processes
  • Processes access shared resources in CS
  • Coordinate access to CS via message passing
  • Application‐level protocol for accessing CS
    • Enter_CS() – enter CS, block if necessary
    • ResourceAccess() – access shared resource in CS
    • Exit_CS() – leave CS
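The three-call protocol above can be rendered as a minimal interface. The class below mirrors the slide's names; using a local `threading.Lock` as the body is an assumption standing in for a real message-passing implementation.

```python
# Minimal sketch of the application-level CS protocol:
# Enter_CS() blocks until entry is granted, Exit_CS() leaves the CS,
# and the shared resource is only touched in between.
import threading

class CriticalSection:
    def __init__(self):
        self._lock = threading.Lock()

    def enter_cs(self):      # Enter_CS(): enter CS, block if necessary
        self._lock.acquire()

    def exit_cs(self):       # Exit_CS(): leave CS
        self._lock.release()

shared = []
cs = CriticalSection()
cs.enter_cs()
shared.append(1)             # ResourceAccess(): access shared resource in CS
cs.exit_cs()
```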
12
Q

Assumptions for CS

A
  • System is asynchronous
    • No bound on delays, no bound on clock drift, etc.
  • Processes do not fail
  • Message delivery is reliable
    • Any message sent is eventually delivered intact and exactly once
  • Client processes are well‐behaved and spend finite time accessing resources in the CS
13
Q

Mutual exclusion requirements CS

A
  • Safety
    • At most one process in the critical section at a time
  • Liveness
    • Requests to enter & exit CS eventually succeed
    • No deadlock
  • Fairness (order & starvation)
    • If one request to enter CS happened‐before another one, then entry to CS is granted in that order
    • Requests are ordered such that no process enters the critical section twice while another waits to enter (i.e., no starvation)
14
Q

Deadlock & starvation

A
  • Deadlock: Two or more processes become stuck indefinitely while attempting to enter and exit the CS, by virtue of their mutual dependency
  • Starvation: The indefinite postponement of entry to CS for a process that has requested it.
  • Can we order entry to the CS by the times processes requested it?
15
Q

Performance metrics

A
  • Bandwidth: Number of messages sent, received or both
  • Synchronization delay: Time between one process exiting the critical section and the next entering
  • Client delay: Delay at entry and exit (response time)
  • We do not measure client access to resources protected by the critical section (assume finite)
16
Q

Solution strategies

A
  • Centralized strategy
    • Divide processes into master and slaves, master dictates actions of slaves
  • Distributed strategy: Each process independently decides actions, based on local knowledge of others’ state
    • Token‐based: A node is allowed in the critical section (CS) if it has a token. Tokens are passed from site to site, in some (priority) order.
    • Non‐token‐based: A node enters CS when an assertion becomes true. A node communicates with other nodes to obtain their states and decides whether the assertion is true or false.
17
Q

Centralized strategy

A
  • Elect a leader (details, cf. second part of this lecture)
  • Meets requirements: Safety, liveness, no starvation
    • Does the solution meet the ordering requirement?
  • Advantages
    • Simple to implement
  • Disadvantages
    • Single point of failure
    • Bottleneck, network congestion, timeout
    • Deadlock potential for multiple resources with separate servers
  • Enter_CS()
    • Two messages: Request & Grant
    • Delays the requesting process by a round trip for messages
  • Exit_CS()
    • One message: Release
    • Assuming asynchronous message passing, incurs no delay
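The Request/Grant/Release exchange above can be sketched as a coordinator with a FIFO queue. Message passing is simulated by direct method calls (an assumption); in a real system each call would be a network message to the elected leader.

```python
# Sketch of the centralized strategy: the coordinator grants the CS to
# one process at a time and queues further Request messages in FIFO
# order; a Release hands the Grant to the next waiter.
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None          # process currently holding the Grant
        self.waiting = deque()      # FIFO queue of pending Requests

    def request(self, pid):         # Enter_CS: send Request, await Grant
        if self.holder is None:
            self.holder = pid       # Grant immediately
            return True
        self.waiting.append(pid)    # queue until a Release arrives
        return False

    def release(self, pid):         # Exit_CS: send Release
        assert self.holder == pid
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder          # process now receiving the Grant, if any

c = Coordinator()
assert c.request("P1") is True      # P1 granted at once
assert c.request("P2") is False     # P2 queued behind P1
assert c.release("P1") == "P2"      # Release grants to the next in FIFO order
```

Note that the queue makes the coordinator a single point of failure and a bottleneck, matching the disadvantages listed above.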
18
Q

Distributed strategy

A
  • In a distributed algorithm, the same decision must be made on each node, independent of the other nodes in the system.
  • Selected algorithms
    • Ring‐based algorithm
    • Logical clocks‐based algorithm (Lamport, 1976)
    • Ricart & Agrawala, 1981
    • Maekawa, 1985
    • Many more
19
Q

Ring‐based algorithm

A
  • Logical ring of processes
  • Each Pi knows its successor, P(i+1) mod N
  • Logical topology a priori unrelated to physical topology

Analysis:

  • Safe: A node enters the CS only if it holds the token
  • Live: Since finite work is done by each node (it can't re‐enter), the token eventually gets to each node
  • Fair: Ordering is based on the ring topology
  • Performance
    • Constantly consumes network bandwidth, even when no process seeks entry, except while a process is inside the CS
    • Synchronization delay: Between 1 and N messages
    • Client delay: Between 0 and N messages for entry; 1 message for exit
  • Problems
    • Lost token
    • Duplicated token
    • Failed node
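The token-passing rule above can be sketched by simulating one circulation of the token around a small ring. The ring is modelled as indices modulo N, and the set of processes wanting the CS is an illustrative assumption.

```python
# Sketch of the ring-based algorithm: the token circulates
# P0 -> P1 -> ... -> P(N-1) -> P0, and a process may enter the CS
# only while it holds the token.
N = 4
token_at = 0                        # index of the process holding the token
wants_cs = {2}                      # processes wishing to enter (example)
entered = []

for _ in range(N):                  # one full circulation of the token
    if token_at in wants_cs:
        entered.append(token_at)    # enter CS, do finite work, then release
        wants_cs.discard(token_at)
    token_at = (token_at + 1) % N   # forward the token to the successor

print(entered)  # [2]
```

The loop also illustrates the bandwidth cost: the token keeps moving even while `wants_cs` is empty.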
20
Q

Ricart & Agrawala, 1981, algorithm

A
  • Basic idea
    • Processes wanting to enter the CS multicast a request to all processes
    • Enter the CS once all have granted the request
  • Use Lamport timestamps (logical clock values) to order requests: <T, Pi>, where T is the timestamp and Pi the process identifier
  • Each process is in one of three states
    • Released: Outside the CS
    • Wanted: Wanting to enter the CS
    • Held: Inside the CS
  • If a process requests to enter the CS and all other processes are in the Released state, entry is granted by each process
  • If a process, Pi, requests to enter the CS and another process, Pk, is inside the CS (Held state), then Pk will not reply until it is finished with the CS
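The reply rule can be condensed into a small decision function. The three state names and the <T, Pi> Lamport-pair encoding follow the slide; the function itself and its name are illustrative assumptions. Python compares the (timestamp, pid) tuples component-wise, which gives exactly the total order needed to break timestamp ties by process ID.

```python
# Sketch of the Ricart & Agrawala reply rule: reply immediately unless
# we hold the CS, or we want it and our own request is ordered earlier.
RELEASED, WANTED, HELD = "Released", "Wanted", "Held"

def should_reply(state, own_request, incoming_request):
    """own_request / incoming_request are (timestamp, pid) Lamport pairs."""
    if state == HELD:
        return False                      # defer the reply until Exit_CS
    if state == WANTED and own_request < incoming_request:
        return False                      # our request is earlier: defer
    return True                           # Released, or they were first

assert should_reply(RELEASED, None, (5, "P2")) is True
assert should_reply(HELD, (1, "P1"), (5, "P2")) is False
assert should_reply(WANTED, (3, "P1"), (3, "P2")) is False  # tie: lower pid wins
```

A requester enters the CS once every other process has replied, so deferred replies are exactly what serializes access.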
21
Q

Ricart & Agrawala algorithm

A
22
Q

Fault‐tolerance aspects of mutual exclusion algorithms

A
  • None of the algorithms tolerates message loss
  • Ring‐based algorithm cannot tolerate crash of single process
  • Centralized algorithm can tolerate crash of clients that are neither in the CS, nor have requested entry
  • The R & A algorithm as described does not tolerate faults either
23
Q

Leader election

A
  • Problem: A group of processes, P1, …, Pn, must agree on some unique Pk to be the “leader”
  • Often, the leader then coordinates another activity
  • An election runs when the leader (a.k.a. coordinator) has failed
  • Any process that hasn’t heard from the leader within some predefined time interval may call for an election
  • False alarm is a possibility (new election initiated, while current leader still alive)
  • Several processes may initiate elections concurrently
  • Algorithm should allow for process crash during election
24
Q

Process identifier

A
  • Elected leader must be unique
  • The one with the largest identifier wins
  • The identifier could be any “useful value”, i.e., unique & totally ordered
  • E.g., based on OS process identifiers, IP address, port
  • E.g., based on least computational load
    • <1/load, i>, load > 0, where i is the process ID to break ties
  • Each process, Pi, has a variable electedi that holds the value of the leader or is “┴” (undefined)
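The <1/load, i> identifier above can be checked with tuple comparison: the least-loaded process gets the largest identifier, and the process ID breaks ties. The helper function name and the load values are illustrative assumptions.

```python
# Sketch of the load-based identifier from the slide: identifiers are
# (1/load, pid) pairs compared component-wise, so less load means a
# larger identifier, with pid as the tie-breaker.
def ident(load, pid):
    assert load > 0
    return (1 / load, pid)     # unique & totally ordered

# P3 carries less load than P1, so P3 wins the election
assert max(ident(4.0, 1), ident(2.0, 3)) == ident(2.0, 3)
# Equal loads: the higher process ID breaks the tie
assert max(ident(2.0, 1), ident(2.0, 3)) == ident(2.0, 3)
```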
25
Q

Applications of leader election

A
  • Berkeley clock synchronization algorithm
  • Centralized mutual exclusion algorithm
  • Primary‐backup replication algorithms
  • Two‐phase commit protocol
  • Used in Amazon Dynamo KV‐store (replication)
  • Used in Google BigTable & GFS
26
Q

As compared to mutual exclusion

A
  • Losers return to what they were doing, instead of waiting
  • Fast election is important, not starvation avoidance
  • All processes must know the result, not just the winner
27
Q

Election algorithm requirements

A
  • Safety
    • A participating process, Pi, has variable electedi = “┴” or electedi = P, where P is chosen as the non‐crashed process at the end of the election run with the largest identifier.
  • Liveness
    • All processes participate in the election and eventually either set electedi ≠ “┴” or crash.
  • Performance
    • Total number of messages sent (bandwidth) –…
28
Q

Ring‐based algorithm, 1978: Overview

A
  • Essentially three phases
    1. Initialization
    2. Election phase (concurrent calls for election)
      • Determine election winner (voting phase)
      • Reach point of message originator
    3. Announce the leader phase (victory announcement phase)
29
Q

Ring‐based election algorithm

A
  • Construct a ring (cf. ring‐based mutual exclusion)
  • Assume each P has a unique ID
  • Assume no failures and an asynchronous system
  • Any Pi may begin an election by sending an election message to its successor
  • The election message holds Pi’s ID
  • Upon receiving an election message, Pk compares its own ID to the ID in the received message
    • If the message ID is greater: Forward the election message
    • If … smaller: Forward the election message with Pk’s ID, unless Pk is already participating in the current election run
    • If … equal: Pk is now leader. Forward a victory message to notify all other processes
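The forwarding rules above can be simulated on a list-based ring with a single initiator. The IDs are illustrative; each loop iteration is one message hop to the successor, and the victory announcement adds N messages at the end.

```python
# Sketch of the ring-based election: the election message carries the
# largest ID seen so far; when a process receives its own ID back, it
# is the leader and circulates a victory message.
def ring_election(ids, starter):
    n = len(ids)
    participating = [False] * n
    msg_id, pos, messages = ids[starter], (starter + 1) % n, 1
    while True:
        my_id = ids[pos]
        if msg_id == my_id:
            leader = my_id              # own ID came back: this node leads
            break
        if msg_id < my_id and not participating[pos]:
            msg_id = my_id              # substitute the larger ID
        participating[pos] = True
        pos = (pos + 1) % n             # forward to the successor
        messages += 1
    return leader, messages + n         # + N victory-announcement messages

leader, msgs = ring_election([3, 17, 9, 24, 1], starter=0)
assert leader == 24                     # the largest ID wins
```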
30
Q

Different cases - Ring-based algorithm

A
31
Q

Ring‐based algorithm: Example (Picture only)

A
32
Q

Ring‐based election algorithm analysis

A
  • Worst case
    • A single process calls for the election
    • Its anti‐clockwise neighbour has the highest identifier
    • N‐1 messages to reach this neighbour
    • N messages to return to the point of origin
    • The leader announcement takes another N messages
  • Grand total: 3N – 1 messages
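The 3N - 1 count can be checked by directly simulating the worst case: the starter's anti-clockwise neighbour (its predecessor on the ring) holds the highest ID, so the election message travels N - 1 hops out and N hops back before the N announcement messages.

```python
# Sketch: count election-message hops on a ring until the largest ID
# returns to its owner, then add N victory messages.
def count_messages(ids, starter):
    n = len(ids)
    msg_id, pos, count = ids[starter], (starter + 1) % n, 1
    while ids[pos] != msg_id:
        msg_id = max(msg_id, ids[pos])   # substitute the larger ID, forward
        pos = (pos + 1) % n
        count += 1
    return count + n                      # + N leader-announcement messages

# Worst case: P0 starts, and the highest ID sits at its predecessor P(N-1)
n = 6
ids = list(range(n))                      # ids[n-1] = n-1 is the maximum
assert count_messages(ids, starter=0) == 3 * n - 1
```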
33
Q

Bully algorithm, Garcia‐Molina, 1982

A
  • Assumes each process has a unique ID, reliable message delivery, and synchronous system with timeouts
  • Assumes processes know each others’ IDs and can communicate with one another
    • Higher IDs have priority
      • Can “bully” lower numbered processes
  • Initiated by any process that suspects failure of the leader
    • Employs timeouts to detect failure
    • Tolerates processes crashing during elections
34
Q

Bully algorithm messages

A
  • Operates with three types of messages
    • Election announces an election
    • Answer responds to an election message
    • Coordination announces the identity of the leader
  • Algorithm is triggered by any process that detects (suspects) the leader to have crashed
35
Q

Pi detects failure of leader

A
  • For any j < i and any k > i:
  1. Pi broadcasts an election message
  2. Any Pk receiving the election message responds with an answer message and starts another election
  3. Any Pj receiving the election message does not respond (it cannot win)
  4. If Pi does not receive any answer message (timeout), it broadcasts victory via a coordination message
  5. If Pi does receive answer message(s), it waits to receive a coordinator message
  6. Pi restarts the election if no coordination message is received
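The steps above can be sketched as a recursive simulation: each answering higher-ID process restarts the election, and this plays out until the highest alive process hears no answer and wins. Crash detection via timeouts is modelled by an `alive` set, an assumption standing in for real failure detection.

```python
# Sketch of the bully election: alive higher-ID processes answer and
# take over; the highest alive ID ends up as coordinator.
def bully_election(all_ids, alive, initiator):
    higher = [k for k in all_ids if k > initiator and k in alive]
    if not higher:
        return initiator                 # no answer arrives: broadcast victory
    # An answering process starts another election among the higher IDs;
    # the recursion plays out until the highest alive process wins.
    return bully_election(all_ids, alive, min(higher))

ids = [1, 2, 3, 4, 5]
assert bully_election(ids, alive={1, 2, 4}, initiator=1) == 4   # 5 crashed
assert bully_election(ids, alive={3}, initiator=3) == 3         # sole survivor
```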
36
Q

Upon crash of a process (bully algorithm)

A
  • A new process starts, replacing the crashed one with the crashed one’s ID (e.g., read from disk)
  • The process may determine that it has the highest identifier and thus pronounce itself leader
    • Even though the system may have an existing leader (elected during the crash)
  • The new process “bullies” the current leader