Consensus (L9) Flashcards
(23 cards)
What is the main goal of Lecture 9: Consensus?
To study the importance of consensus in distributed systems and how to deal with conflict resolution in replicas.
What is Consensus?
It is a fundamental problem requiring a set of processes to agree on a value in a fault-tolerant way. It defines how a distributed system reaches agreement to enforce behaviors defined in consistency models.
What are the three properties that a value agreed upon through consensus must satisfy?
- Every non-faulty process eventually agrees on a value.
- The final decision of every non-faulty process is the same everywhere.
- The value that has been agreed on has been proposed by a process.
Name some practical applications of consensus.
- Commit transactions.
- Decisions in general (votes are involved).
- Hold a lock (Mutual exclusion).
- Failure detection (Correctness in the presence of Byzantine nodes).
What is State Machine Replication (SMR)?
A technique where every replica acts as a state machine. The leader broadcasts operations that change its state to followers, ensuring all replicas start in a fixed initial state and go through the same sequence of state transitions in the same order. It’s a foundation for highly available, strongly consistent systems.
What is Total Order Broadcast in the context of SMR?
Every node delivers the same messages in the same order. This ensures that followers execute the same sequence of operations as the leader.
What is the FLP result (Fisher, Lynch, Paterson)?
It states that there is no deterministic consensus algorithm that is guaranteed to terminate in an asynchronous crash-stop system model.
How do algorithms like Paxos and Raft handle the FLP impossibility result?
They assume a partially synchronous crash-recovery model and use clocks only for timeouts/failure detectors to ensure progress, while safety (correctness) does not depend on timing.
What are the conditions to be followed to achieve distributed consensus?
Termination, Agreement, Validity, and Integrity.
How is a leader elected and maintained in some consensus algorithms?
- A failure detector (timeout) is used to suspect a crashed or unavailable leader.
- On suspected leader crash, a new leader is elected.
- Terms (logical clocks) are incremented for new elections, and a node can only vote once per term.
- A quorum of nodes is required to elect a leader in a term.
Can there be multiple leaders from different terms simultaneously? How is this handled?
Yes, a leader from an older term might not know a new leader has been elected in a newer term due to network partitions. To ensure correctness, for every decision, the current leader must get acknowledgements from a quorum in its current term.
What is Paxos?
A consensus algorithm that ensures a group of nodes can agree on a single value, even with failures. It tolerates up to f failures with 2f + 1 nodes.
What are the three main roles in Paxos?
Proposers, Acceptors, and Learners. Nodes can play multiple roles.
Briefly describe the two phases of Paxos.
- Phase 1 (Prepare/Promise): A proposer sends a PREPARE message with an ID to acceptors. Acceptors promise not to accept proposals with a lower ID and may respond with a previously accepted value.
- Phase 2 (Accept/Acknowledge): If the proposer receives promises from a majority, it sends an ACCEPT-REQUEST with the ID and value. Acceptors accept it if they haven’t promised a higher ID, then send an ACCEPT message to learners.
In Paxos, if an acceptor has already accepted a value, how does it respond to a new PREPARE message?
If the new PREPARE ID (IDp) is higher than any it has promised to ignore, it will reply with PROMISE IDp, and include the previously accepted ID (IDa) and VALUE.
In Paxos, if a proposer receives promises that include already accepted values, which value does it choose for its ACCEPT-REQUEST?
It picks the value associated with the highest accepted ID (IDa) from the promises received. If no values were accepted, it can pick its own value.
What is Raft?
A modern consensus algorithm designed to be more understandable than Paxos. It is based on state machine replication and divides time into election terms.
What are the three states a node can be in Raft?
Follower, Candidate, and Leader.
Briefly describe the leader election process in Raft.
- Nodes start as Followers.
- If a Follower doesn’t receive heartbeats from a Leader, it times out and becomes a Candidate.
- The Candidate increments the term, votes for itself, and requests votes from others.
- If it gets a majority of votes, it becomes Leader. If another process wins or a new term is discovered, it reverts to Follower.
How does log replication work in Raft?
- The Leader is the only one that can make changes to replica states.
- When the Leader applies an operation, it appends a new log entry to its own log.
- The Leader then replicates this log entry to the Followers (e.g., via AppendEntries messages).
- An entry is committed once a majority of followers acknowledge it.
What is Chain Replication?
A replication protocol where processes are arranged in a chain. The leftmost process is the head, and the rightmost is the tail. Writes go to the head and propagate down the chain; reads are typically served by the tail.
How does Chain Replication handle failures?
A control panel component monitors the chain’s health and implements consensus to manage the chain if a node fails. The chain can tolerate N-1 process failures (where N is chain length), while the control panel itself can tolerate C/2 failures (where C is the number of control panel replicas).
What is the key difference between Consensus and Atomic Commit?
Every node votes to commit or abort. Must commit if all vote to commit; must abort if at least one votes to abort or if a participating node crashes. Consensus algorithms can be used with atomic commit protocols to ensure all nodes agree on the commit/rollback decision.