Paxos, Gorums Flashcards
(30 cards)
Q: What is Gorums and what problem does it solve?
A:
Framework for building fault-tolerant distributed systems in Go
Key features:
Quorum-based communication
Built on gRPC
Code generation from protobuf
Failure handling built-in
Solves problems of:
Manual quorum handling
Network communication complexity
Distributed error handling
Type-safe RPC generation
Simple Explanation:
Gorums makes it easier to build systems that need agreement from multiple servers. Instead of manually counting votes or handling network issues, Gorums does this for you. Think of it as a smart messenger that knows how many responses you need before proceeding.
Q: How does Gorums implement quorum specifications (QSpec)?
A:
Core concept: QSpec interface defines how to process responses
Two main components:
Quorum size (e.g., majority: n/2 + 1)
Quorum functions (QF) that process replies
Example from Paxos:
type PaxosQSpec struct {
    quorum int // (n/2 + 1)
}

func (qs PaxosQSpec) PrepareQF(prepare PrepareMsg, replies map[uint32]PromiseMsg) (*PromiseMsg, bool)
A QSpec is like a vote counter with rules. You tell it how many votes you need (quorum) and what makes a vote valid (quorum function). When responses come in, it applies these rules automatically.
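A minimal, self-contained sketch of such a quorum function. The simplified message types and the Rnd field are assumptions for illustration; a real handin works with the generated protobuf types:
// Sketch only: simplified stand-ins for the generated protobuf types.
type PrepareMsg struct{ Rnd uint32 }
type PromiseMsg struct{ Rnd uint32 }

type PaxosQSpec struct {
    quorum int // majority: n/2 + 1
}

// PrepareQF waits until a quorum of acceptors has replied, then combines
// their promises into one aggregate reply that Gorums returns to the caller.
func (qs PaxosQSpec) PrepareQF(prepare PrepareMsg, replies map[uint32]PromiseMsg) (*PromiseMsg, bool) {
    if len(replies) < qs.quorum {
        return nil, false // not enough promises yet; keep waiting for more replies
    }
    // Quorum reached. A full implementation would also carry over the
    // accepted value with the highest round for each slot.
    return &PromiseMsg{Rnd: prepare.Rnd}, true
}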
Q: How does Gorums handle shared memory versus message passing?
A:
Gorums uses message passing, not shared memory:
Communication model:
All state transfer via gRPC messages
No direct memory sharing between nodes
Each node maintains its own state copy
Consistency through:
Quorum-based voting
State replication via messages
Explicit state transfer protocols
Benefits:
Better fault isolation
No memory synchronization issues
Cleaner separation between nodes
Scales across network boundaries
Simple Explanation:
Instead of sharing memory directly, nodes in Gorums talk by sending messages. Each node keeps its own copy of the data, and replicas stay in sync by voting on changes. This is safer than shared memory, especially when nodes crash or the network fails.
Q: How does Gorums integrate with gRPC?
A:
Built on top of gRPC with extensions:
Components:
Protobuf service definitions
Generated Gorums-specific code
Custom RPC patterns for quorums
Example configuration:
mgr := NewManager(
    gorums.WithGrpcDialOptions(
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    ),
)
config, err := mgr.NewConfiguration(qspec, gorums.WithNodeMap(nodeMap))
Gorums uses gRPC’s reliable communication but adds special features for group communication. It’s like having gRPC’s reliable phone calls but with the ability to conference call and count votes from multiple participants.
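A small sketch of issuing a quorum call on the resulting configuration. The Prepare method, the pb package name, and the message fields are assumptions based on a generated Paxos service, not part of core Gorums:
// Sketch only: assumes the generated package is imported as pb.
func sendPrepare(ctx context.Context, config *pb.Configuration, rnd uint32) (*pb.PromiseMsg, error) {
    // Gorums sends the request to every node in config and returns once the
    // quorum function (PrepareQF) reports that enough replies have arrived.
    return config.Prepare(ctx, &pb.PrepareMsg{Rnd: rnd})
}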
Q: How does Gorums handle failure detection and timeouts?
A:
Built-in failure handling mechanisms:
Timeout handling:
Context-based timeouts
Configurable deadline per call
Automatic cleanup of timed-out calls
Failure detection:
Node health monitoring
Automatic removal of failed nodes
Reconfiguration support
Simple Explanation:
Gorums watches for problems like slow or crashed servers. It sets time limits for responses and automatically handles cases where servers don’t respond in time. This helps your system keep running even when some parts fail.
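A sketch of a per-call deadline, reusing the hypothetical sendPrepare helper from the previous sketch and assuming the standard context, log, and time packages are imported:
// Sketch only: bounding a quorum call with a per-call deadline.
func prepareWithTimeout(config *pb.Configuration, rnd uint32) {
    ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()

    promise, err := sendPrepare(ctx, config, rnd)
    if err != nil {
        // Covers both unreachable nodes and the deadline expiring before a
        // quorum of replies arrived; the caller can retry or give up.
        log.Printf("prepare failed: %v", err)
        return
    }
    log.Printf("aggregated promise: %v", promise)
}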
Q: What are the main types of shared memory access in the Multi-Paxos implementation?
A:
Two primary mechanisms:
Protected Shared State (Mutex-based):
Replica state (leader status, configuration)
Paxos component state (proposer, acceptor, learner)
Client handler response mappings
Communication Channels:
Leader change notifications
Client value queue
Response channels for clients
Simple Explanation:
The system uses both traditional locked shared memory (like a locked diary multiple people need to write in) and channels (like passing messages through a pipe). Locks protect shared state that needs updating, while channels handle communication between different parts.
“Mutexes lock the ‘current leader’ variable during updates, while channels notify components (e.g., ‘Leader changed! Update your state!’).”
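A minimal Go sketch of the two mechanisms side by side; the names are illustrative, not the actual handin code (sync import assumed):
// Sketch only: illustrative names, not the actual handin code.
type leaderState struct {
    mu       sync.Mutex
    leaderID uint32 // protected shared state: always read/written under mu

    leaderChanged chan uint32 // message passing: notifies other goroutines
}

// setLeader updates the locked state, then announces the change on the channel.
func (s *leaderState) setLeader(id uint32) {
    s.mu.Lock()
    s.leaderID = id
    s.mu.Unlock()

    select {
    case s.leaderChanged <- id:
    default: // drop the notification rather than block if nobody is listening
    }
}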
Q: How does Multi-Paxos handle shared state in its core components?
A:
Each core component has protected state:
Proposer: its state sits behind a mutex
Acceptor: its state sits behind a mutex
Learner: its state sits behind a mutex
(a sketch of what these structs might look like follows at the end of this card)
When Used:
During phase transitions
Value updates
State queries
Simple Explanation:
Each Paxos role (Proposer, Acceptor, Learner) keeps its own state safe using locks. It’s like having a special key for each diary - only one person can write at a time, preventing confusion.
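A rough sketch of how each role might guard its state with its own mutex. The field names and value types are assumptions for illustration, not the actual multipaxos package:
// Sketch only: field names are assumptions, not the actual multipaxos types.
type Proposer struct {
    mu       sync.Mutex
    crnd     uint32 // round this proposer currently owns
    nextSlot uint32 // next slot to assign to a client value
}

type Acceptor struct {
    mu       sync.Mutex
    rnd      uint32                  // highest round promised so far
    accepted map[uint32]acceptedSlot // accepted round/value per slot
}

type Learner struct {
    mu      sync.Mutex
    learns  map[uint32]int    // number of learns received per slot
    decided map[uint32]string // decided value per slot (placeholder type)
}

type acceptedSlot struct {
    rnd uint32
    val string // placeholder for the accepted value
}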
Q: How does the system handle leader changes with shared memory?
A:
Three-part mechanism:
Subscription Channel:
leaderChangeChannel := r.leaderDetector.Subscribe()
State Protection:
func (r *Replica) setLeader(isLeader bool) {
    r.mu.Lock()
    r.isLeaderFlag = isLeader
    r.mu.Unlock()
}
Event Processing:
case newLeaderID := <-leaderChangeChannel:
    r.handleLeaderChange(ctx, newLeaderID)
Simple Explanation:
Leader changes use both types of shared memory: channels to notify about changes, and protected state to safely update who’s leader. It’s like having both an announcement system (channels) and a protected whiteboard (mutex) for tracking leadership.
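A sketch of how these pieces typically meet in the replica's event loop, assuming the fields and methods shown in this card plus an r.id field:
// Sketch only: the loop shape is illustrative, not the exact handin code.
func (r *Replica) run(ctx context.Context) {
    leaderChangeChannel := r.leaderDetector.Subscribe()
    for {
        select {
        case newLeaderID := <-leaderChangeChannel:
            r.setLeader(newLeaderID == r.id) // mutex-protected update
            r.handleLeaderChange(ctx, newLeaderID)
        case <-ctx.Done():
            return // clean shutdown when the replica is stopped
        }
    }
}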
Q: How does client request handling use shared memory?
A:
Multi-layered approach:
Value Queue:
clientValueQueue: make(chan *pb.Value, multipaxos.Alpha)
Response Tracking:
type clientHandler struct {
    mu              sync.Mutex
    responseChanMap map[uint64]chan *pb.Response
}
Used when:
Receiving client requests
Processing responses
Leader changes
Request completion
Simple Explanation:
Client handling uses both queues (channels) for incoming requests and protected maps for tracking responses. It’s like having both a ticket system (queue) and a protected ledger (mutex-protected map) for handling customer requests.
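A sketch of the response-tracking side, built on the clientHandler struct above; the clientSeq key and helper names are assumptions:
// Sketch only: illustrative helpers, not the exact handin code.
func (h *clientHandler) registerRequest(clientSeq uint64) chan *pb.Response {
    respChan := make(chan *pb.Response, 1)
    h.mu.Lock()
    h.responseChanMap[clientSeq] = respChan // protected map: who to answer later
    h.mu.Unlock()
    return respChan
}

// deliverResponse runs once the slot holding this request is decided; it hands
// the result to the goroutine waiting on the registered channel.
func (h *clientHandler) deliverResponse(clientSeq uint64, resp *pb.Response) {
    h.mu.Lock()
    respChan, ok := h.responseChanMap[clientSeq]
    delete(h.responseChanMap, clientSeq)
    h.mu.Unlock()
    if ok {
        respChan <- resp
    }
}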
Q: Why does Multi-Paxos use both mutexes and channels for shared memory?
A:
Different needs require different solutions:
Mutexes Used For:
Protecting critical state
Ensuring atomic updates
Preventing race conditions
Channels Used For:
Asynchronous communication
Event notification
Request queuing
Clean shutdown handling
Benefits:
Better separation of concerns
Natural Go concurrency patterns
Clearer communication flows
Easier deadlock prevention
Simple Explanation:
Mutexes and channels serve different purposes - mutexes protect shared data (like a lock on a diary), while channels handle communication (like a message pipeline). Using both gives the best of both worlds: safe data access and clean communication.
Q: What triggers round number (ballot number) advancement in Paxos?
A:
Two main triggers:
New Leader Election:
A proposer that believes it is the new leader picks a round number higher than any it has seen
It sends a prepare message with that higher round number
Failure Scenarios:
Actual crash of old proposer
False suspicion of crash by other nodes
Simple Explanation:
Round numbers increase when leadership changes, either due to real failures or when nodes suspect (correctly or incorrectly) that the leader has failed.
Q: What does multiple active proposers (leaders) tell us about system synchrony?
A:
Key Implications:
Asynchronous Behavior:
Multiple leaders indicate system asynchrony
At least one false suspicion exists
Network or timing issues present
Normal Operation:
Exactly one leader is expected
A non-leader only steps up when it suspects the default leader has failed
Multiple simultaneous leaders therefore signal failure-detection uncertainty
Simple Explanation:
Having multiple leaders means the system isn’t behaving in a perfectly timed (synchronous) way. It’s like having two conductors leading an orchestra - it happens when one conductor can’t see that the other is still active.
Q: What is the equation for fault tolerance in Paxos and why?
A:
Equation: n = 2f + 1
where:
n = total number of replicas
f = number of failures to mask
Root Causes:
Need majority for progress
Can’t distinguish between:
Actually crashed replicas
Slow replicas
Network partitioned replicas
Why This Number:
Must prevent minority from making progress
Need majority quorum for both:
Promise messages
Learn messages
Ensures safety when partitioned
Simple Explanation:
We need more than half the replicas (majority) to be working to make decisions. This prevents split-brain scenarios where different groups might make conflicting decisions.
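Worked example: to mask f = 1 failure you need n = 2·1 + 1 = 3 replicas with majority quorums of size 2; for f = 2, n = 5 with quorums of size 3. Any two majorities overlap in at least one replica, which is how knowledge of a chosen value survives into later rounds.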
Q: Can Paxos enter an infinite loop?
A:
The algorithm itself never violates safety, but in practice yes, it can livelock without making progress:
Causes of Infinite Loops:
Leader Competition:
Two proposers repeatedly compete
Neither gets enough time to complete
Failure Scenarios:
More than half nodes failed
Leader keeps starting new rounds
Never achieves majority
Failure Detector Interaction:
Continuous false suspicions
Repeated leader changes
Simple Explanation:
While Paxos itself can’t loop forever, the interaction with failure detection and leader election can cause continuous round changes without progress. It’s like having two people repeatedly interrupting each other before either can finish speaking.
Q: Can an acceptor accept different values in different rounds?
A:
Yes, but with important constraints:
Conditions:
Can accept new value in higher round
Only if leader didn’t see previous promise
Majority acceptance ensures value preservation
Safety Guarantee:
If value V accepted by majority in round R
All future rounds will propose V
Ensures consistency
Simple Explanation:
An acceptor can change its mind in a higher round, but the protocol ensures that once enough acceptors agree on a value, that value sticks - future rounds will maintain it.
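Example: suppose only acceptor A accepted value V in round 3. A new leader running round 7 may gather promises from a majority that excludes A, see no accepted value, and propose W; A then accepts W in round 7, so it has accepted different values in different rounds. If a majority had accepted V in round 3, every promise quorum for round 7 would intersect that majority, forcing the leader to re-propose V.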
Q: What is the difference between Single-Decree Paxos and Multi-Paxos?
A:
Key Differences:
Phase 1 Execution:
Single-Decree: Runs Phase 1 for every value
Multi-Paxos: Runs Phase 1 only on leader change
Slot Management:
Single-Decree: One instance for one value
Multi-Paxos: Multiple slots for sequence of values
Leader can propose multiple values after single Phase 1
Promise Messages:
Single-Decree: Contains single value
Multi-Paxos: Contains array of accepted values for different slots
Simple Explanation:
Multi-Paxos is like having a long-term chairman (leader) who can propose multiple items without re-election, while Single-Decree Paxos requires a new election for each decision.
Q: How does Multi-Paxos handle client requests?
A:
Three-Layer Architecture:
Client Handler:
type clientHandler struct {
    id              uint32
    active          bool // leader status
    responseChanMap map[uint64]chan *pb.Response
}
Request Flow:
Client sends request to any replica
Only leader processes requests
Non-leaders forward to leader
Leader assigns slot number
Runs Accept phase
Response Handling:
Wait for Learn messages
Notify client when decided
Handle timeouts
Simple Explanation:
Like a restaurant where any waiter can take your order, but only the head chef (leader) decides when and how to prepare it. Other waiters forward orders to the head chef.
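A sketch of the leader check at the start of that flow, using fields shown in earlier cards; forwardToLeader is a hypothetical helper:
// Sketch only: illustrative request path, not the exact handin code.
func (r *Replica) handleClientRequest(ctx context.Context, val *pb.Value) {
    r.mu.Lock()
    isLeader := r.isLeaderFlag
    r.mu.Unlock()

    if !isLeader {
        // Non-leaders never propose; hand the request to the current leader.
        r.forwardToLeader(ctx, val)
        return
    }
    // The leader queues the value; a proposer goroutine assigns it the next
    // free slot and runs the Accept phase for that slot.
    r.clientValueQueue <- val
}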
Q: What role does the Failure Detector play in Multi-Paxos?
A:
Key Components:
Leader Detection:
Monitors node health
Triggers leader changes
Uses heartbeat mechanism
Integration:
type Replica struct {
    failureDetector gorumsfd.FailureDetector
    leaderDetector  *leaderdetector.MonLeaderDetector
}
Actions on Failure:
Suspect current leader
Trigger new leader election
Start Phase 1 if becoming leader
Block client requests during transition
Simple Explanation:
The Failure Detector is like a health monitoring system that watches all nodes. If it suspects the leader is down, it triggers the election of a new leader to keep the system running.
Q: How does Multi-Paxos ensure consistency across replicas?
A:
Multiple Mechanisms:
Slot Ordering:
Sequential slot numbers
Leader assigns slots
Replicas process in order
Value Selection:
func (qs PaxosQSpec) PrepareQF(…) {
    // Find highest round number for each slot
    // Fill gaps with no-op values
    // Ensure consistent ordering
}
State Synchronization:
New leader learns previous decisions
Re-proposes uncommitted values
Fills gaps with no-ops
Simple Explanation:
Like maintaining a shared ledger where pages (slots) must be filled in order, and new leaders must first understand what’s already written before adding new entries.
“If slot 5 is empty after a leader change, the new leader proposes a no-op for slot 5 before handling slot 6, preserving order.”
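A sketch of the gap-filling step with simplified types; this is illustrative, not the actual PrepareQF implementation:
// Sketch only: simplified types showing how a new leader could fill gaps with
// no-ops while keeping slot order.
type slotValue struct {
    rnd  uint32
    val  string // placeholder for the proposed value
    noop bool
}

// fillGaps takes the accepted values reported by a quorum of promises (highest
// round already chosen per slot) and returns a dense list of proposals from
// minSlot to maxSlot, inserting no-ops for slots nobody reported.
func fillGaps(accepted map[uint32]slotValue, minSlot, maxSlot uint32) []slotValue {
    out := make([]slotValue, 0, maxSlot-minSlot+1)
    for slot := minSlot; slot <= maxSlot; slot++ {
        if v, ok := accepted[slot]; ok {
            out = append(out, v) // re-propose the previously accepted value
        } else {
            out = append(out, slotValue{noop: true}) // keep the slot order intact
        }
    }
    return out
}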
Q: How does Multi-Paxos handle concurrent client requests?
A:
Request Queuing:
clientValueQueue: make(chan *pb.Value, multipaxos.Alpha)
Slot Assignment:
Leader assigns sequential slots
Can batch multiple requests
Maintains ordering guarantees
Performance Optimizations:
Pipelining of requests
Batch processing
Concurrent Phase 2 instances
Simple Explanation:
Like a ticket system where requests are queued and processed in batches while maintaining a strict order. The leader can work on multiple requests simultaneously but ensures they complete in sequence.
Q: What happens during a leader change in Multi-Paxos?
A:
Detection and Notification:
leaderChangeChannel := r.leaderDetector.Subscribe()
case newLeaderID := <-leaderChangeChannel:
    r.handleLeaderChange(ctx, newLeaderID)
New Leader Actions:
Stop accepting client requests
Run Phase 1 for all slots
Learn uncommitted values
Resume client handling
Other Replica Actions:
Update leader reference
Forward pending requests
Clear old state
Simple Explanation:
Like changing shift managers - the new leader must first understand all pending work (Phase 1), decide what needs to be redone, and then resume normal operations. Other workers need to recognize the new manager.
Q: What is Paxos and its primary use case?
A: Paxos is a consensus algorithm for agreeing on a single value among distributed processes. It’s used in replicated state machines to order client commands consistently across servers, ensuring fault-tolerant and highly available systems.
Q: What are the three consensus safety properties in Paxos?
A:
Only proposed values can be chosen
Exactly one value is chosen
Processes learn chosen values only after they're committed
Q: What optimization does Paxos employ with a stable leader? (Multipaxos)
A: The prepare phase (Phase 1) is skipped. The leader directly sends accept messages using its existing leadership authority, reducing latency by 1 round trip time.