
Foundational Distributed System Concepts -- Availability and Reliability Flashcards

Study concepts (38 cards)

1
Q

What are the types of replication?

A

Master-slave, multi-master, quorum-based, leaderless

These replication types refer to how data is copied and synchronized across different nodes in a distributed system.

2
Q

What is the difference between synchronous and asynchronous replication?

A

Synchronous replication acknowledges a write only after the required replicas have confirmed it; asynchronous replication acknowledges the write first and propagates it to replicas afterwards

Synchronous replication keeps replicas consistent at acknowledgment time, whereas asynchronous replication may leave replicas temporarily out of date.

3
Q

What does the CAP Theorem stand for?

A

Consistency, Availability, Partition tolerance

The CAP Theorem states that a distributed data store cannot guarantee all three properties at the same time. In practice, when a network partition occurs, the system must give up either consistency or availability.

4
Q

What are the implications of the CAP Theorem?

A

Trade-offs between consistency, availability, and partition tolerance in distributed systems

Understanding the CAP Theorem helps in designing systems based on their requirements for consistency, availability, or tolerance to network partitions.

5
Q

Give an example of a system that prioritizes consistency.

A

ACID databases

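ACID (Atomicity, Consistency, Isolation, Durability) databases focus on ensuring reliable transactions. Note that the C in ACID (transactions preserve data invariants) is not the same property as the C in CAP (every read sees the most recent write).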

6
Q

Give an example of a system that prioritizes availability.

A

Eventually consistent NoSQL databases

These databases prioritize being available for reads and writes, even if they are not immediately consistent.

7
Q

What is fault tolerance?

A

The ability of a system to continue operating in the event of a failure

Fault tolerance is crucial for maintaining service availability and reliability.

8
Q

List some strategies for achieving fault tolerance.

A
  • Redundancy
  • Failover
  • Self-healing
  • Circuit breakers
  • Retries

These strategies help systems handle failures gracefully and maintain operational stability.
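
Two of the strategies listed above, retries and circuit breakers, can be sketched in a few lines. This is a minimal illustration rather than a production implementation; the names `CircuitBreaker`, `CircuitOpenError`, and `retry`, and the default thresholds, are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                           # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)    # 0.1s, 0.2s, 0.4s, ...
```

Retries handle brief, transient faults; the breaker stops callers from hammering a dependency that is clearly down, then probes it again after `reset_timeout`.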

9
Q

Why is data replication fundamental in distributed systems? What problems does it solve?

A

High Availability (HA): If one node fails, others can continue serving requests.
Fault Tolerance: The system can withstand node or network failures.
Read Scalability: Distributing read requests across multiple replicas.
Lower Latency: Placing data geographically closer to users.
Disaster Recovery: Protecting data against regional outages.

10
Q

What dictates the selection of a replication strategy?

A

CAP theorem. You generally have to choose two out of three, or more practically, decide on the degree of each.

11
Q

Describe how Leader-Follower Replication (Primary-Replica/Master-Slave) works

A

Most common and often simplest. How it works:
1. Writes: Clients send all write requests to the leader.
2. Replication Log: The leader records changes (e.g., in a write-ahead log, binary log, or sequence of operations).
3. Propagation: The leader sends this log of changes to its followers.
4. Application: Followers apply these changes in the same order as the leader.
5. Reads: Reads can be served by the leader or any of the followers.
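
The five steps above can be sketched as a toy in-memory model. The `Leader` and `Follower` classes and the explicit `replicate()` call are assumptions made for illustration; real databases ship the log over the network and persist it to disk.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, entries):
        # Step 4: apply changes in the same order the leader recorded them.
        for key, value in entries:
            self.data[key] = value
            self.applied += 1

    def read(self, key):
        # Step 5: followers can serve reads, possibly returning stale data.
        return self.data.get(key)


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []  # ordered replication log of (key, value) changes
        self.followers = followers

    def write(self, key, value):
        # Steps 1 and 2: the leader accepts the write and records it in its log.
        self.log.append((key, value))
        self.data[key] = value

    def replicate(self):
        # Step 3: send each follower the part of the log it has not yet applied.
        for follower in self.followers:
            follower.apply(self.log[follower.applied:])

    def read(self, key):
        return self.data.get(key)
```

Until `replicate()` runs, a follower read returns the old value, which is the replication lag that later cards discuss.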

12
Q

What are the types of leader-follower replications?

A

Asynchronous, Synchronous and Semi-Synchronous Replications.
- Asynchronous Replication: The leader writes to its local storage, acknowledges the client, and then replicates changes to followers in the background
- Synchronous Replication: The leader waits for at least one (or a configured number) of followers to acknowledge receipt of the write before confirming success to the client.
- Semi-Synchronous Replication: A hybrid approach. The leader waits for at least one follower to acknowledge receipt (but not necessarily commit) of the write, and then acknowledges the client. It provides a good balance between performance and durability.
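
A rough way to compare the three modes is to ask how long the client waits for its acknowledgment. The sketch below is a simplified model, not a description of any particular database: `write_latency_ms` is a hypothetical helper, followers are assumed to be contacted in parallel, and "synchronous" is assumed to mean waiting until the follower has applied the write, while "semi-synchronous" waits only for receipt.

```python
def write_latency_ms(mode, local_ms, followers, required_acks=1):
    """Estimate the client-visible write latency.

    followers: list of (receive_ms, apply_ms) pairs, one per follower,
    measured from the moment the leader sends the change.
    """
    if mode == "async":
        # Acknowledge right after the local write; replicate in the background.
        return local_ms
    if mode == "semi-sync":
        # Wait until the fastest follower has received the change.
        return local_ms + min(receive for receive, _ in followers)
    if mode == "sync":
        # Wait until `required_acks` followers have applied the change.
        applied = sorted(apply for _, apply in followers)
        return local_ms + applied[required_acks - 1]
    raise ValueError(f"unknown mode: {mode}")


followers = [(5, 20), (8, 12)]
for mode in ("async", "semi-sync", "sync"):
    print(mode, write_latency_ms(mode, local_ms=2, followers=followers))
```

The ordering async < semi-sync < sync in latency mirrors the reverse ordering in durability described above.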

13
Q

What are the Pros, Cons, and Use Cases of Asynchronous Leader-Follower Replication?

A

Pros: Low write latency for the client, high write throughput for the leader.
Cons: Potential for data loss if the leader crashes before changes are replicated to followers (RPO > 0). Reads from followers might be stale (eventual consistency).
Use Cases: Most common, suitable for many web applications where slight staleness is acceptable (e.g., social media feeds, most e-commerce product catalogs).

14
Q

What are the Pros, Cons, and Use Cases of Synchronous Leader-Follower Replication?

A

Pros: Stronger consistency guarantees (lower RPO, can guarantee linearizability if reads go to the leader and a quorum is used). Less chance of data loss.
Cons: Higher write latency, reduced write throughput (leader blocks until acknowledgment). If a follower fails, it can block writes.
Use Cases: Financial transactions, critical data where data loss is unacceptable, systems requiring strong consistency (e.g., databases for banking ledgers).

15
Q

What are the Pros of Leader-Follower Replication?

A

Simplicity: Easier to reason about consistency because there’s a single source of truth for writes.
Strong Consistency (with synchronous replication/leader reads): By directing all reads to the leader, or by using synchronous replication, strong consistency can be achieved.
Read Scalability: Easy to scale reads by adding more followers.
Good for Read-Heavy Workloads: Can offload read traffic from the leader.
Conflict Avoidance: No write conflicts, as only the leader writes.

16
Q

What are the Cons of Leader-Follower Replication?

A

Single Point of Failure for Writes (Leader): If the leader fails, writes are blocked until a new leader is elected. This involves leader election, which itself often uses a consensus algorithm (like Raft or Paxos) to ensure all nodes agree on the new leader.
Replication Lag: Asynchronous replication introduces a delay (lag) between the leader and followers, leading to stale reads if clients read from followers.
Leader Bottleneck: All writes must go through the leader, which can become a bottleneck for write-heavy workloads or very high write throughput requirements.
Failover Complexity: Manual or automatic failover mechanisms are required to promote a new leader, which adds operational complexity and can lead to downtime during election.

17
Q

What are the Tradeoffs of Leader-Follower Replication?

A

Consistency vs. Availability: Synchronous replication offers stronger consistency at the cost of lower availability and higher write latency; asynchronous replication offers higher availability and lower write latency at the cost of weaker consistency.
Read Scalability vs. Write Throughput: Excellent read scalability, but write throughput is limited by the single leader.

18
Q

When do you use Leader-Follower Replication?

A

Most common for traditional RDBMS (MySQL, PostgreSQL, Oracle Data Guard).
Read-heavy workloads where strong consistency is desired for writes but eventual consistency for reads is acceptable (or reads from leader for stronger consistency).
Applications with clear transactional boundaries.
Simpler systems where operational complexity needs to be minimized.

19
Q

Describe how Multi-Master (Active-Active) Replication works

A

Writes: Clients can send writes to any of the designated master nodes.
Inter-Master Replication: Each master replicates its changes to all other masters. This can be synchronous or asynchronous.
Conflict Resolution: This is the most significant challenge. If the same data is modified concurrently on different masters, conflicts arise and must be resolved.

20
Q

What are the conflict resolution strategies used in Multi-Master Replication?

A

Last Write Wins (LWW): The write with the most recent timestamp (or version number) wins. Simple but can lead to data loss.
Merge Operations: For certain data types (e.g., sets, lists), changes can be merged.
Application-Specific Logic: The application provides custom logic to resolve conflicts.
Conflict Avoidance: Design the system to ensure that different masters rarely (or never) write to the exact same data items simultaneously (e.g., partition data by geographic region, user ID). This is often the most practical approach.
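
The difference between Last Write Wins and a merge operation shows up clearly on a small example. In this sketch, `last_write_wins` and `merge_sets` are hypothetical helpers, and each version is assumed to be a `(timestamp, value)` pair with timestamps that are comparable across masters.

```python
def last_write_wins(a, b):
    """Keep whichever (timestamp, value) version has the newer timestamp."""
    return a if a[0] >= b[0] else b


def merge_sets(a, b):
    """Merge two concurrent set-valued versions by taking their union."""
    return (max(a[0], b[0]), a[1] | b[1])


# Two masters concurrently update the same shopping cart.
from_master_1 = (100, {"milk"})
from_master_2 = (101, {"eggs"})

print(last_write_wins(from_master_1, from_master_2))  # "milk" is silently lost
print(merge_sets(from_master_1, from_master_2))       # both items survive
```

A plain union cannot express removals, so real merge logic for sets usually needs extra bookkeeping; this sketch only covers additions.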

21
Q

What are the Pros of Multi-Master Replication?

A

High Write Availability: No single point of failure for writes; if one master goes down, others can continue.
Low Write Latency (Geo-distributed): Clients can write to the nearest master, reducing latency in global deployments.
Improved Write Throughput: Writes can be distributed across multiple masters.
Disaster Recovery: If an entire data center fails, other data centers can continue operating.

22
Q

What are the Cons of Multi-Master Replication?

A

Complex Conflict Resolution: The biggest challenge. Designing, implementing, and debugging conflict resolution logic is hard and error-prone.
Data Inconsistency Risk: Unless strong synchronous replication and coordination are used (which negates many benefits), there’s a higher risk of temporary data inconsistencies due to concurrent writes and replication lag.
Increased Operational Complexity: More difficult to set up, monitor, and troubleshoot than single-leader systems.
Circular Replication Problems: In some topologies, changes can loop back or cause endless replication.

23
Q

What are the Tradeoffs in Multi-Master Replication?

A

Consistency vs. Availability/Performance: Prioritizes availability and potentially write performance over strong consistency. Conflict resolution is often a step towards eventual consistency.
Complexity: Significantly higher complexity in design and operation.

24
Q

When do you use Multi-Master Replication?

A

Global applications requiring low write latency for users in different geographical regions (e.g., online collaborative editing, distributed social media where users mostly modify their own data).
When high write availability across multiple sites is paramount, and the application can tolerate or effectively handle eventual consistency and conflicts.
Not ideal for systems requiring strict ACID transactions across distributed masters without significant additional coordination (like distributed transactions, which add complexity and latency).

25
Q

What is Quorum-Based Replication (often part of leaderless or multi-master)?

A

It is NOT a replication strategy but a consistency mechanism often employed within leaderless or sometimes multi-master systems to guarantee a level of consistency and fault tolerance.

26
Q

What are the Key Concepts and Consistency Guarantees with Quorums?

A

Key Concepts:
N: Total number of replicas.
W (Write Quorum): Minimum number of replicas that must acknowledge a write operation for it to be considered successful.
R (Read Quorum): Minimum number of replicas that must respond to a read request.
Consistency Guarantees with Quorums: To guarantee strong consistency (e.g., read-your-writes, linearizability), the following condition must hold: W + R > N. This condition ensures that there's always at least one overlapping replica between the write quorum and the read quorum, meaning any read will "see" the most recent write.

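
The overlap claim behind W + R > N can be checked by brute force for small clusters. The function names below are assumptions for illustration; the check simply enumerates every possible write quorum and read quorum and tests whether they share a replica.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """The quorum condition from the card: W + R > N."""
    return w + r > n


def always_intersect(n, w, r):
    """Exhaustively check that every W-sized set meets every R-sized set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    print(n, w, r, quorums_overlap(n, w, r), always_intersect(n, w, r))
```

The two columns always agree: when W + R ≤ N, it is possible to pick a read quorum that misses every replica holding the latest write.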
27
Q

How does Quorum-based replication work (typical in leaderless systems)?

A

Writes: A client sends a write request to multiple nodes. The write is successful if W nodes acknowledge it.
Reads: A client sends a read request to multiple nodes. It collects responses from R nodes and then typically returns the most recent version (often determined by a timestamp or version vector). If older versions are found, a "read repair" mechanism might update the stale replicas in the background.

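
A minimal in-memory sketch of quorum writes, quorum reads, and read repair follows. The `Replica` class and the `quorum_write` / `quorum_read` helpers are hypothetical; versions are assumed to be simple increasing integers, and read repair runs inline here rather than in the background.

```python
class Replica:
    def __init__(self):
        self.store = {}   # key -> (version, value)
        self.up = True

    def put(self, key, version, value):
        """Store the value if it is newer; return True as the acknowledgment."""
        if not self.up:
            return False
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True


def quorum_write(replicas, w, key, version, value):
    """Send the write to every replica; succeed if at least W acknowledge."""
    acks = sum(rep.put(key, version, value) for rep in replicas)
    return acks >= w


def quorum_read(replicas, r, key):
    """Read from R live replicas, return the newest value, repair stale ones."""
    live = [rep for rep in replicas if rep.up][:r]
    if len(live) < r:
        raise RuntimeError("read quorum not reached")
    responses = [(rep, rep.store.get(key, (0, None))) for rep in live]
    newest = max((resp for _, resp in responses), key=lambda resp: resp[0])
    for rep, resp in responses:
        if resp[0] < newest[0]:
            rep.put(key, newest[0], newest[1])   # read repair
    return newest[1]
```

With N=3, W=2, R=2, a write that misses one replica is still visible to a later read, because the read quorum must include at least one replica that took the write.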
28
Q

What are the Pros of Quorum-based replication?

A

Tunable Consistency: By adjusting W and R, you can tune the balance between consistency, availability, and performance.
- Strong Consistency: If W + R > N, you achieve strong consistency.
- Eventual Consistency: If W + R ≤ N, you get eventual consistency. For example, if W=1 and R=1, you prioritize availability and low latency, but data can be highly inconsistent.
High Availability: The system can tolerate up to N−W node failures for writes and N−R node failures for reads while maintaining availability.
Fault Tolerance: No single point of failure.
Scalability: Can scale horizontally by adding more replicas.

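
The arithmetic above is easy to tabulate for a given configuration. `quorum_profile` is a hypothetical helper that only restates the rules from this card: W + R > N for strong consistency, N−W tolerated failures for writes, and N−R for reads.

```python
def quorum_profile(n, w, r):
    """Summarize what a given (N, W, R) setting buys you."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return {
        "consistency": "strong" if w + r > n else "eventual",
        "write_failures_tolerated": n - w,
        "read_failures_tolerated": n - r,
    }


for setting in [(3, 2, 2), (3, 1, 1), (3, 3, 1)]:
    print(setting, quorum_profile(*setting))
```

For example, N=3 with W=3, R=1 gives fast strongly consistent reads, but a single node failure blocks all writes.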
29
Q

What are the Cons of Quorum-based replication?

A

Increased Latency: Both reads and writes can incur higher latency because they need to coordinate with multiple nodes.
Complexity: Implementing quorum logic, conflict resolution (if not strongly consistent), and read repair can be complex.
Consistency Tradeoffs: While tunable, achieving strong consistency (e.g., W=N, R=1 or W=majority, R=majority) can reduce availability and performance compared to weaker consistency models.
Conflict Resolution (if W + R ≤ N): If the quorum condition isn't met for strong consistency, conflicts are possible and must be resolved.

30
Q

What are the tradeoffs in Quorum-based replication?

A

Consistency vs. Performance/Availability: Direct and explicit control over this tradeoff by adjusting quorum sizes.
Complexity: Higher implementation and operational complexity than simple asynchronous leader-follower.

31
Q

What is Leaderless Replication?

A

In leaderless replication (also known as "Dynamo-style replication" after Amazon's Dynamo), there is no designated leader node. Any replica can accept read and write requests directly from clients. Consistency is typically managed using quorum mechanisms and various conflict resolution techniques.

32
Q

When do you use quorum-based replication?

A

Databases like Cassandra, DynamoDB, Riak (which are leaderless systems).
When fine-grained control over consistency guarantees is required.
Highly available systems that can tolerate eventual consistency or where strong consistency is only required for critical paths.
Systems where individual node failures are common and rapid recovery without leader election overhead is desired.

33
Q

How does Leaderless Replication work?

A

Writes: A client sends a write request to a coordinator node (which is often just the node the client connected to, and acts as a proxy). The coordinator then forwards the write to N replicas (where N is the replication factor). The write is considered successful if W of these replicas acknowledge the write.
Reads: A client sends a read request to a coordinator node. The coordinator forwards the read to N replicas, collects responses from R replicas, resolves any conflicts (e.g., using version vectors, timestamps, or LWW), and returns the most recent version to the client.
Read Repair: If a read detects inconsistencies (e.g., a replica returning stale data), the coordinator will silently update the stale replicas in the background.
Hinted Handoff: If a replica is temporarily unavailable, the coordinator might send the write to another healthy replica (a "hinted handoff") which will then deliver the write to the original replica once it comes back online.

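
Hinted handoff is the least familiar of these mechanisms, so here is a toy sketch of it. The `Node` class and the `write_with_hints` / `deliver_hints` functions are hypothetical, and the code assumes at least one fallback node is up; real systems also expire hints and bound how many they store.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []   # (intended_node, key, value) writes held for others


def write_with_hints(replicas, fallbacks, key, value):
    """Write to each replica; park writes for down replicas on a fallback node."""
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            # Assumption: some fallback is up; otherwise next() raises.
            holder = next(f for f in fallbacks if f.up)
            holder.hints.append((node, key, value))


def deliver_hints(holder):
    """Replay parked writes to their intended replicas once they are back."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining
```

The fallback node never serves the hinted data itself in this sketch; it only holds the write until the intended replica returns.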
34
Q

What are the Pros of Leaderless Replication?

A

Extreme High Availability: No single point of failure or bottleneck. The system can continue operating as long as W nodes for writes or R nodes for reads are available.
Excellent Fault Tolerance: Very resilient to node failures and network partitions.
Horizontal Scalability: Easy to scale horizontally by adding more nodes.
Low Latency (for client if W and R are small): Clients can write to or read from the nearest available node, potentially minimizing latency, especially if W and R are set to less than N.

35
Q

What are the Cons of Leaderless Replication?

A

Eventual Consistency is Common: While strong consistency can be achieved with W + R > N, it often comes at the cost of higher latency. Most leaderless systems default to eventual consistency for performance and availability.
Complex Consistency Reasoning: Understanding the state of the system and debugging inconsistencies can be challenging for developers.
Conflict Resolution: Requires sophisticated conflict resolution mechanisms (version vectors, LWW, application-level resolution) which add complexity.
Read Repair Overhead: Background read repairs can add load to the cluster.
No Global Order: Without a single leader, it's harder to establish a global, linearizable order of operations, which can be an issue for certain types of applications (e.g., strict financial transactions requiring total ordering).

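
Version vectors, mentioned above as a conflict resolution mechanism, work by comparing per-node counters to decide whether one version supersedes another or whether the two are concurrent. The sketch below assumes a version vector is a plain dict mapping a node name to its counter; `compare` is a hypothetical helper.

```python
def compare(a, b):
    """Compare two version vectors (dicts of node name -> counter)."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"   # neither saw the other's write: a real conflict
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))
print(compare({"n1": 2}, {"n2": 1}))
```

Only the "concurrent" case needs LWW or application-level resolution; in the other cases the newer version can safely replace the older one.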
36
Q

What are the tradeoffs in Leaderless Replication?

A

Consistency vs. Availability/Performance: Heavily biased towards Availability and Partition Tolerance over strong consistency. With typical default settings it behaves as an AP system in CAP terms, although quorum settings can shift it towards CP.
Operational Complexity: Can be complex to operate and monitor, especially for consistency guarantees.

37
Q

When to use Leaderless Replication?

A

Massively scalable, highly available systems where eventual consistency is acceptable.
Use cases like product catalogs, user profiles, shopping carts, time-series data, or IoT data.
Systems that require writes to always be available, even during network partitions.
When you need to handle extremely high read and write throughput across many nodes.
Examples: Cassandra, DynamoDB, Riak.

38
Q

Summarize Leader-Follower (sync and async), multi-master and leaderless replications

A

CAP Theorem: How each strategy positions itself on the consistency-availability spectrum.
Leader-Follower (Synchronous): Favors C and P.
Leader-Follower (Asynchronous): Favors A and P, with eventual consistency.
Multi-Master: Favors A and P, with eventual consistency and conflict resolution.
Leaderless (Quorum): Favors A and P, with tunable consistency (can be CP or AP).