Distributed Databases Flashcards

1
Q

What are the three key topics of DD?

A
  • Replication: Keep a copy of the same data on several different nodes
  • Partitioning: Split the database into smaller subsets and distribute the partitions to different nodes
  • Transactions: Units of work that group several reads and writes to be performed together in the database (not exam material)
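A minimal sketch of the partitioning idea above, assuming hash-based partitioning (the card does not say which scheme is meant). The names `partition_for` and `partition` are hypothetical helpers for illustration only.

```python
import hashlib


def partition_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(records: dict, num_nodes: int) -> list:
    """Split one logical dataset into num_nodes smaller subsets."""
    nodes = [dict() for _ in range(num_nodes)]
    for key, value in records.items():
        nodes[partition_for(key, num_nodes)][key] = value
    return nodes
```

Every key lands on exactly one node, and the same key always maps to the same node as long as the number of nodes stays fixed.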
2
Q

Why is Replication important?

A
  • Data scalability: increase read throughput by allowing several machines to serve read-only requests
  • Geo-scalability: to have the data close to clients
  • Fault tolerance: to allow the system to work even if parts are down
3
Q

What are the roles of Replication nodes?

A
  • Leader: nodes that accept write queries from clients
  • Follower: nodes that provide read-only access to data
4
Q

What are the usual paradigms for implementing replication?

A
  • Single-leader: a single leader accepts writes, which are distributed to followers
  • Multi-leader: multiple leaders accept writes, keep themselves in sync, and update followers
  • Leaderless: all nodes are peers in the replication network
5
Q

What is the main idea of Write Ahead Logs (WAL)?

A

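With WAL replication, every change is appended to a write-ahead log, which is written on the leader and shipped to the followers. The followers then apply the same WAL entries, in order, to end up with data consistent with the leader's.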
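A toy sketch of that idea, under loud assumptions: a real WAL records low-level storage changes, while this example logs simple key/value entries purely for readability. `WalLeader` and `WalFollower` are hypothetical names, not part of any database's API.

```python
class WalLeader:
    def __init__(self):
        self.wal = []    # append-only log of (op, key, value) entries
        self.data = {}

    def write(self, key, value) -> int:
        """Append to the log first, then apply the change locally."""
        self.wal.append(("set", key, value))
        self.data[key] = value
        return len(self.wal) - 1  # position of the new entry


class WalFollower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def replay(self, wal) -> None:
        """Apply, in order, every entry not applied yet."""
        for op, key, value in wal[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(wal)
```

Calling `replay` with the leader's log after each batch of writes brings the follower's `data` in line with the leader's.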

6
Q

Who uses WAL?

A

PostgreSQL and Oracle

7
Q

What is logical based replication?

A

For each update written to the WAL, it generates a stream of logical (record-level) updates.

8
Q

What are some examples of Logical updates?

A
  • For new records, the insertion value
  • For deleted records, the ID
  • For updated records, the ID and updated value

Used by MongoDB and MySQL
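The three kinds of logical update listed above could be represented and applied roughly as follows. The dictionary layout (`type`, `row`, `id`, `changes`) is an assumption made for this sketch and is not the wire format of MongoDB or MySQL.

```python
def apply_logical_update(table: dict, update: dict) -> None:
    """Apply one logical update to an in-memory table keyed by record ID."""
    kind = update["type"]
    if kind == "insert":
        # New record: the update carries the full inserted value.
        table[update["row"]["id"]] = dict(update["row"])
    elif kind == "delete":
        # Deleted record: the ID is enough.
        table.pop(update["id"], None)
    elif kind == "update":
        # Updated record: the ID plus the changed values.
        table[update["id"]].update(update["changes"])
    else:
        raise ValueError(f"unknown update type: {kind!r}")
```

Because each update names records and values rather than storage-level changes, a follower can apply it without sharing the leader's internal data layout.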

9
Q

What is the main problem with WAL?

A

It is bound to the implementation of the underlying data structure. If that structure changes on the leader, followers can no longer apply the log and replication stops working.

10
Q

How does replication work when using Logical based replication?

A
  • Take a snapshot of the leader
  • Ship it to the replica
  • Get the ID of the position in the leader’s replication log at the time the snapshot was created
  • Initialize replication on the replica from that ID
  • Retrieve and apply the replication log from there until the replica catches up with the leader
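The steps above can be sketched in a few lines. This is a simplified model, assuming the log position is a plain integer index; `LogLeader`, `Replica`, and their methods are hypothetical names for illustration.

```python
class LogLeader:
    def __init__(self):
        self.log = []   # replication log of (key, value) entries
        self.data = {}

    def write(self, key, value) -> None:
        self.log.append((key, value))
        self.data[key] = value

    def snapshot(self):
        """Return a copy of the data plus the log position it corresponds to."""
        return dict(self.data), len(self.log)

    def entries_since(self, position: int):
        return self.log[position:]


class Replica:
    def __init__(self, snapshot: dict, position: int):
        self.data = dict(snapshot)   # the shipped snapshot
        self.position = position     # log ID the snapshot was taken at

    def catch_up(self, leader: LogLeader) -> None:
        """Apply every log entry written after the snapshot position."""
        for key, value in leader.entries_since(self.position):
            self.data[key] = value
            self.position += 1
```

Writes that arrive on the leader after the snapshot are not lost: they sit in the log past the recorded position and are applied during `catch_up`.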
11
Q

What is the main point of Synchronous replication?

A

The writes need to be confirmed by a configurable number of followers before the leader reports success.
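A rough sketch of the "configurable number of followers" rule, assuming in-process objects in place of real network calls. `SyncLeader`, `SyncFollower`, and `required_acks` are hypothetical names, and the sketch ignores what a real system must do with a write that fails to collect enough confirmations.

```python
class SyncFollower:
    def __init__(self, up: bool = True):
        self.up = up     # whether this follower is reachable
        self.data = {}

    def replicate(self, key, value) -> bool:
        """Store the write and confirm it, unless the follower is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


class SyncLeader:
    def __init__(self, followers, required_acks: int):
        self.followers = followers
        self.required_acks = required_acks
        self.data = {}

    def write(self, key, value) -> bool:
        """Report success only if enough followers confirmed the write."""
        self.data[key] = value
        acks = sum(1 for f in self.followers if f.replicate(key, value))
        # Assumption: no rollback or retry is modeled for a failed write.
        return acks >= self.required_acks
```

With two followers and one of them down, a leader configured with `required_acks=1` still reports success, while `required_acks=2` makes the same write fail.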

12
Q

What is the main point of Asynchronous replication?

A

The leader reports success as soon as the write is confirmed on its own disk; the followers apply the change later, on their own.
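A small sketch of that behavior, assuming a single follower and an in-memory queue standing in for the replication stream. `AsyncLeader`, `AsyncFollower`, and `outbox` are hypothetical names.

```python
from collections import deque


class AsyncLeader:
    def __init__(self):
        self.data = {}
        self.outbox = deque()  # changes not yet shipped to the follower

    def write(self, key, value) -> bool:
        """Apply locally, queue the change, and report success at once."""
        self.data[key] = value
        self.outbox.append((key, value))
        return True


class AsyncFollower:
    def __init__(self):
        self.data = {}

    def pull(self, leader: AsyncLeader) -> None:
        """Apply queued changes whenever the follower gets around to it."""
        while leader.outbox:
            key, value = leader.outbox.popleft()
            self.data[key] = value
```

Between `write` and `pull` the follower serves stale data, and anything still in `outbox` is lost if the leader fails at that moment.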

13
Q

Name some characteristics of Synchronous replication.

A
  • A follower that confirmed the write is guaranteed to have data that is up to date with the leader
  • If the leader fails, the data is still available on the followers
  • If not enough followers respond, the operation cannot be processed
  • The leader blocks all writes until enough followers have confirmed the current one
  • Impractical in real life
14
Q

Name some characteristics of Asynchronous replication.

A
  • It has higher availability since writes are not blocked as much
  • A follower is never guaranteed to have an up-to-date copy
  • Writes are not guaranteed to be durable in case of leader failure
14
Q

Describe Synchronous replication as it relates to consistency and availability.

A

SR is very consistent, but not so available, since it blocks further writes until the current one is reported as a success.

14
Q

Describe Asynchronous replication as it relates to consistency and availability.

A

AR is not so consistent, but it is highly available, since it does not block writes for long.

15
Q

What is the CAP Theorem?

A

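CAP stands for Consistency, Availability, and Partition Tolerance. The theorem states that a distributed system cannot guarantee all three at once, which is usually summarized as having to choose two of these attributes when designing a system. Since network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.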

16
Q

What is eventual consistency?

A

All updates are eventually delivered to all replicas.

17
Q

What is causal consistency?

A

Causally related operations are delivered to other replicas in the correct order. Thus, it maintains a partial order of events based on causality.
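One common way to track such a partial order is with vector clocks. The card does not name a mechanism, so this is an assumed technique shown only as a sketch; clocks are modeled as dictionaries from node name to counter, and the function names are hypothetical.

```python
def happened_before(a: dict, b: dict) -> bool:
    """True if the event with clock a causally precedes the one with clock b."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def concurrent(a: dict, b: dict) -> bool:
    """True if neither event causally precedes the other and they differ."""
    nodes = set(a) | set(b)
    same = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not same and not happened_before(a, b) and not happened_before(b, a)
```

Causally related operations must be applied in `happened_before` order, while concurrent ones may be applied in either order, which is why the resulting order is only partial.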

18
Q

What is client-centric consistency?

A

A consistency model in which the system guarantees the correct order of operations only from the point of view of a single client process.

19
Q

What is sequential consistency?

A

The operations appear to take place in a total order that is consistent with the order of operations on each replica. All replicas observe the same order of operations.

20
Q

What is linearizability?

A

Sequential consistency + the total order of operations conforms to the real-time ordering. Once a write is completed, the value is replicated to all nodes. At any point in time, all nodes read the same value.