Chapter 5 - Replication Flashcards

(40 cards)

1
Q

What is replication?

A

Keeping the same data on multiple machines that are connected via a network

2
Q

What are the reasons one may want to replicate data?

A
  • Reduce latency by keeping data geographically close to users
  • Increase availability, as the system can continue to work even if some parts fail
  • Increase read throughput by scaling out the number of machines that can serve read queries
3
Q

What are the three main approaches to replicating changes between nodes?

A
  • Single leader
  • Multi-leader
  • Leaderless replication
4
Q

What is a replica?

A

A node/server that stores a copy of the database.
Every write needs to be processed by every replica; otherwise the replicas no longer contain the same data.

5
Q

What is leader-based replication?

A
  • One replica is designated the leader
  • All client write queries go to the leader
  • The other replicas are followers
  • When the leader writes new data to its local storage, it also sends the data change to its followers as part of the change stream
  • Clients can read from any replica
6
Q

What are synchronous and asynchronous replication?

A

Synchronous: The leader waits for the follower to confirm it has received the write before reporting success to the user
Asynchronous: The leader sends the message to the follower replica but does not wait for a response
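A minimal sketch of the difference, assuming hypothetical `Leader` and `Follower` classes that keep data in in-memory dicts (none of these names come from the chapter). The synchronous follower is written to before the leader reports success; the asynchronous follower only receives the change when the queue is flushed later.

```python
class Follower:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}

    def apply(self, key, value):
        """Apply a replicated write; raise if the follower cannot be reached."""
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class Leader:
    def __init__(self, sync_followers, async_followers):
        self.data = {}
        self.sync_followers = sync_followers
        self.async_followers = async_followers
        self.pending = []  # (follower, key, value) changes not yet delivered

    def write(self, key, value):
        self.data[key] = value
        # Synchronous: block until every sync follower has confirmed.
        # If one is unreachable, the error propagates and no success is reported.
        for follower in self.sync_followers:
            follower.apply(key, value)
        # Asynchronous: queue the change and return without waiting.
        for follower in self.async_followers:
            self.pending.append((follower, key, value))
        return "ok"

    def flush(self):
        """Deliver queued asynchronous changes; keep the ones that still fail."""
        still_pending = []
        for follower, key, value in self.pending:
            try:
                follower.apply(key, value)
            except ConnectionError:
                still_pending.append((follower, key, value))
        self.pending = still_pending
```

Right after `write` returns, the synchronous follower already holds the value, while the asynchronous follower may still be behind until `flush` runs.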

7
Q

What are the disadvantages of synchronous replication?

A
  • The leader must wait for the synchronous follower, so writes can stall if that follower is recovering from a failure, the system is near capacity, or there are network problems
  • It is impractical for all followers to be synchronous: any single node outage would cause the whole system to grind to a halt
8
Q

What are the advantages of synchronous replication?

A
  • Synchronous replication guarantees the follower has an up-to-date copy of the data that is consistent with the leader
  • A synchronous follower can be promoted to leader if the leader fails
9
Q

What are the advantages of asynchronous replication?

A
  • The leader can continue to process writes even if all followers are down
10
Q

How can we add new followers in leader-based replication?

A
  • Take a consistent snapshot of the leader's database, ideally without locking the entire database (most databases support this)
  • Copy the snapshot to the new follower node
  • The follower requests all data changes that have happened since the snapshot was taken, then keeps processing changes as they arrive
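The steps above can be sketched with a toy in-memory model. The class and method names (`snapshot`, `changes_since`, `catch_up`) are made up for illustration, and a list index stands in for the position in the leader's replication log.

```python
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # append-only list of (key, value) changes

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value

    def snapshot(self):
        """Return a copy of the data and the log position it corresponds to."""
        return dict(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]


class Follower:
    def __init__(self, snapshot, position):
        self.data = snapshot
        self.position = position  # last log position already applied

    def catch_up(self, leader):
        """Request and apply every change made after our known position."""
        for key, value in leader.changes_since(self.position):
            self.data[key] = value
            self.position += 1


leader = Leader()
leader.write("a", 1)
snap, pos = leader.snapshot()      # step 1: consistent snapshot
follower = Follower(snap, pos)     # step 2: copy snapshot to the new node
leader.write("b", 2)               # writes continue on the leader meanwhile
leader.write("a", 3)
follower.catch_up(leader)          # step 3: request changes since the snapshot
```

Because the follower remembers its own log position, the same `catch_up` call would also serve a follower that restarts after a crash.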
11
Q

How can we handle node outages for followers in leader-based replication?

A
  • Once the follower has restarted, it checks its log for the last transaction it processed before the failure
  • The follower requests from the leader all the data changes that occurred since then
  • Once caught up, it continues receiving the stream of data changes as before
12
Q

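How can we handle node outages for the leader in leader-based replication?

A
  • Failover: once the leader is judged to have failed (typically via a timeout), a follower is promoted to be the new leader, chosen either by an election among the replicas or by a previously appointed controller node
  • With asynchronous replication the new leader may be missing some of the old leader's writes, and there is no easy way to decide how to recover these unreplicated writes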
13
Q

What is statement-based replication?

A
  • The leader logs every write request (statement) that it executes
  • The leader sends that statement log to its followers
  • For relational databases this means every literal SQL statement (INSERT, DELETE, UPDATE) is forwarded to followers
  • The followers parse and execute each statement as if it had been received from a client
14
Q

What are the potential pitfalls of statement-based replication?

A
  • Statements that call non-deterministic functions, such as NOW() or RAND(), would generate a different value on each replica
  • If statements use autoincrementing columns or depend on existing data, they must be executed in EXACTLY the same order on each replica
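Both pitfalls can be shown with a small simulation. This is only a sketch: `run_statement` and `apply_inserts` are made-up stand-ins for a replica executing SQL, and the seeded `random.Random` objects stand in for each replica's own `RAND()`.

```python
import random


def run_statement(replica_rng):
    """Stand-in for executing `UPDATE t SET v = RAND()` on one replica."""
    return replica_rng.random()


# Pitfall 1: each replica evaluates the non-deterministic function itself.
leader_rng = random.Random(1)
follower_rng = random.Random(2)
leader_value = run_statement(leader_rng)
follower_value = run_statement(follower_rng)  # differs from leader_value

# A common workaround: the leader evaluates the function once and
# replicates the resulting value instead of the function call.
replicated_value = leader_value


def apply_inserts(names):
    """Stand-in for INSERTs into a table with an autoincrementing id."""
    return {name: row_id for row_id, name in enumerate(names, start=1)}


# Pitfall 2: the same statements in a different order assign different ids.
leader_rows = apply_inserts(["alice", "bob"])
follower_rows = apply_inserts(["bob", "alice"])
```

After this runs, `leader_rows` and `follower_rows` disagree about which id belongs to which name, which is the kind of divergence the second bullet warns about.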
15
Q

What is write-ahead log shipping?

A
  • For both log-structured storage engines and B-trees, an append-only log containing every write is kept on disk
  • The leader sends this log to its followers, which process it to build a copy of exactly the same data structures as found on the leader
16
Q

What are the disadvantages of write-ahead log shipping?

A
  • The write-ahead log contains details of which bytes were changed in which disk blocks
  • This makes replication closely coupled to the storage engine
  • It is typically not possible to run different versions of the database software on the leader and the followers
17
Q

What is logical (row-based) log replication?

A
  • Uses a different log format for replication than for the storage engine
  • The logical log is a sequence of records describing the writes to database tables at the granularity of a row
  • Because it is decoupled from storage engine internals, the leader and followers can run different software versions or even different storage engines
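A rough illustration of a row-level logical log, assuming an invented record format (dicts with `op`, `key`, and `row` fields) and a follower table modelled as a plain dict. Real databases define their own formats.

```python
def apply_record(table, record):
    """Apply one row-level log record to a table (dict of key -> row dict)."""
    op = record["op"]
    if op == "insert":
        table[record["key"]] = dict(record["row"])
    elif op == "update":
        table[record["key"]].update(record["row"])
    elif op == "delete":
        del table[record["key"]]
    else:
        raise ValueError(f"unknown operation: {op}")


# Each record describes what happened to one row, not how bytes changed on disk.
log = [
    {"op": "insert", "key": 1, "row": {"title": "A"}},
    {"op": "update", "key": 1, "row": {"title": "B"}},
    {"op": "insert", "key": 2, "row": {"title": "C"}},
    {"op": "delete", "key": 2},
]

follower_table = {}
for record in log:
    apply_record(follower_table, record)
```

Nothing in the records refers to disk blocks, so a follower could store the rows in any way it likes.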
18
Q

What is trigger-based replication?

A

Lets you register custom application code that is automatically executed when a data change (write transaction) occurs in a database system.
This custom application code or external process can then replicate the data change to another system
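As a small runnable illustration, the sketch below uses SQLite triggers through Python's standard `sqlite3` module. The `pages` and `change_log` tables and the trigger names are invented for the example; an external process would read `change_log` and forward the changes to another system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT);

-- Hypothetical table that an external replication process would read.
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT,
    page_id INTEGER,
    title TEXT
);

CREATE TRIGGER pages_insert AFTER INSERT ON pages
BEGIN
    INSERT INTO change_log (op, page_id, title)
    VALUES ('insert', NEW.id, NEW.title);
END;

CREATE TRIGGER pages_update AFTER UPDATE ON pages
BEGIN
    INSERT INTO change_log (op, page_id, title)
    VALUES ('update', NEW.id, NEW.title);
END;
""")

# Ordinary writes; the triggers record each change automatically.
conn.execute("INSERT INTO pages (id, title) VALUES (1, 'A')")
conn.execute("UPDATE pages SET title = 'B' WHERE id = 1")

changes = conn.execute(
    "SELECT op, page_id, title FROM change_log ORDER BY seq"
).fetchall()
```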

19
Q

What is read-after-write or read-your-writes consistency?

A
  • A guarantee that if a user writes a change to the database they will always see any updates they submitted themselves
  • Cross-device read-after-write consistency may also need to be considered (the same user on several devices)
  • Can be implemented by routing the reads of a user who has recently written to the leader, or to a follower that is sufficiently up to date
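One way to sketch the last bullet is a router that remembers when each user last wrote. The `Router` class and the 60-second window are assumptions for illustration, and timestamps are passed in explicitly to keep the example deterministic.

```python
class Router:
    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.last_write = {}  # user_id -> time of that user's last write

    def record_write(self, user_id, now):
        self.last_write[user_id] = now

    def choose_replica(self, user_id, now):
        """Read from the leader if this user wrote within the window."""
        last = self.last_write.get(user_id)
        if last is not None and now - last < self.window:
            return "leader"
        return "follower"


router = Router(window_seconds=60.0)
router.record_write("alice", now=10.0)
```

Here a read by `alice` at time 30 goes to the leader, while other users and later reads by `alice` can be served by a follower.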
20
Q

What are monotonic reads?

A

A guarantee that if a user makes several reads in sequence, they won’t read older data after having previously read newer data
e.g. a violation: a user reads from one follower and sees 2 comments, then reads from another follower with more lag and sees only the 1st comment
Can be implemented by making sure each user always reads from the same replica
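A possible way to pin each user to one replica is to hash the user ID. The function name `replica_for` and the replica names are made up for this sketch.

```python
import hashlib


def replica_for(user_id, replicas):
    """Pick the same replica for a given user every time by hashing the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["follower-1", "follower-2", "follower-3"]
first_choice = replica_for("user-42", replicas)
second_choice = replica_for("user-42", replicas)  # same replica again
```

If the chosen replica fails, the user's reads have to be rerouted to another replica, at which point the guarantee no longer comes for free.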

21
Q

What are consistent prefix reads?

A

A guarantee that if a sequence of writes happens in a certain order, anyone reading those writes will see them appear in the same order
If the database always applies writes in the same order, reads always see a consistent prefix; violations are mainly a problem in partitioned databases, where different partitions operate independently

22
Q

What is a multi-leader configuration?

A

There are multiple leaders in the database topology. Each leader accepts writes and also acts as a follower to the other leaders.
Within a single datacenter, the benefits rarely outweigh the added complexity

23
Q

What are some disadvantages of multi-leader replication?

A

The same data may be concurrently modified in two different datacenters
Those write conflicts must be resolved

24
Q

Describe a simple write conflict in a multi-leader database

A

A wiki page is being simultaneously edited by two users
User 1 changes the title from A to B
User 2 changes the title from A to C
Each user's change is successfully applied to their local leader, but when the changes are asynchronously replicated a conflict is detected

25
Q

How can we avoid or resolve conflicts?

A
  • Avoid conflicts: route all writes for a given record (for example, a particular page) to the same leader
  • Last write wins (LWW): keep the write with the latest timestamp and discard the others
  • Give each replica a unique ID; writes from the higher-numbered replica take precedence
  • Record the conflict in an explicit data structure and write application code that resolves it on read
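The three resolution strategies can be compared on the wiki-title conflict. The write records and function names below are invented for the sketch, and the timestamps are arbitrary numbers.

```python
def last_write_wins(writes):
    """Keep only the write with the highest timestamp."""
    return max(writes, key=lambda w: w["timestamp"])["value"]


def highest_replica_wins(writes):
    """Keep only the write that originated at the highest-numbered replica."""
    return max(writes, key=lambda w: w["replica_id"])["value"]


def keep_conflict(writes):
    """Keep every conflicting value so the application can resolve on read."""
    return sorted(w["value"] for w in writes)


# Two concurrent edits of the same title, accepted by different leaders.
writes = [
    {"value": "B", "timestamp": 100, "replica_id": 2},
    {"value": "C", "timestamp": 105, "replica_id": 1},
]

lww_result = last_write_wins(writes)           # "C"
replica_result = highest_replica_wins(writes)  # "B"
conflict_record = keep_conflict(writes)        # ["B", "C"]
```

The first two strategies silently discard one of the writes; only the third keeps both values for later resolution.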
26
Q

What is a replication topology?

A

The communication paths along which writes are propagated from one node to another

27
Q

What is the all-to-all multi-leader replication topology?

A

Every leader sends its writes to every other leader

28
Q

What is the circular multi-leader replication topology?

A

Each node receives writes from one node and forwards those writes, plus any writes of its own, to one other node

29
Q

What is the star multi-leader replication topology?

A

One node is designated as the root node, which forwards writes to all the other nodes

30
Q

What problems may arise with the star and circular topologies?

A

If just one node fails, it can interrupt the flow of replication messages between the other nodes

31
Q

What problems may arise with the all-to-all topology?

A

Writes may arrive at a replica in the wrong order, for example:
  • Client A inserts a row into a table on leader 1
  • Client B updates that row on leader 3
  • Leader 2 receives the two writes in the opposite order and is asked to update a row that does not exist yet
32
Q

What is leaderless replication?

A
  • Also known as Dynamo-style replication
  • Any node can process client write requests
  • A coordinator node may send write requests to other nodes on behalf of clients
33
Q

What are quorum writes and quorum reads?

A
  • Quorum write: the client sends its write request to all (or several) replicas. If at least a threshold number of nodes (w) respond successfully, the write is considered successful.
  • Quorum read: the client sends its read request to several nodes in parallel; version numbers are used to determine which value is newer.
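A simplified sketch of both operations, assuming a made-up `Replica` class that stores `(version, value)` pairs and can be switched off to simulate an outage. For brevity the read contacts every replica and only checks that at least `r` of them answered.

```python
class Replica:
    def __init__(self, up=True):
        self.up = up
        self.store = {}  # key -> (version, value)

    def put(self, key, version, value):
        """Store the value if it is newer; return True as acknowledgement."""
        if not self.up:
            return False
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True

    def get(self, key):
        """Return (version, value), or None if this replica is down."""
        if not self.up:
            return None
        return self.store.get(key, (0, None))


def quorum_write(replicas, key, version, value, w):
    """Send the write to all replicas; succeed if at least w acknowledge."""
    acks = sum(replica.put(key, version, value) for replica in replicas)
    return acks >= w


def quorum_read(replicas, key, r):
    """Query the replicas; return the value with the highest version."""
    responses = [resp for resp in (rep.get(key) for rep in replicas)
                 if resp is not None]
    if len(responses) < r:
        raise RuntimeError("not enough replicas responded")
    return max(responses, key=lambda resp: resp[0])[1]
```

A replica that was down during a write keeps its old version, which is why the read compares version numbers rather than trusting any single response.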
34
Q

What is read repair?

A
  • Clients read from several nodes in parallel (quorum reads)
  • If the client sees that one of the responses is stale, it writes the newer value back to that replica
  • Works well for data that is frequently read
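A toy version of read repair, assuming each replica is just a dict mapping a key to a `(version, value)` pair. The function name is made up for the sketch.

```python
def read_with_repair(replicas, key):
    """Read a key from every replica and write the newest value back to stale ones."""
    responses = [(rep, rep.get(key, (0, None))) for rep in replicas]
    newest = max((resp for _, resp in responses), key=lambda resp: resp[0])
    for rep, resp in responses:
        if resp[0] < newest[0]:
            rep[key] = newest  # repair the stale (or missing) copy
    return newest[1]


replica_a = {"k": (2, "new")}
replica_b = {"k": (1, "old")}   # stale
replica_c = {}                  # missed the write entirely
value = read_with_repair([replica_a, replica_b, replica_c], "k")
```

Values that are rarely read are rarely repaired this way, which is where a background anti-entropy process helps.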
35
Q

What is an anti-entropy process?

A

A background process that looks for differences in data and copies missing data from one replica to another

36
Q

What is the quorum condition?

A
  • There are n replicas
  • Every write must be confirmed by w nodes to be considered successful
  • Every read must query at least r nodes
  • As long as w + r > n, we expect to get at least one up-to-date value when reading
  • Intuition: the set of nodes written to and the set of nodes read from must overlap in at least one node
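The overlap argument can be checked by brute force for small clusters. The helper `always_overlap` is a made-up name; it tries every possible write set of size w against every possible read set of size r.

```python
from itertools import combinations


def always_overlap(n, w, r):
    """True if every w-node write set shares a node with every r-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


strict = always_overlap(3, 2, 2)   # w + r = 4 > 3
loose = always_overlap(3, 1, 2)    # w + r = 3, not greater than n
```

With n = 3, w = 1, r = 2 a write to node 0 and a read from nodes 1 and 2 never meet, so the read can miss the write.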
37
Q

How can stale values be returned even if the quorum condition is met?

A
  • Two writes occur concurrently, so it is unclear which happened first; this is especially risky if last write wins is used
  • A write happens concurrently with a read, so the read may return either the old or the new value
  • A write succeeds on some replicas but fails on others and is not rolled back, so later reads may or may not return its value
  • A node carrying the new value fails and is restored from a replica carrying the old value, so fewer than w replicas hold the new value and the quorum condition is broken
38
Q

What is a sloppy quorum and hinted handoff?

A
  • The client cannot reach the usual n nodes on which the data is stored
  • Sloppy quorum: the write is accepted by any w reachable nodes, which may include nodes that are not among the usual n nodes for that data
  • Hinted handoff: once the usual nodes are reachable again, the writes that were temporarily accepted elsewhere are sent on to them
  • Useful for increasing write availability
39
Q

What are concurrent operations? (tricky)

A

Two operations that are unaware of each other: there is no happens-before relationship between them.
e.g. User 1 changes the title from A to B while User 2 changes the title from A to C

40
Q

Describe a versioning algorithm to capture the happens-before relationship and deal with concurrent writes

A

See pages 187-188.
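The card only gives a page reference, so the following is an assumption about the kind of algorithm meant: a sketch of a per-key version number scheme on a single replica. The server bumps a version on every write, a client passes the version it last read, and the server overwrites only the values that client had already seen. The `VersionedKey` class and the shopping-cart values are invented for illustration.

```python
class VersionedKey:
    """One key on one replica, tracking concurrent values ("siblings")."""

    def __init__(self):
        self.version = 0
        self.siblings = {}  # version -> value written at that version

    def read(self):
        """Return the latest version number and all values not yet overwritten."""
        return self.version, list(self.siblings.values())

    def write(self, value, based_on_version):
        # Values at or below based_on_version were seen by the writer (who is
        # expected to have merged them into `value`), so they can be dropped.
        # Values with a higher version are concurrent and must be kept.
        self.siblings = {v: val for v, val in self.siblings.items()
                         if v > based_on_version}
        self.version += 1
        self.siblings[self.version] = value
        return self.read()


cart = VersionedKey()
v1, _ = cart.write(["milk"], based_on_version=0)             # client 1
v2, seen2 = cart.write(["eggs"], based_on_version=0)         # client 2, concurrent
v3, seen3 = cart.write(["milk", "flour"], based_on_version=v1)  # client 1 again
# Client 2 merges what it saw at v2 and writes based on that version.
v4, seen4 = cart.write(["eggs", "milk", "ham"], based_on_version=v2)
```

After the last write, the replica holds two siblings: the value written at version 3, which client 2 had not seen, and client 2's merged value at version 4. A later client would read both and merge them.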