Lesson 10: Consistency in Distributed Data Stores Flashcards

Question 1

Q

Why is consistency important & hard?

Answer

A

distributed state
replication for availability and fault tolerance
caching for performance
failures

We need guarantees about the order of the writes and reads

But this is hard because we maintain multiple copies of the state, and it is further complicated by various forms of failures.

Question 2

Q

What is the purpose of a consistency model & some examples?

Answer

A

A consistency model makes a guarantee about the ordering of updates in the system and how these updates become visible to ongoing read operations.

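strong consistency: linearizability (operations appear in a single order that is consistent with real time)
sequential consistency: guarantees a single ordering of operations that all nodes observe
causal consistency: enforces ordering per the happens-before relation; no guarantee for concurrent events
eventual consistency: replicas eventually converge, as long as partitions/failures are not permanent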
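
The happens-before relation behind causal consistency can be illustrated with vector clocks. This is a minimal sketch, not something the lesson prescribes: vector clocks are one common way to track happens-before, and the names `happens_before`, `concurrent`, and the process ids `p1`, `p2`, `p3` are hypothetical.

```python
def happens_before(a, b):
    """True if the event with vector clock a causally precedes the one with clock b."""
    keys = set(a) | set(b)
    no_entry_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_entry_smaller = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_entry_greater and some_entry_smaller


def concurrent(a, b):
    """True if the clocks differ and neither event happens before the other."""
    keys = set(a) | set(b)
    differ = any(a.get(k, 0) != b.get(k, 0) for k in keys)
    return differ and not happens_before(a, b) and not happens_before(b, a)


w1 = {"p1": 1}             # write by process p1
w2 = {"p1": 1, "p2": 1}    # write by p2 after it observed w1
w3 = {"p3": 1}             # independent write by p3

w1_before_w2 = happens_before(w1, w2)
w2_and_w3_concurrent = concurrent(w2, w3)
```

A causally consistent store must show w1 before w2 everywhere, but it is free to order w2 and w3 differently at different replicas.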

Question 3

Q

What is a look-aside cache design?

Answer

A

Single Memcache Cluster Design:

Server first tries to get data from memcache. If result is not present in the cache, the server then gets it from the database, then saves it in the cache.

When performing a delete, the server must first delete the data from the database, then from the cache. In the worst case, if the process fails and the data is not deleted from the cache, the cached entry will eventually expire.
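
The read and delete paths above can be sketched as follows. This is a toy model under stated assumptions: plain dicts stand in for memcache and the database, and the class and method names (`LookAsideStore`, `read`, `delete`) are hypothetical, not a real memcache client API. Expiry of cached entries is not modeled.

```python
class LookAsideStore:
    """Toy look-aside cache: dicts stand in for memcache and the database."""

    def __init__(self, database):
        self.database = database   # authoritative copy of the data
        self.cache = {}            # stand-in for memcache

    def read(self, key):
        if key in self.cache:              # cache hit
            return self.cache[key]
        value = self.database.get(key)     # cache miss: go to the database
        if value is not None:
            self.cache[key] = value        # save the result for later reads
        return value

    def delete(self, key):
        self.database.pop(key, None)       # database first ...
        self.cache.pop(key, None)          # ... then the cache


store = LookAsideStore({"user:1": "alice"})
first = store.read("user:1")     # miss, filled from the database
second = store.read("user:1")    # hit, served from the cache
store.delete("user:1")
after_delete = store.read("user:1")
```

Note that the server, not the cache, talks to the database; the cache never fetches data on its own, which is what makes the design "look-aside".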

Question 4

Q

What are some problems with a look-aside cache design?

Answer

A

If two servers make requests to set a certain value in the database and the cache, those requests can be reordered in flight, resulting in differences between the cache and the database.

Solution: Leases
- issued on cache miss
- detect concurrent writes: supports ordering of writes

thundering herd
serving stale values
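
A lease can be pictured as a token that the cache hands out on a miss and checks again when the value is written back. The sketch below is illustrative only: the class `LeasingCache`, its method signatures, the rule that a delete cancels an outstanding lease, and the rule that only one lease per key is outstanding are all assumptions made for this example.

```python
import itertools


class LeasingCache:
    def __init__(self):
        self.values = {}
        self.leases = {}                   # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return (value, lease). On a miss the lease is a token, or None if
        another client already holds one and this client should retry later."""
        if key in self.values:
            return self.values[key], None
        if key in self.leases:
            return None, None              # limits the thundering herd
        token = next(self._tokens)
        self.leases[key] = token
        return None, token

    def set(self, key, value, token):
        """Accept the fill only if the lease is still valid."""
        if token is None or self.leases.get(key) != token:
            return False                   # lease was invalidated: reject stale set
        del self.leases[key]
        self.values[key] = value
        return True

    def delete(self, key):
        """Invalidate the value and any outstanding lease for the key."""
        self.values.pop(key, None)
        self.leases.pop(key, None)


cache = LeasingCache()
_, lease = cache.get("k")            # miss: lease issued
_, second = cache.get("k")           # concurrent miss: no lease, must wait
cache.delete("k")                    # a writer updated the database meanwhile
stale_accepted = cache.set("k", "old value", lease)
_, fresh = cache.get("k")
fresh_accepted = cache.set("k", "new value", fresh)
```

The first client read the database before the concurrent write, so its fill would have put a stale value in the cache; the invalidated lease lets the cache refuse it.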

Question 5

Q

How does memcache maintain consistency when scaled to multiple clusters?

Answer

A

The database drives cache invalidations in commit order.

This means it’s possible for the data to be stale but it will be in the right order.
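
One way to picture database-driven invalidation is a commit log that is replayed against every cluster's cache. This is a rough sketch under assumptions: the `Database` class, the `commit_log` list, and `replay_invalidations` are hypothetical names, and dicts stand in for the per-cluster caches.

```python
class Database:
    def __init__(self):
        self.rows = {}
        self.commit_log = []          # keys, in the order their writes committed

    def write(self, key, value):
        self.rows[key] = value
        self.commit_log.append(key)


def replay_invalidations(database, clusters, start=0):
    """Delete committed keys from every cluster cache, in commit order.
    Returns the log position to resume from next time."""
    for key in database.commit_log[start:]:
        for cache in clusters:
            cache.pop(key, None)
    return len(database.commit_log)


db = Database()
cluster_a = {"x": "old-x", "y": "old-y"}
cluster_b = {"x": "old-x"}

db.write("x", "new-x")
db.write("y", "new-y")
stale_read = cluster_a["x"]           # stale until the invalidation arrives
position = replay_invalidations(db, [cluster_a, cluster_b])
```

Between the write and the replay a cluster can still serve the old value, but no cluster ever sees the invalidation for "y" before the one for "x".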

Question 6

Q

What mechanism does memcache use in multi-region data sharded setups?

Answer

A

The server sets a remote marker in memcache.
The server writes to the remote master database (in a different region).
The server deletes the value from memcache (the local database doesn’t have the new value yet, so it can’t be pushed into the cache yet).
MySQL replication pushes the update from the remote master database to the local database.
Once the local database has been updated, the remote marker in memcache is deleted.

NOTE: until the remote marker is deleted, the value can only be served from the remote master database.
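
The remote-marker steps can be sketched as below. This is a simplified model with assumed names: the `Region` class and its methods are hypothetical, dicts stand in for the databases and memcache, and replication is triggered by hand through `replicate` instead of running asynchronously.

```python
class Region:
    def __init__(self, master_db, local_db):
        self.master_db = master_db     # remote master database (other region)
        self.local_db = local_db       # local replica, lags behind the master
        self.cache = {}                # stand-in for the local memcache
        self.remote_markers = set()    # keys whose local copy may be stale

    def write(self, key, value):
        self.remote_markers.add(key)   # 1. set the remote marker
        self.master_db[key] = value    # 2. write to the remote master
        self.cache.pop(key, None)      # 3. delete the locally cached value

    def replicate(self, key):
        self.local_db[key] = self.master_db[key]   # 4. replication catches up
        self.remote_markers.discard(key)           # 5. marker is removed

    def read(self, key):
        if key in self.remote_markers:
            return self.master_db[key]   # local replica may be stale
        if key not in self.cache:
            self.cache[key] = self.local_db[key]
        return self.cache[key]


region = Region(master_db={"k": "v1"}, local_db={"k": "v1"})
before = region.read("k")     # served locally and cached
region.write("k", "v2")
during = region.read("k")     # marker set: served from the remote master
region.replicate("k")
after = region.read("k")      # marker gone: served locally again
```

The marker trades extra latency on the cross-region read for not serving the stale local value while replication is still in flight.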

Question 7

Q

Why is a causal consistency model at the system level not enough (e.g., we’ll see out of order updates, etc.)?

Answer

A

The system (e.g., a server) observes data accesses to determine causality. However, when the system involves (geo)replication, caching, etc., not all reads/writes are visible in a single location. Causally related operations can therefore appear to be concurrent, so it’s hard for the system to tell how to order them.

Enter causal+ consistency (as provided by COPS)

The client making reads/writes captures this dependency information. Later, when updating the data store (issuing a put_after operation), the client provides this metadata. This allows the remote data store to perform dependency checks (e.g., it won’t allow an update to become visible until its dependencies have been satisfied).
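
The dependency check can be sketched as follows. This is a toy illustration of the idea, not the COPS implementation: the classes `ClientContext` and `RemoteReplica`, the `(key, version)` dependency format, and the method names are assumptions made for this example.

```python
class ClientContext:
    """Client-side library state: versions this client has read or written."""

    def __init__(self):
        self.deps = set()

    def observe(self, key, version):
        self.deps.add((key, version))


class RemoteReplica:
    def __init__(self):
        self.visible = {}    # key -> (version, value)
        self.pending = []    # replicated writes waiting on dependencies

    def _satisfied(self, deps):
        return all(k in self.visible and self.visible[k][0] >= v for k, v in deps)

    def receive(self, key, version, value, deps):
        """Accept a replicated write together with the client's dependencies."""
        self.pending.append((key, version, value, frozenset(deps)))
        self._apply_ready()

    def _apply_ready(self):
        progress = True
        while progress:                      # repeat until nothing more unblocks
            progress = False
            for write in list(self.pending):
                key, version, value, deps = write
                if self._satisfied(deps):
                    if key not in self.visible or self.visible[key][0] < version:
                        self.visible[key] = (version, value)
                    self.pending.remove(write)
                    progress = True


client = ClientContext()
client.observe("photo", 1)                   # the client uploaded photo v1

replica = RemoteReplica()
# The comment reaches the remote replica before the photo it depends on.
replica.receive("comment", 1, "nice photo!", client.deps)
comment_visible_early = "comment" in replica.visible
replica.receive("photo", 1, "beach.jpg", set())
```

Because the client supplies the dependencies, the replica does not need to have observed the original reads and writes itself in order to order them correctly.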

(7 cards)