Distributed Systems Flashcards

Question

What is the goal of caching?

Answer 1

Caching prevents unnecessary calls to more expensive storage layers for common lookups.

Answer 2

In-memory caches, disk caching, database caching, and CDNs.

Answer 3

In-memory caching involves storing data in memory on the application server and referencing it directly. Redis and Memcached are common examples of this.
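
As a minimal sketch of the idea, assuming a hypothetical slow lookup named `load_profile` (not part of the original card), Python's standard `functools.lru_cache` keeps results in the application's own memory so repeated lookups skip the expensive call:

```python
from functools import lru_cache

# Counts how often the slow path runs (for illustration only).
CALLS = {"count": 0}


@lru_cache(maxsize=128)
def load_profile(user_id: int) -> dict:
    """Hypothetical expensive lookup; stands in for a database or API call."""
    CALLS["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


first = load_profile(7)   # miss: runs the function body
second = load_profile(7)  # hit: served from process memory
```

The second call never reaches the function body, which is the saving an in-memory cache is meant to provide.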

Answer 4

Disk caching stores data on the application server’s disk instead of in memory. While not as fast as reading from memory, it’s typically faster than lookups from external sources. Database query results and file system data are good examples of what gets cached on disk.

Answer 5

Database caching is when commonly accessed data is cached in the database itself. Result sets and query execution plans are common examples of what is stored in a database cache.

Answer 6

Client-side caching, as the name implies, involves caching data on the client’s machine (e.g. web app, browser) to prevent calls to the server. This is commonly used to store static data or data that seldom changes, such as images, JavaScript, CSS, or other client resources.

Answer 7

Much like client-side caching stores data on the client, data can be cached on the server as well. Web servers commonly cache page responses (full or partial), objects, etc.

Answer 8

CDN caching involves storing data on a network of servers (typically geographically distributed) to improve data access from remote locations. Images, video, and other static assets are commonly stored. CDNs are most commonly used for global or very large-scale applications.

Answer 9

DNS caching involves a Domain Name System (DNS) server or resolver caching the IP address for a domain name to avoid repeated lookups. This can greatly reduce latency, because later requests for the same domain skip the resolution step.

Answer 10

If caches aren’t properly evicted or invalidated, users can see stale data and the system can develop consistency issues, especially in systems with a large number of disparate caching mechanisms.

Answer 11

Write-through, write-around, and write-back. Write-through writes to the cache and database simultaneously to avoid stale cache data. It can have higher latency, as there are two writes as opposed to one. Write-around writes only to the primary data store (not the cache). This avoids the cache being flooded with writes, but recently written data will need to be read directly from the data store. Write-back involves writing only to the cache and later reflecting those writes to the data store. This can be highly performant but risky, as a crash could cause data in the cache to be lost before it is persisted.
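
The three write strategies can be sketched side by side. This is an illustrative sketch only: `Store`, `WriteCache`, and the `dirty` set are hypothetical names, and a plain dict stands in for the database.

```python
class Store:
    """Hypothetical backing data store; a dict stands in for a database."""

    def __init__(self):
        self.data = {}


class WriteCache:
    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to the store

    def write_through(self, key, value):
        # Update cache and store together, so the cache never lags the store.
        self.cache[key] = value
        self.store.data[key] = value

    def write_around(self, key, value):
        # Write only to the store; drop any cached copy so it cannot go stale.
        self.store.data[key] = value
        self.cache.pop(key, None)
        self.dirty.discard(key)

    def write_back(self, key, value):
        # Write only to the cache and remember the key for a later flush.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist deferred writes; unflushed keys are lost if the process crashes.
        for key in self.dirty:
            self.store.data[key] = self.cache[key]
        self.dirty.clear()
```

Until `flush` runs, a write-back value exists only in the cache, which is where both the performance gain and the data-loss risk come from.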

Answer 12

Purge, refresh, ban, TTL, and stale-while-revalidate. Purge removes content from the cache. Refresh reads from the source, even if the content is in the cache, and updates it. Ban invalidates cached content based on some type of pattern or criteria. TTL checks the cache and, if the data has expired (via the TTL), refreshes it from the data store. Stale-while-revalidate serves stale content during an update and refreshes the cache asynchronously once finished; it is typically reserved for CDNs.
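
A minimal sketch of purge, refresh, ban, and TTL expiry on one small cache. The class name `TTLCache`, the `loader` callback, and the injectable `clock` are assumptions made for illustration; stale-while-revalidate is left out because it needs a background refresh.

```python
import time


class TTLCache:
    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader   # hypothetical function that reads the source
        self.ttl = ttl_seconds
        self.clock = clock     # injectable so expiry can be tested
        self.entries = {}      # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]        # still fresh
        return self.refresh(key)   # missing or expired

    def refresh(self, key):
        # Re-read from the source even if a cached copy exists, then update it.
        value = self.loader(key)
        self.entries[key] = (value, self.clock() + self.ttl)
        return value

    def purge(self, key):
        # Remove a single entry from the cache.
        self.entries.pop(key, None)

    def ban(self, predicate):
        # Invalidate every key matching a caller-supplied criterion.
        for key in [k for k in self.entries if predicate(k)]:
            del self.entries[key]
```

For example, `cache.ban(lambda k: k.startswith("user:"))` would drop every entry whose key begins with `user:` in this sketch.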

Answer 13

Read-through means the cache is responsible for reading from the source during misses. Read-aside means the application is responsible for checking the cache and for reading from the source during misses. Read-through can be simpler to implement but doesn’t provide the same fine-grained control or fault tolerance as read-aside.
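
The difference comes down to who calls the source on a miss. A minimal sketch, assuming a hypothetical `loader` function and plain dicts as the cache storage:

```python
class ReadThroughCache:
    """The cache owns the loader, so callers only ever talk to the cache."""

    def __init__(self, loader):
        self.loader = loader
        self.data = {}

    def get(self, key):
        if key not in self.data:
            # The cache itself reads the source on a miss.
            self.data[key] = self.loader(key)
        return self.data[key]


def read_aside(cache: dict, key, load_from_source):
    """The application checks the cache and reads the source on a miss."""
    if key in cache:
        return cache[key]
    value = load_from_source(key)
    cache[key] = value  # the application populates the cache itself
    return value
```

With read-aside the application can choose, per call, to skip the cache or to handle a cache failure by going straight to the source, which is the extra control the card refers to.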

Answer 14

First In First Out (FIFO): Evicts the block that was added first, without any regard to how often or how recently it was accessed.
Last In First Out (LIFO): Evicts the block that was added most recently, without any regard to how often or how recently it was accessed.
Least Recently Used (LRU): Discards the least recently used items first.
Most Recently Used (MRU): In contrast to LRU, discards the most recently used items first.
Least Frequently Used (LFU): Counts how often each item is accessed and discards the least frequently used items first.
Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
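
As one worked example of these policies, LRU can be sketched with Python's standard `collections.OrderedDict`. The class name `LRUCache` and its `capacity` parameter are illustrative choices, not something the card specifies.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
```

Changing `popitem(last=False)` to `popitem(last=True)` before the new key is inserted would give MRU instead, and dropping the `move_to_end` calls would give FIFO.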

(38 cards)