Distributed Systems Flashcards

(38 cards)

1
Q

What are the five aspects of a distributed system?

A

Distributed systems should be scalable, reliable, available, efficient, and manageable.

2
Q

What does scalability mean with respect to a distributed system?

A

Scalability is the capability of a system to grow (or shrink) to meet evolving demands.

3
Q

What are two forms of scalability? How do they differ?

A

Scalability can be horizontal or vertical. Horizontal generally means you add more nodes or servers to distribute and accommodate the load. Vertical generally means increasing the power or resources of an existing node or server.

4
Q

Which is generally easier to implement or support? Horizontal or vertical scaling?

A

Horizontal scaling is typically easier: you add components to an existing pool (especially straightforward in Kubernetes). Vertical scaling is limited by the capacity of a single server (or pod) and can sometimes involve downtime.

5
Q

What are good examples of database technologies that easily scale horizontally? What about vertically?

A

Cassandra and MongoDB both support easily scaling horizontally. MySQL is something that supports scaling vertically.

6
Q

What does reliability mean within a distributed system?

A

Reliability means that a system should still operate and respond correctly even in the presence of errors, faults, or one or more failing components.

A failing component should always be replaceable by a healthy one.

7
Q

What’s a real-world example of a company supporting reliability?

A

Amazon and its shopping-cart transactions. A transaction should never be aborted because the machine processing it fails; a replica holding that transaction should be able to process it if a failure occurs.

8
Q

What’s an important and potentially costly solution to support reliability?

A

Reliability can be achieved through redundancy by eliminating single points of failure. This can be costly and often comes at the expense of added complexity.

9
Q

What is fault-tolerance?

A

Fault-tolerance is the ability of a system to continue functioning, potentially at a reduced level, when some of its components fail (e.g. by absorbing the error, retrying, or failing over).

10
Q

How do reliability and fault-tolerance differ?

A

Reliability refers to the entire system and end-to-end correctness. Fault-tolerance focuses on the ability of the system to continue operating when components fail.

11
Q

Perspective-wise, where do reliability and fault-tolerance fall?

A

Reliability is a more user-centric concept (e.g. does the system meet my needs?), whereas fault-tolerance is a more system-centric concept (e.g. how are failures handled?).

12
Q

How do you measure reliability?

A

Metrics such as uptime, error rates, and mean time between failures (MTBF).

13
Q

How do you measure fault-tolerance?

A

Metrics such as recovery/failover time (i.e. how quickly a failure can be detected, isolated, and recovered from).
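As a toy illustration of how these metrics relate, expected availability is often estimated from mean time between failures (MTBF) and mean time to repair (MTTR). The Python sketch below assumes those standard definitions; it is not from the cards themselves:

```python
# Toy illustration: estimating availability from reliability and
# fault-tolerance metrics. MTBF = mean time between failures,
# MTTR = mean time to repair/recover.
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is expected to be up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails every 1000 hours and takes 1 hour to recover:
print(f"{availability(1000, 1) * 100:.2f}%")  # 99.90%
```
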

14
Q

Explain the relationship between availability and reliability.

A

If a system is reliable, it is available; however, availability does not imply reliability.

15
Q

What is serviceability? How does it play a role in distributed systems?

A

Serviceability is how easy a system is to maintain (e.g. ease of diagnosing issues, early detection of failures or issues, downtime during maintenance or upgrades).

16
Q

What is a load balancer? What do they do?

A

A load balancer is a component that spreads traffic across multiple resources to reduce the load on any single one and improve responsiveness.

17
Q

How does a load balancer work?

A

While distributing requests, the load balancer tracks the status (and potentially other metrics) of each resource to determine which should or shouldn’t receive traffic.

18
Q

What problem is a load balancer designed to solve?

A

It can mitigate against single points of failure (e.g. if a single web server fails, traffic will be redirected to healthy, responsive ones).

19
Q

What are three levels of a typical application where a load balancer would be appropriate?

A

In between the user and the web servers (distributes traffic across multiple servers), between web servers and other application components (e.g. services like distributed caches, etc.), between internal services and databases.

20
Q

What are a few benefits of load balancers?

A

Faster user experience, increased availability, metrics for predictive analytics, and less stress on individual components or services.

21
Q

What are the two factors that a load balancer uses to distribute traffic?

A

Health checks and the selected routing algorithm.

22
Q

What are some of the common load balancing algorithms?

A

Least connections (routes to the least busy server), least response time (routes to the server responding fastest), least bandwidth (routes to the server currently serving the least traffic), round robin (cycles through the servers in order, repeating), weighted round robin (the same, but servers are weighted by resources or type), and hash (traffic is routed based on a calculated hash, e.g. the same IP always goes to the same server).
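A few of these algorithms can be sketched in Python. This is a minimal illustration, not a production load balancer; the server names and data structures are invented:

```python
import itertools
from collections import defaultdict

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

# Round robin: cycle through servers in order, repeating.
rr = itertools.cycle(servers)
def round_robin() -> str:
    return next(rr)

# Least connections: route to the server with the fewest active connections.
active = defaultdict(int)
def least_connections() -> str:
    choice = min(servers, key=lambda s: active[s])
    active[choice] += 1  # caller must decrement when the request completes
    return choice

# Hash: the same client IP always lands on the same server
# (deterministic within one process).
def ip_hash(client_ip: str) -> str:
    return servers[hash(client_ip) % len(servers)]
```
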

23
Q

A load balancer can itself be a single point of failure. How can you prevent this?

A

You can implement a load-balancing group or cluster so that the load balancer itself is redundant (e.g. an active/passive pair).

24
Q

What is caching? What principle does it take advantage of?

A

Caching involves the use of a high-speed storage layer that sits between the calling application and the original source of data. It takes advantage of the locality-of-reference principle.

25
Q

What is the goal of caching?

A

Caching prevents unnecessary calls to more expensive storage layers for common lookups.

26
Q

What are four common use-cases or types of cache?

A

In-memory caches, disk caching, database caching, and CDNs.

27
Q

What is in-memory caching? How does it work and what’s a good example of it?

A

In-memory caching involves storing data in memory on the application server and referencing it directly. Redis and Memcached are common examples.

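The idea can be sketched in a few lines of Python. The `TTLCache` class below is a hypothetical illustration of an in-memory cache with per-entry expiry, not the Redis or Memcached API:

```python
import time

class TTLCache:
    """Minimal in-memory cache sketch with per-entry expiry (TTL)."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None                      # miss
        value, expires_at = entry
        if time.monotonic() >= expires_at:   # expired: evict and miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```
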
28
Q

What is disk caching? How does it differ from in-memory caching? What’s a common use-case for it?

A

Disk caching is the idea of storing data on disk on the application server instead of in memory. While not as fast as reading from memory, it’s typically faster than lookups from external sources. Database query results and file-system data are good examples of data cached on disk.

29
Q

What is database caching? What are some common forms of it?

A

Database caching is when commonly accessed data is cached in the database itself. Result sets and query execution plans are common examples of what is stored in a database cache.

30
Q

What is client-side caching? What might be stored there?

A

Client-side caching, as the name implies, involves caching data on the client’s machine (e.g. in a web app or browser) to prevent calls to the server. It is commonly used to store static or seldom-changing data such as images, JavaScript, CSS, or other client resources.

31
Q

What’s server-side caching? What’s commonly stored there?

A

Much like client-side data is cached on the client, data can be cached on the server as well. Web servers commonly cache page responses (full or partial), objects, etc.

32
Q

What is CDN caching? What scenarios is it good for?

A

CDN caching involves storing data on a set of servers (typically geographically distributed) to improve data access from remote locations. Images, video, and other static assets are commonly stored there. CDNs are typically reserved for global or very large-scale applications.

33
Q

What is DNS caching? What is it good for?

A

DNS caching involves resolvers (and clients) caching the IP addresses that domain names resolve to, avoiding repeated lookups for the same domain. This can greatly improve performance.

34
Q

Why is cache invalidation important?

A

If caches aren’t properly evicted or invalidated, users can experience stale data or potentially system consistency issues, especially in systems with a large number of disparate caching mechanisms.

35
Q

What are the three primary schemes for cache-invalidation during writes? How do each work?

A

Write-through, write-around, and write-back. Write-through writes to the cache and database simultaneously to avoid stale cache data; it can have higher latency since there are two writes instead of one. Write-around writes only to the primary data store (not the cache); this avoids flooding the cache with writes, but recently written data must be read directly from the data store. Write-back writes only to the cache and later reflects those writes to the data store; this can be highly performant but risky, as a crash could cause data in the cache to be lost.

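The three write policies can be sketched against plain dicts standing in for the cache and the database. This is a hypothetical illustration, not a real caching library:

```python
cache, db = {}, {}   # stand-ins for the cache layer and the primary data store
dirty = set()        # keys written to cache but not yet to the db (write-back)

def write_through(key, value):
    # Write to cache and database together: no stale cache, but two writes.
    cache[key] = value
    db[key] = value

def write_around(key, value):
    # Write only to the database: the cache isn't flooded with writes,
    # but the next read of this key misses.
    cache.pop(key, None)   # drop any stale cached copy
    db[key] = value

def write_back(key, value):
    # Write only to the cache; flush to the database later. Fast but risky:
    # a crash before flush() loses the write.
    cache[key] = value
    dirty.add(key)

def flush():
    # Reflect deferred write-back writes to the data store.
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()
```
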
36
Q

What are the five primary forms of cache invalidation operations?

A

Purge, refresh, ban, TTL, and stale-while-revalidate. Purge removes content from the cache. Refresh reads from the source, even if the content is already cached, and updates the cache. Ban invalidates cached content based on some pattern or criteria. TTL checks the cache and, if the data has expired (per its TTL), refreshes it from the data store. Stale-while-revalidate serves stale content while refreshing the cache asynchronously; it is typically used by CDNs.

37
Q

What are the two most popular cache read strategies? How do they differ?

A

Read-through means the cache itself is responsible for reading from the source on a miss. Read-aside (also called cache-aside) means the application is responsible: it checks the cache first and reads from the source on a miss. Read-through can be simpler to implement but doesn’t provide the same fine-grained control or fault-tolerance as read-aside.

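Both read strategies can be sketched in Python. `fetch_from_source` is a hypothetical stand-in for an expensive lookup such as a database query:

```python
def fetch_from_source(key):
    # Hypothetical expensive lookup (e.g. a database query).
    return f"value-for-{key}"

cache = {}

# Read-aside (cache-aside): the application checks the cache itself,
# falls back to the source on a miss, and populates the cache afterwards.
def read_aside(key):
    if key in cache:
        return cache[key]
    value = fetch_from_source(key)
    cache[key] = value
    return value

# Read-through: the cache layer owns the miss handling; the application
# only ever talks to the cache.
class ReadThroughCache:
    def __init__(self, loader):
        self.loader = loader
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.loader(key)  # cache loads on a miss
        return self.store[key]
```
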
38
Q

What are the six most common cache eviction policies?

A

First In First Out (FIFO): evicts the block added first, without regard to how often or how many times it was accessed.

Last In First Out (LIFO): evicts the block added most recently, without regard to how often or how many times it was accessed.

Least Recently Used (LRU): discards the least recently used items first.

Most Recently Used (MRU): in contrast to LRU, discards the most recently used items first.

Least Frequently Used (LFU): counts how often an item is needed; those used least often are discarded first.

Random Replacement (RR): randomly selects a candidate item and discards it to make space when necessary.

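The LRU policy is the easiest to sketch; the `OrderedDict`-based class below is a minimal illustration, not a production cache:

```python
from collections import OrderedDict

class LRUCache:
    """Least Recently Used eviction sketch using an ordered dict."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
```
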