Distributed Systems Flashcards
(38 cards)
What are the five aspects of a distributed system?
Distributed systems should be scalable, reliable, available, efficient, and manageable.
What does scalability mean with respect to a distributed system?
Scalability is the capability of a system to grow (or shrink) to meet evolving demands.
What are two forms of scalability? How do they differ?
Scalability can be horizontal or vertical. Horizontal generally means you add more nodes or servers to distribute and accommodate the load. Vertical generally means increasing the power or resources of an existing node or server.
Which is generally easier to implement or support? Horizontal or vertical scaling?
Horizontal scaling is typically easier: you add components to an existing pool (especially in Kubernetes). Vertical scaling is limited by the capacity of a single server (or pod) and can sometimes involve downtime.
What are good examples of database technologies that easily scale horizontally? What about vertically?
Cassandra and MongoDB are both designed to scale horizontally with relative ease. MySQL is an example of a database that is commonly scaled vertically.
What does reliability mean within a distributed system?
Reliability means that a system should still operate and respond correctly even in the presence of errors, faults, or one or more failing components.
One failing component should always be able to be replaced by a healthy one.
What’s a real-world example of a company supporting reliability?
Amazon’s shopping cart transactions. A transaction should never be aborted because the machine processing it fails; a replica holding that transaction should be able to process it if a failure occurs.
What’s an important and potentially costly solution to support reliability?
Reliability can be achieved through redundancy, which eliminates single points of failure. This can be costly and sometimes comes at the expense of added complexity to support it.
What is fault-tolerance?
Fault-tolerance is the ability of a system to continue to function, potentially at a reduced rate, when some of its components fail (e.g. by absorbing the error or retrying).
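A minimal sketch of the "retry, then degrade" idea in Python. The names call_with_retry and FlakyService are hypothetical and exist only for illustration; a real system would likely add backoff, timeouts, and narrower exception handling.

```python
import time


def call_with_retry(operation, retries=3, delay_seconds=0.0, fallback=None):
    """Try `operation` up to `retries` times; return `fallback` if every attempt fails."""
    for attempt in range(retries):
        try:
            return operation()
        except Exception:
            # Absorb the error and pause before the next attempt (if any remain).
            if attempt < retries - 1:
                time.sleep(delay_seconds)
    # Every attempt failed: degrade to the fallback instead of crashing.
    return fallback


class FlakyService:
    """Hypothetical component that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success

    def fetch(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated fault")
        return "fresh data"
```

Here a service that fails twice still answers on the third attempt, while one that keeps failing falls back to a degraded response such as cached data.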
How do reliability and fault-tolerance differ?
Reliability refers to the entire system and end-to-end correctness. Fault-tolerance focuses on the ability of the system to continue operating when components fail.
Perspective-wise, where do reliability and fault-tolerance fall?
Reliability is a more user-centric concept (e.g. does the system meet my needs), whereas fault-tolerance is a more system-centric concept (e.g. how are failures handled).
How do you measure reliability?
Metrics such as uptime, error rates, and time between failures.
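As a rough illustration, these metrics can be computed from basic counters. The function names and the sample figures in the usage note are assumptions for the example, not values from a real system.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of total time the system was up."""
    return uptime_hours / (uptime_hours + downtime_hours)


def error_rate(failed_requests, total_requests):
    """Fraction of requests that ended in an error."""
    return failed_requests / total_requests


def mean_time_between_failures(uptime_hours, failure_count):
    """Average operating time between two consecutive failures."""
    return uptime_hours / failure_count
```

For example, 999 hours up and 1 hour down gives an availability of 0.999 ("three nines"), and 720 hours of operation with 3 failures gives a mean time between failures of 240 hours.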
How do you measure fault-tolerance?
Metrics such as recovery/failover time (i.e. how quickly a failure can be detected, isolated, and recovered from).
Explain the relationship between availability and reliability.
If a system is reliable, it is available; however, availability does not imply reliability.
What is serviceability? How does it play a role in distributed systems?
Serviceability is how easy a system is to maintain (e.g. ease of diagnosing issues, early detection of failures or issues, downtime during maintenance or upgrades).
What is a load balancer? What do they do?
A load balancer is a component that spreads traffic across multiple servers to reduce the load on each one and improve responsiveness.
How does a load balancer work?
While distributing requests, the load balancer keeps track of each resource’s status (and potentially other metrics) to determine which resources should or shouldn’t have traffic directed to them.
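One way to picture this is a balancer that records the result of each health check and skips unhealthy servers when routing. This is only a sketch: the LoadBalancer class, its method names, and the choice of round robin for ordering are assumptions made for the example.

```python
class LoadBalancer:
    """Hypothetical balancer: round robin over servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = {server: True for server in self.servers}
        self._next = 0  # index of the next server to consider

    def record_health_check(self, server, is_healthy):
        # Called with the outcome of each periodic health check.
        self.healthy[server] = is_healthy

    def route(self):
        # Look at each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy servers available")
```

With servers a, b, and c, marking b as unhealthy makes the balancer alternate between a and c until a later health check marks b as healthy again.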
What problem is a load balancer designed to solve?
It mitigates single points of failure (e.g. if a single web server fails, traffic is redirected to healthy, responsive ones).
What are three levels of a typical application where a load balancer would be appropriate?
Between the user and the web servers (distributes traffic across multiple servers), between the web servers and other application components (e.g. services such as distributed caches), and between internal services and databases.
What are a few benefits of load balancers?
Faster user experience, increased availability, metrics for predictive analytics, fewer or less stressed components or services.
What are the two factors that a load balancer uses to distribute traffic?
Health checks and the selected routing algorithm.
What are some of the common load balancing algorithms?
Least connection (least busy server), least response time (server responding the fastest), least bandwidth (server serving the least traffic), round robin (cycles through the servers in order, then repeats), weighted round robin (same, but servers are weighted based on resources or type), hash (traffic is routed based on a calculated hash, e.g. the same IP goes to the same server).
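Three of these algorithms can be sketched in a few lines of Python. The function names, server names, and weights are hypothetical, and the weighted round robin shown is the naive "repeat each server weight times" variant rather than the smoother interleaving some production balancers use.

```python
import hashlib
import itertools


def least_connections(active_connections):
    """Pick the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)


def weighted_round_robin(weights):
    """Return an endless iterator where each server appears `weight` times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same IP always reaches the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Given the connection counts a=5, b=2, c=9, least_connections picks b. With weights big=2 and small=1, the rotation is big, big, small, and then repeats. Note that this simple hash remaps most clients when the server list changes; consistent hashing is the usual way to reduce that.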
A load balancer can itself become a single point of failure. How can you prevent this?
You can run a group or cluster of load balancers (e.g. an active/passive pair), so that the load balancer itself is redundant and a standby can take over if the active one fails.
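A minimal sketch of active/passive failover, assuming the passive node watches heartbeats from the active one and takes over after a timeout. The ActivePassivePair class, the node names, and the three-second timeout are hypothetical; timestamps are passed in explicitly to keep the example deterministic.

```python
class ActivePassivePair:
    """Hypothetical pair of load balancers with heartbeat-based failover."""

    def __init__(self, active, passive, heartbeat_timeout=3.0):
        self.active = active
        self.passive = passive
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = 0.0

    def record_heartbeat(self, now):
        # The active node reports that it is still alive at time `now`.
        self.last_heartbeat = now

    def current(self, now):
        # If the active node has been silent for too long, promote the passive one.
        if now - self.last_heartbeat > self.heartbeat_timeout:
            self.active, self.passive = self.passive, self.active
            self.last_heartbeat = now  # give the new active node a fresh window
        return self.active
```

If lb-1 last sent a heartbeat at t=10, it still serves traffic at t=12, but by t=14 the timeout has passed and lb-2 is promoted.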
What is caching? What principle does it take advantage of?
Caching involves the use of a high-speed storage layer that sits between the calling application and the original source of data. It takes advantage of the locality of reference principle: data that was requested recently is likely to be requested again.
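As an illustration, here is a small in-memory cache that sits in front of a slower data source. The LRUCache class and the load_from_source callback are hypothetical, and least-recently-used eviction is an assumption chosen because it is one common way to exploit temporal locality.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical cache that loads misses from the source and evicts the oldest entry."""

    def __init__(self, capacity, load_from_source):
        self.capacity = capacity
        self.load_from_source = load_from_source  # the slow, original data source
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark the key as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: fetch from the source and remember the result.
        value = self.load_from_source(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value
```

Repeated reads of the same key reach the source only once, until the key is evicted to make room for more recently used entries.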