Distributed Architectures Flashcards
What are the two types of scaling?
Vertical and Horizontal
What is Vertical Scaling?
Increasing the capacity of a single machine, for example by adding CPU, memory, or storage
What is Horizontal Scaling?
Adding more instances or servers to distribute the workload across multiple machines
What is replication?
A copy of the data is kept in each node
What is partitioning?
Different nodes hold different parts of the data; each node stores a subset
What are the advantages of replication?
- Reduced latency, as data can be kept geographically close to users
- Increased availability
- Increased performance, as more machines are available to serve queries
What are the disadvantages of replication?
- The data has to be small enough to be stored on a single machine.
- Write operations involve updating multiple copies of the same data
- Updates need to be made on all nodes; if a write fails on some nodes, the copies can become inconsistent
How are nodes organised?
One node is designated the leader and the rest are followers. The leader has the most recent data and all write operations are sent to the leader.
How are read and write operations handled with the leader/follower dynamic?
All write operations are sent to the leader, which sends the updates to the followers once a write completes.
Read queries can be sent to any node, so they are much quicker.
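The leader/follower flow can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real replication protocol; the class names `Node` and `Leader` and the key `"user:1"` are assumptions made up for the example.

```python
class Node:
    """A replica that holds a full copy of the data (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        # Reads can be served by any node, leader or follower.
        return self.data.get(key)


class Leader(Node):
    """The single node that accepts writes (hypothetical class)."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def write(self, key, value):
        # Apply the write on the leader first, then send it to every follower.
        self.data[key] = value
        for follower in self.followers:
            follower.data[key] = value


followers = [Node("follower-1"), Node("follower-2")]
leader = Leader("leader", followers)
leader.write("user:1", "Ada")
```

After the write, `leader.read("user:1")` and `followers[0].read("user:1")` both return the same value, which is why reads can be spread across all nodes.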
How is Synchronous Replication used?
When the leader sends a write request to the followers, only a subset of nodes needs to confirm completion. If confirmation from all nodes were required, a small issue on one node could stall the whole system.
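The idea of waiting for only a subset of confirmations can be sketched as follows. This is a simplified illustration: followers are modelled as plain functions, and the names `replicate`, `required_acks`, `healthy`, and `unreachable` are assumptions for the example rather than any real library's API.

```python
def replicate(followers, key, value, required_acks):
    """Send a write to every follower and report whether enough confirmed."""
    acks = 0
    for send in followers:
        try:
            if send(key, value):
                acks += 1
        except ConnectionError:
            # A failed follower does not confirm the write.
            pass
    return acks >= required_acks


def healthy(key, value):
    # Hypothetical follower that always confirms the write.
    return True


def unreachable(key, value):
    # Hypothetical follower that cannot be reached.
    raise ConnectionError("follower down")


nodes = [healthy, healthy, unreachable]
ok = replicate(nodes, "user:1", "Ada", required_acks=2)
strict = replicate(nodes, "user:1", "Ada", required_acks=3)
```

With one follower down, the write still succeeds when two confirmations are required (`ok` is `True`), but fails when all three are required (`strict` is `False`).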
What is a partitioned database?
A database that stores different subsets of the data in different nodes.
How does partitioning store data?
A node can hold more than one partition, but each piece of data belongs to exactly one partition.
What is the goal of partitioning?
The goal is to spread the data and the query load evenly across nodes.
What are Hotspot nodes?
Nodes that handle a disproportionately large amount of data or load. They are the result of skewed partitioning.
What are three ways of partitioning data?
- Randomly scatter the data across nodes
- Partitioning by key range
- Partitioning by Hash of key
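The key-range and hash approaches listed above can be sketched as two small functions. This is an illustrative sketch only: the function names, the boundary list, and the choice of MD5 as the hash are assumptions, not a prescribed scheme.

```python
import bisect
import hashlib


def range_partition(key, boundaries):
    """Key-range partitioning.

    `boundaries` is a sorted list where boundaries[i] is the first key
    that belongs to partition i + 1.
    """
    return bisect.bisect_right(boundaries, key)


def hash_partition(key, num_partitions):
    """Hash partitioning: hash the key, then take it modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


boundaries = ["h", "q"]  # three partitions: keys before "h", "h" up to "q", "q" onwards
first = range_partition("apple", boundaries)
second = range_partition("mango", boundaries)
third = range_partition("zebra", boundaries)
```

Key ranges keep adjacent keys together, which helps range scans but can create hot spots. Hashing spreads keys more evenly at the cost of losing key ordering.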
Why are partitions rebalanced?
To fairly share the load between the nodes in the cluster and reduce hot spot nodes.
What are the two types of partitioning?
- Fixed number
- Dynamic
How is fixed number partitioning implemented?
Many more partitions than nodes are created. If nodes are added, they take over some partitions from the existing nodes. Only entire partitions are moved between nodes, and the number of partitions doesn’t change, nor does the assignment of keys to partitions.
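A rough sketch of rebalancing with a fixed number of partitions is shown below. The function `add_node`, the node names, and the choice of 12 partitions are assumptions for illustration; real systems choose which partitions to move in more sophisticated ways.

```python
def add_node(assignment, new_node):
    """Move whole partitions to a new node.

    `assignment` maps partition id -> node name. The number of partitions
    never changes; only their owners do.
    """
    nodes = set(assignment.values()) | {new_node}
    target = len(assignment) // len(nodes)  # partitions the new node should own

    counts = {}
    for owner in assignment.values():
        counts[owner] = counts.get(owner, 0) + 1

    result = dict(assignment)
    moved = 0
    for pid in sorted(result):
        if moved >= target:
            break
        owner = result[pid]
        if counts[owner] > target:
            # The new node takes this partition from an overloaded node.
            result[pid] = new_node
            counts[owner] -= 1
            moved += 1
    return result


# 12 partitions spread over 3 nodes, 4 partitions each.
before = {pid: f"node{pid % 3}" for pid in range(12)}
after = add_node(before, "node3")
```

In this example each of the four nodes ends up with three partitions, and only three partitions changed owner.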
How is dynamic partitioning implemented?
If partitions grow beyond a certain size they are split in two. If the data becomes smaller, partitions can be merged.
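Splitting and merging can be sketched with partitions modelled as sorted lists of keys. The function names and the thresholds `max_size` and `min_size` are hypothetical, chosen only for this illustration.

```python
def split_if_too_large(partition, max_size):
    """Split a partition in two once it grows beyond max_size keys."""
    if len(partition) <= max_size:
        return [partition]
    mid = len(partition) // 2
    return [partition[:mid], partition[mid:]]


def merge_if_small(left, right, min_size):
    """Merge two adjacent partitions if together they hold fewer than min_size keys."""
    if len(left) + len(right) < min_size:
        return [left + right]
    return [left, right]


split = split_if_too_large([1, 2, 3, 4, 5, 6], max_size=4)
merged = merge_if_small([1], [2], min_size=4)
```

Here `split` is `[[1, 2, 3], [4, 5, 6]]` and `merged` is `[[1, 2]]`, so the number of partitions adapts to the volume of data.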
What does CAP stand for?
Consistency, Availability, Partition Tolerance
What is the CAP theorem?
A distributed system can guarantee at most two of consistency, availability, and partition tolerance at the same time. When a network partition occurs, it must choose between consistency and availability.
What is system consistency?
Every read receives the most recent write or an error
What is system availability?
Every request receives a non-error response.
What is partition tolerance?
The system continues to operate even if parts of the network become disconnected.