Distributed Architectures Flashcards

1
Q

What are the two types of scaling?

A

Vertical and Horizontal

2
Q

What is Vertical Scaling?

A

Increasing the capacity or ability of a single system or machine

3
Q

What is Horizontal Scaling?

A

Adding more instances or servers to distribute the workload across multiple machines

4
Q

What is replication?

A

Keeping a copy of the same data on each node

5
Q

What is partitioning?

A

Different nodes hold different parts of the data; each node stores a subset

6
Q

What are the advantages of replication?

A
  1. Reduced latency, as data can be kept geographically close to users
  2. Increased availability
  3. Increased performance, as more machines are available to serve queries

7
Q

What are the disadvantages of replication?

A
  1. The data has to be small enough to be stored on a single machine.
  2. Write operations involve updating multiple copies of the same data.
  3. Updates need to be made on all nodes. If a write fails on some nodes, the copies can diverge and data reliability is affected.
8
Q

How are nodes organised?

A

In leader-based replication, one node is designated the leader and all other nodes are followers. The leader has the most recent data, and all write operations are sent to it.

9
Q

How are read and write operations handled with the leader/follower dynamic?

A

All write operations are sent to the leader, which applies them and then sends the updates to the followers.

Read queries can be sent to any node, so they are generally much quicker.
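
As a rough illustration of the leader/follower flow described above, here is a minimal in-memory sketch. The `Node` and `Cluster` classes are hypothetical and exist only for this example; a real system would send updates over a network and handle failures.

```python
class Node:
    """A replica holding a full copy of the data (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """One leader plus any number of followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers

    def write(self, key, value):
        # Every write is applied on the leader first...
        self.leader.data[key] = value
        # ...and the leader then forwards the change to each follower.
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key, node):
        # Reads may be served by any node, leader or follower.
        return node.data.get(key)


leader = Node("leader")
followers = [Node("f1"), Node("f2")]
cluster = Cluster(leader, followers)
cluster.write("user:1", "Ada")
```

After the write, `cluster.read("user:1", followers[1])` returns the same value as a read from the leader.
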

10
Q

How is Synchronous Replication used?

A

In synchronous replication, the leader waits for followers to confirm a write before reporting it as complete. In practice only a subset of the followers needs to confirm completion; if all were required, a small issue on a single node could stop the whole system.
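
The idea of requiring only a subset of confirmations can be sketched as follows. This is an assumption-laden toy: `replicate`, `required_acks`, and the follower callables are hypothetical names, and the calls are made sequentially for simplicity, whereas a real leader would send them in parallel.

```python
def replicate(value, followers, required_acks):
    """Report success if at least `required_acks` followers confirm the write.

    `followers` is a list of callables that return True when the follower
    has applied the write (hypothetical stand-ins for network calls).
    """
    acks = sum(1 for send in followers if send(value))
    return acks >= required_acks


def healthy(value):
    return True  # follower confirms the write


def failed(value):
    return False  # follower is down or too slow to confirm


nodes = [healthy, failed, healthy]
ok_with_subset = replicate("x=1", nodes, required_acks=2)
ok_with_all = replicate("x=1", nodes, required_acks=3)
```

With one failed follower, the write succeeds when two confirmations are required but not when all three are.
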

11
Q

What is a partitioned database?

A

A database that stores different subsets of the data in different nodes.

12
Q

How does partitioning store data?

A

A node can have more than one partition, but each piece of data belongs to exactly one partition.

13
Q

What is the goal of partitioning?

A

The goal is to spread the data and the query load evenly across nodes.

14
Q

What are Hotspot nodes?

A

Nodes that handle a disproportionately large share of the data or query load. Hot spots are the result of skewed partitioning.

15
Q

What are three ways of partitioning data?

A
  1. Randomly scatter the data across nodes
  2. Partitioning by key range
  3. Partitioning by Hash of key
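
The last two approaches in the list above can be sketched in a few lines. This is only an illustration: `NUM_PARTITIONS`, `BOUNDARIES`, and both function names are assumptions made for the example, and SHA-256 is used simply because it is a stable hash available in the standard library.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4  # assumed value for illustration

# Key-range partitioning: each boundary is the first letter of the next range.
# Partition 0: a-f, partition 1: g-m, partition 2: n-s, partition 3: t-z.
BOUNDARIES = ["g", "n", "t"]


def partition_by_range(key):
    """Pick a partition from the first letter of the key."""
    return bisect.bisect_right(BOUNDARIES, key[0].lower())


def partition_by_hash(key):
    """Pick a partition from a stable hash of the whole key."""
    # Python's built-in hash() is randomised per process, so use hashlib.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Key-range partitioning keeps neighbouring keys together, which suits range scans but can create hot spots; hashing spreads keys more evenly at the cost of losing that ordering.
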
16
Q

Why are partitions rebalanced?

A

To fairly share the load between the nodes in the cluster and reduce hot spot nodes.

17
Q

What are the two types of partitioning?

A
  1. Fixed number
  2. Dynamic
18
Q

How is fixed number partitioning implemented?

A

Many more partitions than nodes are created, so each node holds several partitions. When a node is added, it steals a few partitions from the existing nodes. Only entire partitions are moved between nodes; the number of partitions doesn’t change, nor does the assignment of keys to partitions.
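
A minimal sketch of this rebalancing step, assuming the cluster state is a plain dictionary from partition id to node name. The `add_node` function and the "take from the busiest node" policy are illustrative choices, not a description of any particular database.

```python
def add_node(assignment, new_node):
    """Move whole partitions to `new_node` until the load is roughly even.

    `assignment` maps partition id -> node name (hypothetical structure).
    The number of partitions and their ids are never changed.
    """
    nodes = set(assignment.values()) | {new_node}
    target = len(assignment) // len(nodes)  # partitions the new node should own
    result = dict(assignment)
    for _ in range(target):
        # Count how many partitions each node currently owns.
        counts = {}
        for node in result.values():
            counts[node] = counts.get(node, 0) + 1
        # Steal one partition from whichever node owns the most.
        busiest = max(sorted(counts), key=counts.get)
        victim = min(p for p, n in result.items() if n == busiest)
        result[victim] = new_node
    return result


# 12 partitions spread over 3 nodes, then a fourth node joins.
before = {p: f"node{p % 3}" for p in range(12)}
after = add_node(before, "node3")
```

Only the partitions handed to the new node change owner; every other partition stays where it was.
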

19
Q

How is dynamic partitioning implemented?

A

If a partition grows beyond a certain size, it is split in two. If the data shrinks, adjacent partitions can be merged.
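
The split and merge rules can be sketched as below. Partitions are modelled as sorted lists of keys, and the thresholds `MAX_KEYS` and `MIN_KEYS` are assumed values chosen only to keep the example small.

```python
MAX_KEYS = 4  # assumed split threshold
MIN_KEYS = 2  # assumed merge threshold


def split_if_needed(partition):
    """Split a sorted list of keys in two once it exceeds MAX_KEYS."""
    if len(partition) <= MAX_KEYS:
        return [partition]
    mid = len(partition) // 2
    return [partition[:mid], partition[mid:]]


def merge_if_small(left, right):
    """Merge two adjacent partitions when their combined size is small."""
    if len(left) + len(right) <= MIN_KEYS:
        return [left + right]
    return [left, right]


grown = split_if_needed(["a", "b", "c", "d", "e"])
shrunk = merge_if_small(["a"], ["b"])
```

Here `grown` holds two partitions and `shrunk` holds one, so the number of partitions follows the volume of data.
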

20
Q

What does CAP stand for?

A

Consistency, Availability, Partition Tolerance

21
Q

What is the CAP theorem?

A

In a distributed system, at most two of consistency, availability, and partition tolerance can be guaranteed at the same time.

22
Q

What is system consistency?

A

Every read receives the most recent write or an error

23
Q

What is system availability?

A

Every request receives a non-error response.

24
Q

What is partition tolerance?

A

The system continues to operate even if parts of the network become disconnected.

25
Q

What are the ACID properties?

A

Atomicity, Consistency, Isolation, Durability

26
Q

What is eventual consistency?

A

At any given point in time some nodes may hold an outdated version of the data. Over time the updates propagate, so that eventually all nodes have the same data.

27
Q

What is linearizability?

A

The idea that, from a user's perspective, the system should appear as if all operations on it are atomic and there is only a single copy of the data.

28
Q

How is linearizability achieved?

A

Once one client has read a value, all subsequent reads by any client should return that same value (until it is overwritten by a later write).

29
Q

What are the characteristics of linearizability?

A

If an application requires linearizability and some replicas are disconnected, those replicas cannot process requests until the network problem is fixed. Applications that don't require linearizability can be more tolerant of network faults. Linearizability has a performance impact, so many systems do not guarantee it, in favour of performance.

30
Q

How do network delays affect response times?

A

If linearizability is desired, the response time of read and write requests is at least proportional to the uncertainty of network delays.

31
Q

What is consensus?

A

Getting several nodes to agree on something.

32
Q

What is Atomicity?

A

The idea that a transaction that involves multiple nodes or services is treated as one indivisible piece of work. Either all commits are successful or none of the commits are.

33
Q

What is Consistency?

A

All nodes or replicas have the same view of data at a given time.

34
Q

What is a Two-phase commit?

A

An algorithm for achieving atomic transaction commits across multiple nodes.

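
A minimal sketch of the two phases, assuming an in-memory coordinator and participants. The `Participant` class and `two_phase_commit` function are hypothetical; a real implementation also has to handle timeouts, crashes, and durable logging, none of which is shown here.

```python
class Participant:
    """A hypothetical node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if this node can guarantee the commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere or nowhere."""
    # Phase 1 (prepare): ask every participant to vote.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit or abort): commit only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


good = [Participant("a"), Participant("b")]
mixed = [Participant("c"), Participant("d", can_commit=False)]
committed = two_phase_commit(good)
rolled_back = two_phase_commit(mixed)
```

A single "no" vote in the prepare phase causes every participant to abort, which is what makes the commit atomic across nodes.
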
35