Scaling Flashcards

(35 cards)

1
Q

What does DNS stand for?

A

Domain Name System

2
Q

What does a DNS server do?

A

Converts website domain names to IP addresses

3
Q

What does HTTP stand for?

A

Hypertext Transfer Protocol

4
Q

What does IP stand for?

A

Internet Protocol

5
Q

What does JSON stand for?

A

JavaScript Object Notation

6
Q

Under what circumstances might you use a non-relational database over a relational database?

A
  • You need super-low latency
  • Your data is unstructured
  • You only need to serialize / deserialize data
  • You need to store a large amount of data
7
Q

What are the 4 common types of non-relational databases?

A
  1. key-value stores
  2. graph stores
  3. column stores
  4. document stores
8
Q

In databases

What is a join operation?

A

A join combines rows from two or more tables into a new result set based on a specified condition (such as matching values in a common column)
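
A minimal sketch of a join, using Python's built-in sqlite3 module with an in-memory database. The users/orders tables, their columns, and the rows are hypothetical examples chosen for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables that share a common field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'laptop');
""")

# Join condition: orders.user_id must match users.id.
rows = conn.execute("""
    SELECT users.name, orders.item
    FROM users
    JOIN orders ON orders.user_id = users.id
    ORDER BY orders.id
""").fetchall()
print(rows)  # [('Ada', 'book'), ('Ada', 'pen'), ('Grace', 'laptop')]
```

Each output row combines a user's name with one of that user's orders; neither table alone contains both pieces of information.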

9
Q

Do non-relational databases support join operations?

A

Generally, no

10
Q

What is the difference between vertical scaling and horizontal scaling?

A
  • Vertical Scaling involves making your existing resources more powerful
  • Horizontal Scaling involves adding more resources to your pool of resources
11
Q

What are 2 reasons you would prefer horizontal scaling over vertical scaling?

A
  1. Vertical scaling has a hard limit
  2. Vertical scaling does not have failover/redundancy.
12
Q

What does a load balancer do?

A

Distributes incoming traffic (typically evenly) among a pool of servers
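
A toy sketch of one simple distribution strategy, round robin, where each new request goes to the next server in a fixed rotation. The class name and the private IP addresses are hypothetical; real load balancers offer several strategies and much more functionality.

```python
from itertools import cycle


class RoundRobinLoadBalancer:
    """Toy load balancer that hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(servers)

    def next_server(self):
        # Each incoming request is sent to the next server in the rotation.
        return next(self._rotation)


lb = RoundRobinLoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
assigned = [lb.next_server() for _ in range(5)]
print(assigned)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']
```

Real load balancers typically also run health checks and stop routing traffic to servers that fail them, which is where the failover benefit of horizontal scaling comes from.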

13
Q

In the context of Database Replication

Explain the difference between master DBs and slave DBs

A
  • Master DBs handle all write operations
  • Slave DBs receive copies of the data from the master and handle all read operations
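
A minimal in-memory sketch of the master/slave split, assuming dicts stand in for database servers. The class and method names are hypothetical, and replication here is instant, which real systems do not guarantee.

```python
import random


class ReplicatedDatabase:
    """Toy master/slave setup: writes hit the master, reads hit a slave."""

    def __init__(self, num_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        # All writes go to the master...
        self.master[key] = value
        # ...which replicates the change to every slave. Here this is
        # immediate; in real systems it is often asynchronous, so slaves can lag.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are spread across the slaves; the master serves no reads here.
        return random.choice(self.slaves).get(key)


db = ReplicatedDatabase(num_slaves=3)
db.write("user:1", "Ada")
print(db.read("user:1"))  # Ada
```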
14
Q

What are 3 key benefits of database replication?

A
  1. Performance (parallelization)
  2. Reliability
  3. High Availability
15
Q

Describe the data flow for a read-through cache

A

read(x):
    if x not in cache:
        cache[x] := db.read(x)
    return cache[x]
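
A runnable Python version of the read(x) pseudocode, assuming a dict as the cache and a hypothetical FakeDB class standing in for the database. FakeDB counts its reads so you can see that only the first lookup reaches the database.

```python
class FakeDB:
    """Hypothetical in-memory stand-in for a real database."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0  # how many times the database was actually hit

    def read(self, key):
        self.reads += 1
        return self.data[key]


class ReadThroughCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        # Cache miss: load the value from the database and keep it in the cache.
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        # Serve the value from the cache.
        return self.cache[key]


db = FakeDB({"user:1": "Ada"})
cache = ReadThroughCache(db)
print(cache.read("user:1"), db.reads)  # Ada 1  (miss: loaded from the database)
print(cache.read("user:1"), db.reads)  # Ada 1  (hit: database not touched)
```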

16
Q

Describe the data flow for a write-through cache

A

write(x, v):
    if x in cache:
        cache[x] := v
    db.write(x, v)
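
A runnable Python version of the write(x, v) pseudocode, assuming a dict as the cache and a hypothetical FakeDB class as the database. It follows the card's variant, which only updates keys that are already cached; many write-through caches also insert the new value on every write.

```python
class FakeDB:
    """Hypothetical in-memory stand-in for a real database."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class WriteThroughCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        # If the key is cached, update it so later cache reads are not stale.
        if key in self.cache:
            self.cache[key] = value
        # Every write also goes straight to the database, the source of truth.
        self.db.write(key, value)


db = FakeDB()
cache = WriteThroughCache(db)
cache.cache["user:1"] = "Ada"   # pretend an earlier read cached this key
cache.write("user:1", "Grace")  # updates both the cache and the database
cache.write("user:2", "Linus")  # not cached: goes to the database only
print(cache.cache, db.data)  # {'user:1': 'Grace'} {'user:1': 'Grace', 'user:2': 'Linus'}
```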

17
Q

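Can a cache be both read-through AND write-through?

A

Yes. The same cache can load missing data from the database on reads (read-through) and pass every write through to the database (write-through)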

18
Q

Do you need a separate cache in each data center?

A

Yes. Otherwise the cache can become a SPOF

19
Q

In caching

What is an expiration policy?

A

How long data is kept in the cache before it expires and is removed
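
A minimal sketch of an expiration policy as a fixed time-to-live (TTL). The TTLCache class is hypothetical, and the clock is injectable only so the example can fake the passage of time instead of sleeping.

```python
import time


class TTLCache:
    """Minimal cache where every entry expires ttl_seconds after it is set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store = {}    # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the next read has to go back to the data store.
            del self._store[key]
            return None
        return value


now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("x", 42)
now[0] = 30.0
print(cache.get("x"))  # 42
now[0] = 61.0
print(cache.get("x"))  # None
```

The ttl_seconds value is the trade-off described in the next two cards: too short and entries keep expiring, too long and they go stale.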

20
Q

What problem can happen if a cache's expiration policy is too short?

A

Data expires too soon and must be reloaded from the database frequently, so you lose the speedup benefits of caching

21
Q

What problem can happen if a cache's expiration policy is too long?

A

Data can become stale (out of date compared with the database)

22
Q

What does SPOF stand for?

A

Single Point Of Failure

23
Q

What is the best situation for caching?

A

When data is read frequently but modified infrequently

24
Q

What are the 5 key considerations around caching?

A
  1. Effectiveness (high read, low write)
  2. Expiration policy
  3. Consistency
  4. Failure mitigation (avoid becoming a SPOF)
  5. Invalidation policy
25
Q

What does CDN stand for?

A

Content Delivery Network

26
Q

What is a CDN?

A

A network of geographically dispersed servers used to deliver static content

27
Q

What is the typical cost structure for a CDN?

A

You are charged for data transfers in and out of the CDN

28
Q

When a user visits a website, which CDN server should deliver content to the user?

A

The CDN server geographically closest to the user

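
A toy sketch of "closest server" selection using great-circle (haversine) distance. The edge locations and coordinates are hypothetical, and real CDNs typically decide via DNS-based or anycast routing and network measurements rather than raw map distance.

```python
import math

# Hypothetical CDN edge locations as (latitude, longitude) in degrees.
EDGE_SERVERS = {
    "london": (51.51, -0.13),
    "new-york": (40.71, -74.01),
    "tokyo": (35.68, 139.69),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def closest_edge(user_location):
    # Pick the edge server with the smallest distance to the user.
    return min(EDGE_SERVERS,
               key=lambda name: great_circle_km(user_location, EDGE_SERVERS[name]))


print(closest_edge((48.86, 2.35)))    # user in Paris: london
print(closest_edge((37.57, 126.98)))  # user in Seoul: tokyo
```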
29
Q

Why is stateless architecture generally preferable to stateful architecture?

A

Because requests from the same client can be routed to any server, without the overhead of sticky sessions. So, better decoupling

30
Q

How might we be able to store "state" data in a stateless architecture?

A

Move state data out of the web layer into a separate shared data store that every web server can access

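
A minimal sketch of a stateless web tier backed by a shared state store. SessionStore and StatelessWebServer are hypothetical names; in practice the store would be something like a cache cluster or database rather than a dict.

```python
class SessionStore:
    """Hypothetical shared data store that holds all per-user state."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = state


class StatelessWebServer:
    """Keeps no per-user state in memory; everything lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.get(session_id)
        cart = state.get("cart", []) + [item]
        self.store.put(session_id, {**state, "cart": cart})
        return cart


store = SessionStore()
web1 = StatelessWebServer("web1", store)
web2 = StatelessWebServer("web2", store)
web1.add_to_cart("session-123", "book")
print(web2.add_to_cart("session-123", "pen"))  # ['book', 'pen']
```

Because the second request lands on a different server and still sees the first item, no sticky session is needed.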
31
Q

How might we improve website availability / performance across wider geographical areas?

A

Use multiple data centers (think AWS regions / AZs) with geo-load balancing, which routes each user to a nearby data center

Note: a CDN helps with geo-performance but not availability

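
A toy sketch of geo-routing with failover, the part that improves availability: if a user's preferred data center is unhealthy, traffic goes to the next one. The region names, data center names, and preference lists are hypothetical.

```python
# Hypothetical data centers, listed in order of preference for each user region.
PREFERRED_DATA_CENTERS = {
    "us": ["us-east", "eu-west"],
    "eu": ["eu-west", "us-east"],
}


def pick_data_center(user_region, healthy):
    """Route to the nearest healthy data center, failing over if it is down."""
    for dc in PREFERRED_DATA_CENTERS[user_region]:
        if dc in healthy:
            return dc
    raise RuntimeError("no healthy data center available")


print(pick_data_center("eu", healthy={"us-east", "eu-west"}))  # eu-west
print(pick_data_center("eu", healthy={"us-east"}))             # us-east (failover)
```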
32
Q

Describe a message queue architecture

A

Think SQS: X producers put work on the queue and Y consumers take work off it. The queue decouples producers from consumers, so each side can be scaled independently

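
A minimal in-process sketch of the producer/consumer pattern using Python's thread-safe queue.Queue in place of a hosted queue like SQS. The X = 2 producers, Y = 3 consumers, task names, and None sentinel are all assumptions for illustration.

```python
import queue
import threading

work_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def producer(producer_id, num_tasks):
    # Producers only know about the queue, not about any consumer.
    for i in range(num_tasks):
        work_queue.put(f"task-{producer_id}-{i}")


def consumer():
    while True:
        task = work_queue.get()
        if task is None:  # sentinel value: no more work for this consumer
            break
        with results_lock:
            results.append(f"done:{task}")


consumers = [threading.Thread(target=consumer) for _ in range(3)]               # Y = 3
producers = [threading.Thread(target=producer, args=(p, 4)) for p in range(2)]  # X = 2
for t in consumers + producers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    work_queue.put(None)  # one sentinel per consumer, queued after all real work
for t in consumers:
    t.join()
print(len(results))  # 8
```

Changing the number of producers or consumers only changes the two list comprehensions; neither side's code refers to the other.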
33
Q

How can you perform horizontal scaling at the database level without replicating your database?

A

DB sharding: split the data across multiple databases (shards), e.g. choosing the shard with the hash `primary_key % num_shards`

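
A minimal sketch of shard routing with the `primary_key % num_shards` hash. Each shard is modelled as a dict; NUM_SHARDS, put, and get are hypothetical names, and in practice each shard would be its own database server.

```python
NUM_SHARDS = 4

# One dict per shard; in a real system each would be a separate database.
shards = [{} for _ in range(NUM_SHARDS)]


def shard_for(primary_key: int) -> int:
    """Choose a shard using primary_key % num_shards."""
    return primary_key % NUM_SHARDS


def put(user_id: int, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row


def get(user_id: int):
    # A lookup by primary key only needs to query a single shard.
    return shards[shard_for(user_id)].get(user_id)


for uid in range(10):
    put(uid, {"id": uid})
print([len(s) for s in shards])  # [3, 3, 2, 2]
```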
34
Q

What are 3 downsides of sharding?

A
  1. Resharding after growth
  2. Celebrity problem, i.e. an uneven traffic distribution across shards
  3. Complicated joins (can be mitigated by denormalizing data)
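
A small calculation illustrating why resharding hurts with the modulo hash from the previous card: adding a single shard changes the shard of most keys, so most data has to move. The helper name and the key range are hypothetical.

```python
def keys_that_move(keys, old_num_shards, new_num_shards):
    """Count keys whose shard changes when the shard count changes (modulo hashing)."""
    return sum(1 for k in keys if k % old_num_shards != k % new_num_shards)


print(keys_that_move(range(1000), 4, 5))  # 800
```

Going from 4 to 5 shards moves 800 of 1000 keys, i.e. 80% of the data.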
35
Q

What are the 8 big ideas for scaling?

A
  1. Statelessness
  2. Redundancy
  3. Caching
  4. Multiple data centers
  5. CDNs for static assets
  6. Sharding in the data tier
  7. Decoupling
  8. Logging / monitoring / automation