System Design '23 Flashcards

1
Q

What distributed problems does a Load Balancer solve?

A

Server hotspot and Parallel Requests

2
Q

What problems can be solved by Data replication?

A

Geo location, server hotspot, parallel requests.

3
Q

What problems can Data Sharding solve?

A

Parallel requests, Data size limitations, and data hotspots.

4
Q

What problems can caching solve?

A

Data hotspots, parallel requests, and geolocation problems.

5
Q

What’s the format for writing an API on the whiteboard?

A

POST /order
request:
{
“user” : user_id
“quantity” : number
“item” : barcode
“date” : datetime
}
response:
{
…data
}

6
Q

What is a 401 code?

A

Unauthorized

7
Q

What code is used for Unauthorized?

A

401

8
Q

What does a 500 Response code mean?

A

Internal Server Error

9
Q

What does a 403 mean?

A

Forbidden: validly formatted credentials, but the user is not authorized for the resource.

10
Q

What’s the diff between 401 and 403 response codes?

A

401 means no credentials or invalid credentials were supplied. 403 means the credentials are valid but the user isn’t authorized.
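
As a rough illustration of the 401/403 split, here is a minimal Python sketch. The token table, the permission names, and the `check_access` helper are all hypothetical and exist only for this example; a real service would check credentials against its own auth system.

```python
from http import HTTPStatus

# Hypothetical in-memory data, for illustration only.
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}
PERMISSIONS = {
    "alice": {"orders:read", "orders:write"},
    "bob": {"orders:read"},
}


def check_access(token, required_permission):
    """Return the HTTP status a server might send for this request."""
    user = VALID_TOKENS.get(token)
    if user is None:
        # Missing or invalid credentials: the caller is not authenticated.
        return HTTPStatus.UNAUTHORIZED  # 401
    if required_permission not in PERMISSIONS.get(user, set()):
        # Credentials are valid, but this user lacks the permission.
        return HTTPStatus.FORBIDDEN  # 403
    return HTTPStatus.OK  # 200
```

The order of the checks matters: authentication is decided first, and authorization is only considered once the caller is known.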

11
Q

What type of database do you use with any kind of file storage?

A

Blob (object) storage, such as S3.

12
Q

What do you need to mention when file transfers are mentioned?

A

Chunking, or torrent protocol. Break the file transfer into chunks and store them one at a time. Server can still be stateless and just map out the necessary file chunks.
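
One way to picture chunking is the sketch below. It assumes fixed-size chunks and a SHA-256 digest per chunk; the function names `split_into_chunks` and `reassemble` are made up for this example. The manifest is the "map" of chunks that a stateless server could hand out.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int):
    """Split data into fixed-size chunks plus a manifest of (index, sha256)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = [(i, hashlib.sha256(c).hexdigest()) for i, c in enumerate(chunks)]
    return chunks, manifest


def reassemble(chunks_by_index: dict, manifest):
    """Rebuild the file from chunks that may have arrived in any order."""
    parts = []
    for index, digest in manifest:
        chunk = chunks_by_index[index]
        # Verify each chunk against the manifest before using it.
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {index} is corrupted")
        parts.append(chunk)
    return b"".join(parts)
```

Because every chunk is identified by its index and digest, a failed transfer only needs to resend the missing or corrupted chunks rather than the whole file.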

13
Q

Describe Linearizability?

A

A strong consistency guarantee: every operation appears to take effect atomically at a single point in time, so all transactions are handled as if in one sequential order.

14
Q

What is the cost of linearizability on a distributed system?

A

Availability and latency: each transaction must be sent to all the replicas before a subsequent transaction can be processed.

15
Q

What is a serialization problem on a non distributed system?

A

When transactions run in parallel, how do you make sure they take effect in a definite order? By making them behave as if they ran serially, one at a time.

16
Q

What does Blob stand for?

A

Binary Large Object.

17
Q

What does serialization mean for transactions?

A

Transactions happening in a sequential order.

18
Q

What’s the difference between web sockets and polling?

A

With polling, the client repeatedly asks the server whether the data has changed. Web sockets keep a two-way connection open, so either side can send a message to the other when needed.

19
Q

What is session affinity?

A

When an LB remembers a user’s session and sends that user’s requests to the same backend server.

20
Q

What is gRPC used for?

A

Communication between internal microservices.

21
Q

What are the 6 main API architectures?

A

SOAP, gRPC, REST, GraphQL, WebSockets, webhooks.

22
Q

What’s a good use case for web sockets?

A

Real time updates (Uber, delivery status, chat app)

23
Q

What are the 3 main standards for Event Driven APIs?

A

Webhooks, WebSockets, HTTP streaming

24
Q

What type of connection is gRPC/RPC good for?

A

Microservice communication

25
Q

What are 3 use cases for consistent Hashing?

A

Distributed Cache, Load Balancing, Distributed databases

26
Q

How is consistent hashing achieved?

A

Hash Ring
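
A hash ring can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the `HashRing` class, the use of SHA-256, and the default of 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._keys = []    # sorted positions on the ring
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual points per node to spread load more evenly.
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # extremely unlikely collision; keep the first owner
            self._owner[pos] = node
            bisect.insort(self._keys, pos)

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._keys.remove(pos)

    def get_node(self, key):
        if not self._keys:
            raise LookupError("ring is empty")
        # Walk clockwise to the first position after the key's hash.
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._owner[self._keys[idx]]
```

The point of the ring is that removing a node only remaps the keys that node owned; keys owned by the other nodes stay where they were.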

27
Q

What are the 4 components of Torrent Protocol?

A
  1. Tracker - central server to track pieces
  2. Selection Algo - choose the rarest chunks first
  3. Swarm - group of peers hosting files
  4. Seeding - Once you download you seed.
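
The "rarest chunks first" rule from item 2 can be sketched as below. The `rarest_first` function and the shape of its inputs (a set of needed piece numbers and a mapping from peer name to the pieces that peer holds) are assumptions for illustration; real clients track this per connection.

```python
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Order the pieces we still need by how few peers hold them."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    # Pieces nobody has cannot be requested yet, so leave them out.
    available = [p for p in needed if counts[p] > 0]
    # Sort by availability, breaking ties by piece number for stable output.
    return sorted(available, key=lambda p: (counts[p], p))
```

Fetching rare pieces early keeps them from disappearing from the swarm if the few peers holding them leave.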
28
Q

What’s the relationship of Torrent Protocol to chunking?

A

Torrent protocol utilizes chunking, but chunking can be used in any file transfer

29
Q

What benefits do Web Sockets provide over Http requests?

A

Each HTTP request carries roughly 2-3 KB of overhead. A WebSocket has an initial handshake overhead, but each subsequent message adds only a few bytes.

30
Q

What are some use cases for Web Hooks?

A

Event Driven Actions, Asynchronous processing, Data synchronization, and decoupled architecture

31
Q

What is 100 million * 10 KB?

A

1,000,000,000,000 bytes = 1 TB

32
Q

What is 500,000 * 10KB?

A

5,000,000,000 bytes = 5 GB
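
The storage estimates on these cards can be checked with a few lines of Python. The sketch assumes decimal units (1 KB = 1,000 bytes), which is the usual back-of-the-envelope convention; the constant and function names are made up for the example.

```python
KB = 1_000
GB = 1_000_000_000
TB = 1_000_000_000_000


def total_bytes(item_count, bytes_per_item):
    """Total storage for item_count items of bytes_per_item each."""
    return item_count * bytes_per_item


hundred_million_items = total_bytes(100_000_000, 10 * KB)  # 1 TB
half_million_items = total_bytes(500_000, 10 * KB)         # 5 GB
```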

33
Q

What is 1,000,000/day in x per second?

A

1,000,000 / 100,000 = 10/second (rounding the seconds in a day up to 100,000)

34
Q

How many seconds in a day? What do we round to?

A

86,400. Round to 100,000
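
A small sketch of the per-day to per-second conversion, showing how far the rounded figure is from the exact one. The `per_second` helper and its `rounded` flag are hypothetical names for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
ROUNDED_SECONDS_PER_DAY = 100_000    # the interview shortcut


def per_second(requests_per_day, rounded=True):
    """Convert a daily request count into requests per second."""
    divisor = ROUNDED_SECONDS_PER_DAY if rounded else SECONDS_PER_DAY
    return requests_per_day / divisor
```

Rounding up to 100,000 underestimates the true rate by about 14%, which is usually acceptable for an order-of-magnitude estimate.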

35
Q

When is an API Gateway needed?

A

When many microservices exist in an architecture, the gateway can be a uniform entry point for users.

36
Q

What is 1 million * 1 MB?

A

1 TB

37
Q

What is 1 thousand * 1 KB?

A

1 MB

38
Q

What’s a clustered index?

A

A DB index that is part of the table itself: the rows are stored in index order.

39
Q

What’s a non clustered index?

A

A DB index stored outside of the main table, with pointers back to the rows.

40
Q

What’s the diff between a webhook and a normal HTTP connection?

A
  1. Webhooks are server-initiated, often server-to-server communication.
  2. Webhooks are asynchronous.
  3. Webhooks are more of a notification or one-way communication rather than waiting for a response.
41
Q

What are the 4 ways of writing to cache and a backend DB?

A

Write Through, Write Back (Later), Write Around, Read Through

42
Q

Describe Write Through

A

Data is written to the db at the same time it is written to the cache.

43
Q

Describe Read Through

A

Data is written to the backend only, and is put in the cache when it is first read later (on a cache miss).
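
The write-through and read-through cards can be contrasted with a minimal sketch. Both classes below are hypothetical and use a plain dict as a stand-in for the backend DB; a real setup would put a cache server in front of a real database.

```python
class WriteThroughCache:
    def __init__(self, db):
        self.db = db       # stand-in for the backend store (a plain dict)
        self.cache = {}

    def write(self, key, value):
        # Write through: update the backend and the cache together.
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        return self.cache[key] if key in self.cache else self.db[key]


class ReadThroughCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        # Writes go to the backend only; drop any stale cached copy.
        self.db[key] = value
        self.cache.pop(key, None)

    def read(self, key):
        # On a miss, load from the backend and keep the value for next time.
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]
```

Write-through pays the cache cost on every write; read-through only caches data that is actually read.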

44
Q

What does WebRTC solve? What problem does this cause?

A

Low latency live broadcasts. Harder to do adaptive bitrates.

45
Q

What protocol does Zoom-like streaming use?

A

WebRTC?

46
Q

What might Twitch use to handle both live streaming and user interactions?

A

HLS for the streaming and websockets for the interactions.

47
Q

Describe Write Around

A

Data is written directly to the backend, skipping the cache altogether.

48
Q

Describe Write Later

A

Data is written to the cache first and only written to the DB later, for example when more CPU capacity is available.
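
A minimal sketch of write later (write back), assuming a plain dict as the backend and an explicit `flush` call standing in for whatever background process would persist the data. The class and method names are hypothetical.

```python
class WriteBackCache:
    def __init__(self, db):
        self.db = db          # stand-in for the backend store (a plain dict)
        self.cache = {}
        self.dirty = set()    # keys changed in cache but not yet persisted

    def write(self, key, value):
        # Only the cache is updated now; the DB write is deferred.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist every pending change, then clear the dirty set.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off is durability: anything still in the dirty set is lost if the cache fails before a flush.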

49
Q

What’s the difference between session persistence and session affinity?

A

Both mean the LB routes the user to the same backend server. Persistence means session state might even be maintained on the backend server.

50
Q

What are sticky sessions?

A

An implementation of session affinity where the user is routed to the same backend server by IP address or another identifier.
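
One simple way to get IP-based stickiness is to hash the client address onto the list of backends, as sketched below. The `pick_backend` function and the backend names are hypothetical; real load balancers may use cookies or a session table instead.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Map a client IP to one backend, always the same one for that IP."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Note that plain modulo hashing reshuffles most clients whenever the number of backends changes, which is one reason consistent hashing is often used here.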

51
Q

What is centralized session storage?

A

Session data kept in one shared store that any backend server can access.

52
Q

What’s the term for round robin that takes capacity into account?

A

Weighted Round Robin
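
A very simple form of weighted round robin just repeats each server in the rotation according to its weight. The sketch below assumes integer weights and hypothetical server names; production balancers often use a smoother interleaving.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator; each server appears `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)
```

For example, with weights `{"big": 3, "small": 1}` the larger server receives three of every four requests.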

53
Q

What’s a good way to ensure a user’s shopping cart is the same across all servers?

A

Centralized session storage. Any backend server can access the storage.
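
The shopping-cart case can be sketched with two backend servers sharing one store. `SessionStore` and `BackendServer` are hypothetical classes, and the in-process dict stands in for an external store that all servers can reach.

```python
class SessionStore:
    """Stand-in for a shared store reachable by every backend server."""

    def __init__(self):
        self._carts = {}

    def get_cart(self, session_id):
        return self._carts.setdefault(session_id, [])


class BackendServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store   # every server holds a reference to the same store

    def add_to_cart(self, session_id, item):
        self.store.get_cart(session_id).append(item)

    def view_cart(self, session_id):
        return list(self.store.get_cart(session_id))
```

Because no cart data lives on an individual server, the load balancer is free to route each request to any backend.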

54
Q

What is Dash? What are some companies that use it?

A

HTTP DASH is a video streaming protocol, used by popular companies like Netflix, Hulu, Amazon Prime, and Disney.

55
Q

What does dash stand for?

A

“Dynamic Adaptive Streaming over HTTP.”

56
Q

1 Billion requests per day is how many per second?

A

10,000

57
Q

500 million requests per day is how many per second?

A

500 million/100,000 = 5K/second

58
Q

How does Redis handle leader/follower?

A

Redis Sentinel and/or Cluster.

59
Q

What does Redis use for sharding and partitioning data?

A

Redis Cluster.

60
Q

If a cache needs to be distributed with high availability, what is a good tool?

A

Redis Cluster

61
Q

What is Redis Sentinel used for?

A

Monitoring nodes and promoting a follower to leader when the leader goes down. The data replication itself is handled by Redis leader/follower replication, which Sentinel manages.

62
Q

What is the main limitation of Hadoop? What was a solution to that?

A

Its data can only be accessed sequentially. The solution was HBase, which supports random access.