System Design Flashcards

(27 cards)

1
Q

Delivery Framework

A
  1. Requirements (5 min)
  2. Core Entities (2 min)
  3. API (5 min)
  4. (Optional) Data Flow (5 min)
  5. High Level Design (10 - 15 min)
  6. Deep Dives (10 min)
2
Q

Delivery Framework: Requirements

A
  1. Functional requirements (prioritize ~3)
  2. Non-functional
  3. (Optional) capacity estimation
3
Q

Non-functional requirements

A
  • CAP Theorem (consistency or availability during a network partition)
  • Environmental constraints
  • Scalability (reads vs writes, hot spots)
  • Latency
  • Durability
  • Security
  • Fault Tolerance
4
Q

Metric prefixes

A
  • Thousand = kilo
  • Million = mega
  • Billion = giga
  • Trillion = tera
  • Quadrillion = peta
5
Q

Common latencies

A
  • Reading 1 MB sequentially from memory = 0.25 ms
  • Reading 1 MB sequentially from SSD = 1 ms
  • Reading 1 MB sequentially from spinning disk = 20 ms
  • Round-trip network latency, California to the Netherlands = 150 ms
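A quick sketch of how these per-MB figures scale to larger reads, assuming sequential throughput stays constant at the rough values on the card (real hardware varies):

```python
# Approximate time to read 1 MB sequentially, in milliseconds (figures from the card above).
MS_PER_MB = {
    "memory": 0.25,
    "ssd": 1.0,
    "spinning_disk": 20.0,
}

def read_time_seconds(megabytes: float, medium: str) -> float:
    """Estimate sequential read time, assuming constant throughput."""
    return megabytes * MS_PER_MB[medium] / 1000

for medium in MS_PER_MB:
    # 1 GB is treated as ~1,000 MB for back-of-the-envelope purposes.
    print(f"1 GB from {medium}: ~{read_time_seconds(1000, medium):g} s")
# ~0.25 s from memory, ~1 s from SSD, ~20 s from spinning disk
```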
6
Q

Common storage

A
  • 2-hour movie = 1 GB
  • Small book of plain text = 1 MB
  • High-resolution photo = 1 MB
  • Medium-resolution image or web graphic = 100 KB
7
Q

Common domain estimations

A
  • DAUs on a social media network = 1B
  • Hours of video streamed on Netflix/day = 100M
  • Google searches/second = 100K
  • Size of Wikipedia = 100 GB
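A small worked example of converting between per-second and per-day rates with these numbers. The "10 requests per user per day" figure is a hypothetical assumption for illustration, not from the card:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; often rounded to ~100K in interviews

def per_second_to_per_day(count_per_second: float) -> float:
    return count_per_second * SECONDS_PER_DAY

def per_day_to_per_second(count_per_day: float) -> float:
    return count_per_day / SECONDS_PER_DAY

# Google searches: ~100K/second -> per day
searches_per_day = per_second_to_per_day(100_000)
print(f"Searches/day: ~{searches_per_day:.3g}")  # Searches/day: ~8.64e+09 (about 8.6 billion)

# Hypothetical: each of 1B DAUs makes 10 requests/day (assumption)
avg_qps = per_day_to_per_second(1_000_000_000 * 10)
print(f"Average QPS: ~{avg_qps:,.0f}")  # Average QPS: ~115,741
```

Peak load is usually some multiple of the average, so the average QPS is a starting point rather than a sizing target.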
8
Q

2 types of Scaling

A
  • horizontally - adding more machines
  • vertically - adding more resources to a single machine
9
Q

Requirements for horizontal scaling

A
  • load balancer
  • load balancer strategy (round robin, queuing system, least connections, utilization-based)
  • try to partition data such that a single node has all the data it needs
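A minimal sketch of two of the load balancer strategies listed above, round robin and least connections. The class names and server names are hypothetical; real load balancers track connections over the network rather than in a dict:

```python
import itertools

class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""
    def __init__(self, servers: list[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest open connections."""
    def __init__(self, servers: list[str]) -> None:
        self.active = {s: 0 for s in servers}

    def pick(self) -> str:
        # min() returns the first server among ties, in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Called when a request finishes and its connection closes.
        self.active[server] -= 1

rr = RoundRobinBalancer(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']

lc = LeastConnectionsBalancer(["a", "b"])
first = lc.pick()   # 'a'
second = lc.pick()  # 'b'
lc.release(first)
print(lc.pick())    # a (it now has 0 open connections)
```

Round robin is simplest but ignores load; least connections adapts when some requests are much slower than others.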
10
Q

Specialized Indexes

A
  • Use Elasticsearch
  • Types - geospatial, vector (find similar images or documents), full-text (search documents)
  • Keep Elasticsearch in sync with the primary database using Change Data Capture (CDC)
  • Drawbacks - new point of failure, added latency, stale data
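The card doesn't say how a geospatial index is built internally. As one illustration, geohashing is a common technique that encodes a (latitude, longitude) pair as a short string by repeatedly halving the coordinate ranges; points in the same cell share a prefix (though points near a cell border can fall into neighboring cells with different prefixes). A minimal encoder sketch:

```python
# Base32 alphabet used by geohash (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # bits accumulated for the current character
    bit_count = 0
    use_lon = True    # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half of the range
            rng[0] = mid
        else:
            bits = bits << 1         # lower half of the range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Because nearby points share prefixes, a "find things near me" query can become a prefix lookup over an ordinary sorted index.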
11
Q

Communication Protocols

Internally

A

HTTP(S) or gRPC

12
Q

Communication Protocols

With Client

A
  • REST (Request -> Response)
  • Long polling
  • SSE (Server-Sent Events)
  • Websockets (Bi-directional Channel)
13
Q

Long polling

A
  • Use when clients need near-real-time updates
  • Client makes a request and server holds the request open until it has data
  • Client can then make another request
  • Works with standard load balancers and firewalls
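A minimal asyncio sketch of the server side of long polling, assuming one in-process queue per client (`LongPollChannel` is a hypothetical name). A real server would hold an HTTP request open in the same way before responding:

```python
import asyncio
from typing import Optional

class LongPollChannel:
    """Hypothetical per-client channel: a poll waits until data arrives or times out."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    async def publish(self, message: str) -> None:
        await self._queue.put(message)

    async def poll(self, timeout: float) -> Optional[str]:
        try:
            # Hold the "request" open until there is something to send.
            return await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            # Nothing arrived: answer empty and let the client poll again.
            return None

async def demo() -> list:
    channel = LongPollChannel()
    first = await channel.poll(timeout=0.05)             # times out, nothing published yet
    pending = asyncio.create_task(channel.poll(timeout=1.0))
    await asyncio.sleep(0.01)                            # client is now waiting
    await channel.publish("new message")                 # server finally has data
    second = await pending
    return [first, second]

print(asyncio.run(demo()))  # [None, 'new message']
```

The timeout keeps idle requests from hanging forever; the client just issues a new request after each response.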
14
Q

Websockets

A
  • Use when you need real-time, bidirectional communication
  • Challenge - must maintain many long-lived open connections
  • Common pattern: route client communication through a message broker, and have backend services talk only to the broker (centralizes client connections)
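A toy in-memory sketch of the broker pattern above. `Broker` stands in for a real message broker (for example, something like Redis Pub/Sub; that choice is an assumption), and `ConnectionServer` stands in for the servers holding the WebSocket connections, with a Python list playing the role of each socket:

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """Toy in-memory pub/sub; a stand-in for a real message broker."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        for handler in self._subscribers[topic]:
            handler(message)

class ConnectionServer:
    """Holds client connections; each 'socket' here is just a list of sent messages."""
    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.sockets: dict[str, list[str]] = {}

    def connect(self, user_id: str) -> None:
        self.sockets[user_id] = []
        # Subscribe to the user's topic, so backend services never need to
        # know which server holds this user's connection.
        self.broker.subscribe(f"user:{user_id}", self.sockets[user_id].append)

broker = Broker()
server = ConnectionServer(broker)
server.connect("alice")

# A backend service publishes to the broker without touching any socket.
broker.publish("user:alice", "you have a new message")
print(server.sockets["alice"])  # ['you have a new message']
```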
15
Q

Server Sent Events (SSE)

A
  • Use when the client needs a stream of updates from the server (server-to-client only)
  • Uses a single long-lived HTTP connection
  • Requires less specialized infrastructure than WebSockets
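Part of why SSE needs little special infrastructure is that each event is plain text on an ordinary HTTP response served as `text/event-stream`. A small sketch of serializing one event (`format_sse` is a hypothetical helper name):

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as one "data:" line per line of text.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line tells the client the event is complete.
    return "\n".join(lines) + "\n\n"

print(repr(format_sse("price=101", event="tick", event_id="7")))
# 'id: 7\nevent: tick\ndata: price=101\n\n'
```

The server keeps writing such chunks to the same open response; the `id` lets a reconnecting client say where it left off.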
16
Q

Security

A
  • API Gateway for Authentication/Authorization
  • Encryption
  • Don’t pass the userId (or similar identity data) in URL paths or request bodies; derive it from the auth token sent in the headers
17
Q

Search Optimized Database

A
  • Allows for full text search using indexing, tokenization, stemming
  • Inverted Index - index from word to document
  • Can configure whether fuzzy search is allowed
  • ElasticSearch
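A minimal sketch of an inverted index (word -> set of document ids) with AND-style search. It only lowercases and splits on non-alphanumerics; the stemming and fuzzy matching a real search engine adds are left out, which is why "dog" does not match "dogs" below:

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics (no stemming, no stop words).
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "The quick brown fox", 2: "A quick brown dog", 3: "Lazy dogs sleep"}
index = build_inverted_index(docs)
print(sorted(search(index, "quick brown")))  # [1, 2]
print(sorted(search(index, "dog")))          # [2] ("dogs" in doc 3 needs stemming)
```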
18
Q

API Gateway

A
  • Routes requests to correct microservice
  • Authentication
  • Rate limiting
  • Logging
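For the rate-limiting bullet, one common technique is a token bucket; the card doesn't name an algorithm, so this is an illustrative choice. The sketch takes the current time as a parameter to stay deterministic (a real gateway would use a monotonic clock and keep one bucket per client or API key):

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""
    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = 0.0  # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=1)  # hypothetical limit: bursts of 2, then 1 request/second
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                      # True (one token refilled)
```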
19
Q

Load Balancer

A
  • Need a load balancer whenever you have multiple machines capable of handling the same request
  • Can leave it out of the diagram and just mention it
  • AWS Elastic Load Balancer
20
Q

When to use a Queue

A
  1. Buffer for bursty traffic
  2. Distribute work across a system

If there are strict latency requirements (< 500 ms), a queue will probably exceed them

21
Q

Queues

A
  1. Message Ordering - typically FIFO but can be priority
  2. Retry configurations
  3. Dead letter queue for debugging/auditing
  4. Scaling with partitions (requires partition key)
  5. Backpressure to slow down producers

AWS SQS
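A single-process sketch of three of these ideas together: a bounded buffer whose "full" signal acts as backpressure, retries up to a limit, and a dead letter list for messages that keep failing. `MAX_ATTEMPTS` and the always-failing `handle` rule are hypothetical; a managed queue such as SQS provides these features through its own configuration:

```python
import queue

MAX_ATTEMPTS = 3  # hypothetical retry policy

work = queue.Queue(maxsize=2)  # bounded buffer of (message, attempts) pairs
dead_letters = []              # stand-in for a dead letter queue

def produce(message: str) -> bool:
    """Enqueue without blocking; False signals backpressure to the producer."""
    try:
        work.put_nowait((message, 0))
        return True
    except queue.Full:
        return False

def handle(message: str) -> None:
    # Hypothetical handler that always fails on "bad" messages.
    if message.startswith("bad"):
        raise ValueError(message)

def consume_all() -> list[str]:
    processed = []
    while not work.empty():
        message, attempts = work.get_nowait()
        try:
            handle(message)
            processed.append(message)
        except ValueError:
            if attempts + 1 >= MAX_ATTEMPTS:
                dead_letters.append(message)               # give up; keep for debugging/auditing
            else:
                work.put_nowait((message, attempts + 1))   # retry later
    return processed

print(produce("ok-1"), produce("bad-1"), produce("ok-2"))  # True True False (buffer full)
print(consume_all())  # ['ok-1']
print(dead_letters)   # ['bad-1']
```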

22
Q

When to use a Stream

A
  1. Process large amounts of data in real-time (think analytics dashboard)
  2. Support complex processing scenarios like event sourcing (think transactions at a bank)
  3. Support multiple consumers reading from the same stream (think chat room)
23
Q

Streams

A
  1. Scaling with Partitioning
  2. Multiple consumers
  3. Replication
  4. Windowing

Kinesis
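A small sketch of two stream ideas from this card: choosing a partition from a partition key, so events with the same key go to the same partition (where streams typically preserve their order), and counting events in tumbling (fixed, non-overlapping) windows. `NUM_PARTITIONS` and the event data are hypothetical:

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(key: str) -> int:
    """Map a partition key to a partition; the same key always maps to the same partition."""
    # crc32 is stable across processes, unlike Python's salted built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def tumbling_window_counts(events: list[tuple[float, str]], window_seconds: float) -> dict[int, Counter]:
    """Count events per type in fixed, non-overlapping windows, keyed by window index."""
    windows = {}
    for timestamp, event_type in events:
        window_index = int(timestamp // window_seconds)
        windows.setdefault(window_index, Counter())[event_type] += 1
    return windows

events = [(0.5, "login"), (3.0, "click"), (4.9, "click"), (5.0, "login"), (7.2, "click")]
windows = tumbling_window_counts(events, window_seconds=5)
print(windows[0]["click"], windows[1]["click"])  # 2 1
```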

24
Q

Distributed Lock

A
  • Need to lock a resource for a period of time (e.g., 10 min)
  • Use a distributed key-value store like Redis to map item -> lock, setting the key only if it doesn't already exist
  • Only one system or process can lock the particular item at a time
  • Can set an expiration on the lock so if process crashes, item doesn’t get stuck in locked state

Think: item in inventory while in cart, assignment of driver to rider
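An in-memory, single-process sketch of the lock semantics described above: acquire only if no unexpired lock exists, with an expiry so a crashed holder can't block the item forever. With Redis this roughly corresponds to `SET key token NX PX ttl`, with release usually done atomically (for example via a Lua script) only when the stored token matches. `ToyLockStore` is a hypothetical name, and time is passed in explicitly to keep the example deterministic:

```python
import uuid
from typing import Optional

class ToyLockStore:
    """Mimics set-if-absent-with-expiry locking in one process (not actually distributed)."""
    def __init__(self) -> None:
        self._locks: dict[str, tuple[str, float]] = {}  # item -> (owner token, expires_at)

    def acquire(self, item: str, ttl: float, now: float) -> Optional[str]:
        current = self._locks.get(item)
        if current is not None and current[1] > now:
            return None  # someone else holds an unexpired lock
        token = uuid.uuid4().hex  # unique owner token, so only the holder can release
        self._locks[item] = (token, now + ttl)
        return token

    def release(self, item: str, token: str) -> bool:
        current = self._locks.get(item)
        if current is not None and current[0] == token:
            del self._locks[item]
            return True
        return False

store = ToyLockStore()
ttl = 600  # 10 minutes, e.g. an inventory item sitting in a cart
t1 = store.acquire("item-123", ttl, now=0)
print(t1 is not None)                                       # True
print(store.acquire("item-123", ttl, now=60))               # None (still locked)
print(store.acquire("item-123", ttl, now=601) is not None)  # True (lock expired)
```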

25
Q

Cache Eviction Policy

A
  • Least Recently Used (LRU)
  • FIFO
  • Least Frequently Used (LFU)
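A minimal LRU sketch built on `collections.OrderedDict`, which keeps keys in access order so the least recently used entry is always at the front:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once `capacity` is exceeded."""
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now most recently used
cache.put("c", 3)  # evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```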
26
Q

Cache Write Strategy

A
  • Write-through cache - writes data to both cache and database simultaneously
  • Write-around cache - writes only to the database (data is cached on the next get)
  • Write-back cache - writes to the cache and then asynchronously to the DB (may lose data)

Redis
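A toy sketch contrasting the three write strategies, with plain dicts standing in for the cache and the database. The class names are hypothetical, the "simultaneous" write-through is just two sequential writes (not atomic), and write-around drops any stale cached copy on write, an added assumption so the next read sees the new value:

```python
class WriteThroughCache:
    """Writes go to the cache and the database together."""
    def __init__(self, db: dict) -> None:
        self.db, self.cache = db, {}

    def write(self, key, value) -> None:
        self.cache[key] = value
        self.db[key] = value

class WriteAroundCache:
    """Writes skip the cache; the value is cached on the next read."""
    def __init__(self, db: dict) -> None:
        self.db, self.cache = db, {}

    def write(self, key, value) -> None:
        self.db[key] = value
        self.cache.pop(key, None)  # drop any stale cached copy

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]  # populate the cache on a miss
        return self.cache[key]

class WriteBackCache:
    """Writes land in the cache and are flushed to the database later."""
    def __init__(self, db: dict) -> None:
        self.db, self.cache, self.dirty = db, {}, set()

    def write(self, key, value) -> None:
        self.cache[key] = value
        self.dirty.add(key)  # lost if the cache crashes before flush()

    def flush(self) -> None:
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

db = {}
wb = WriteBackCache(db)
wb.write("x", 1)
print(db)  # {} (not yet persisted)
wb.flush()
print(db)  # {'x': 1}
```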
27