Systems Design Flashcards

1
Q

What are the steps in request flow and traffic source?

A
  1. A user accesses the website through its domain name, which is resolved via DNS (Domain Name System).
  2. An IP (Internet Protocol) address is returned to the browser or mobile app.
  3. Once the IP address is obtained, HTTP (Hypertext Transfer Protocol) requests are sent directly to your web server.
  4. The web server returns HTML pages or a JSON response for rendering.
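
A minimal sketch of this flow using only the Python standard library ("example.com" is a placeholder domain, not part of the original card):

import socket
import urllib.request

# Steps 1-2: resolve the domain name to an IP address via DNS.
ip_address = socket.gethostbyname("example.com")
print(f"DNS returned: {ip_address}")

# Steps 3-4: send an HTTP request; the server returns HTML (or JSON).
with urllib.request.urlopen("http://example.com/") as response:
    print(response.read(80))  # first bytes of the returned page
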
2
Q

When should you consider a non-relational Database?

A
  • Your application requires super-low latency
  • Your data is unstructured, or you do not have any relational data
  • You only need to serialize and deserialize data (JSON, XML, YAML, etc.)
  • You need to store a massive amount of data
3
Q

Vertical vs. Horizontal Scaling

A

Vertical Scaling (“scale-up”): adding more power (CPU, RAM, etc.) to your servers.

Horizontal Scaling (“scale-out”): scaling by adding more servers into your pool of resources.

Vertical Pros:
- Simplicity
- Works well when traffic is low

Vertical Cons:
- Has a hard limit; it is impossible to add unlimited CPU and memory to a single server
- No failover or redundancy; if the server goes down, the website/app goes down with it completely

Horizontal Pros:
- Much better for large-scale apps

4
Q

Load Balancer

A

Evenly distributes incoming traffic among web servers that are defined in a load-balancing set.

5
Q

Private IP

A

An IP Address that is only reachable between servers in the same network. It is unreachable over the internet.

Load balancers communicate with web servers through private IPs.

6
Q

Cache

A

A temporary storage area that stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly.

How it works:
- The server receives a request from the browser and checks whether the cache has the response available.
- If it does, it sends the data back to the client. If not, it queries the DB, stores the response in the cache, and sends it back to the client.

This strategy is called a “read-through cache”
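
A minimal sketch of this read path, with plain dicts standing in for the cache server and the database (the names and data here are illustrative, not from the original card):

cache = {}                         # stand-in for Redis/Memcached
db = {"user:1": {"name": "Ada"}}   # stand-in for the database

def get(key):
    value = cache.get(key)
    if value is not None:          # cache hit: serve from memory
        return value
    value = db.get(key)            # cache miss: query the DB...
    if value is not None:
        cache[key] = value         # ...store the response in the cache...
    return value                   # ...and send it back to the client

print(get("user:1"))  # miss -> DB -> cached
print(get("user:1"))  # hit  -> served from the cache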

Considerations:

When to use a cache:
- When data is read frequently, but modified infrequently
- Cache server is not ideal for persisting data

Expiration Policy:
- Good practice to have one
- Once cached data expires, it is removed from the cache
- Don’t make it too short, as this will cause the system to reload data from the DB frequently
- Don’t make it too long, or the data becomes stale (out of date)

Consistency:
- Keep the data store and the cache in sync
- Inconsistency happens due to data-modifying operations on the data store and cache are not in a single transaction
- When scaling across multiple regions, maintaining consistency b/w the data store and the cache is challenging

Mitigating Failures:
- A single cache server represents a potential single point of failure.
- Multiple cache servers across different data centers are recommended
- Also consider overprovision the required memory by certain percentages to provide a buffer as the memory usage increases

Eviction policy:
- once the cache is full, any requests to add items to the cache might cause existing items to be removed.
- Least-recently-used (LRU) is the most popular cache eviction policy.
- Others include Least Frequently Used (LFU) or First in First Out (FIFO)
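
A minimal LRU sketch built on OrderedDict (a toy illustration, not a production cache):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recently used
cache.put("c", 3)  # cache full: evicts "b", not "a"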

7
Q

Content Delivery Network (CDN)

A

A network of geographically dispersed servers used to deliver static content.

CDN servers cache static content like images, videos, CSS, JavaScript files, etc.

It is essentially a cache for static content.

How it Works:
- User visits a website
- A CDN server closest to the user delivers the static content
- The further a user is (geographically) from the CDN servers, the slower the website will load
- If the CDN server does not have the content (e.g. an image), it requests it from the origin, which can be a web server or online storage like Amazon S3
- The origin returns the image to the CDN server and includes an HTTP header called “Time-to-Live” (TTL), which tells the CDN when to expire that content
- Finally, if another user requests that same content, the CDN now has it cached and can serve it directly

Considerations:
- Cost
- Appropriate cache expiry
- CDN fallback
- Invalidating files by:
  - Invalidating the CDN object using APIs provided by CDN vendors
  - Object versioning

8
Q

Stateless Web Tier

A

Downsides of Stateful Architecture
- Requests from the same user must be routed to the same server to keep track of state (e.g., session authentication)
- Adds overhead to maintain “sticky” sessions
- Handling server failures is difficult

Upsides of Stateless
- HTTP requests from users can be sent to ANY web server, which fetches state data from a shared data store (Redis, NoSQL, etc.)
- Simpler, more robust, and scalable
- Allows autoscaling of web servers based on traffic load

9
Q

Data Centers

A

geoDNS routing
- geoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user
- Users are geo-routed to the closest data center (e.g., US-East or US-West).

Technical challenges to achieve multi-data center setup:
- Traffic redirection – geoDNS can be used to direct traffic to the nearest data center depending on where the user is located
- Data synchronization – users from different regions may use different local DBs or caches. In failover cases, traffic might be routed to a data center where data is unavailable. A common strategy is to replicate data across multiple data centers.
- Test and deployment – it is important to test your website/app from different locations. Automated deployment tools are vital to keep services consistent across all data centers.

10
Q

Messaging queue

A

A durable component, stored in memory, that supports asynchronous communication. It serves as a buffer and distributes asynchronous requests.

Basic Arch of a Message Queue
- Input services, called producers/publishers, create messages and publish them to a message queue
- Other services or servers, called consumers/subscribers, connect to the queue and perform actions defined by the messages

This decoupling makes the message queue a preferred architecture for building a scalable and reliable application.

Use case (see the sketch below):
- Your app supports photo customization (cropping, sharpening, etc.)
- Those customization tasks take time to complete
- Web servers publish photo processing jobs to the message queue
- Photo processing workers pick up jobs from the queue and asynchronously perform the photo customization tasks
- The producer and the consumer can be scaled independently
- When the size of the queue becomes large, more workers are added to reduce the processing time
- However, if the queue is empty most of the time, the number of workers can be reduced
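
A minimal sketch of the use case above, with Python's in-process queue.Queue standing in for a real message queue (Kafka, RabbitMQ, SQS, etc.):

import queue
import threading

jobs = queue.Queue()   # the message queue (buffer between tiers)

def producer():
    for photo_id in range(3):
        jobs.put(("sharpen", photo_id))   # web server publishes a job

def worker():
    while True:
        task, photo_id = jobs.get()       # consumer picks up a job
        print(f"processing {task} on photo {photo_id}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
producer()
jobs.join()   # wait until all published jobs have been processed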

11
Q

Logging, metrics, automation

A

Logging:

Monitoring error logs helps to identify errors and problems in the system. You can monitor at the per-server level, or use tools to aggregate logs into a centralized service for easy searching and viewing.

Metrics:

Help us gain business insights and understand the health status of the system. Here are some helpful metrics:
- Host level metrics: CPU, Memory, disk I/O, etc.
- Aggregated level metrics: e.g. performance of entire DB tier, cache tier, etc.
- Key business metrics: daily active users, retention, revenue, etc.

Automation:

  • Build, test, deploy process:
    Continuous integration is a good practice in which code check-in is verified through automation, allowing teams to detect problems early.
  • Add message queues and different tools
12
Q

Database Scaling

A

Vertical:
- Adding more power (CPU, RAM, disk, etc.)
- Limited by hardware; a single server can only get so big
- SPOF (single point of failure) risks
- Overall cost is high; powerful servers are more expensive

Horizontal (“sharding”):
- Separates large DBs into smaller, more easily managed parts called “shards”
- Each shard shares the same schema, but the actual data on each shard is unique to that shard (see the routing sketch below)
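
A minimal shard-routing sketch; user_id % number_of_shards is a common illustrative sharding function (with the usual caveat that changing the shard count forces data to be redistributed):

NUM_SHARDS = 4
shards = [f"db-shard-{i}" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> str:
    return shards[user_id % NUM_SHARDS]

print(shard_for(1234))  # -> db-shard-2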

13
Q

Scale from Zero to Millions of Users (High level)

A
  • Keep the web tier stateless (use NoSQL/a cache to track state)
  • Build redundancy at every tier (load balancer, more servers, duplicate DBs, multi-DCs)
  • Cache data as much as you can
  • Support multiple data centers
  • Host static assets in CDN
  • Scale your data tier by sharding
  • Split tiers into individual services (message queue?)
  • Monitor your system and use automation tools
14
Q

Back-of-envelope: Power of Two

A

2^10 ≈ 1 thousand    = 1 Kilobyte (KB)
2^20 ≈ 1 million     = 1 Megabyte (MB)
2^30 ≈ 1 billion     = 1 Gigabyte (GB)
2^40 ≈ 1 trillion    = 1 Terabyte (TB)
2^50 ≈ 1 quadrillion = 1 Petabyte (PB)

15
Q

Back-of-envelope: Latency numbers every programmer should know

A

Operation name                              Time

L1 cache reference                          0.5 ns
Branch mispredict                           5 ns
L2 cache reference                          7 ns
Mutex lock/unlock                           100 ns
Main memory reference                       100 ns
Compress 1 KB with Zippy                    10,000 ns (10 µs)
Send 2 KB over 1 Gbps network               20,000 ns (20 µs)
Read 1 MB sequentially from memory          250,000 ns (250 µs)
Round trip within same datacenter           500,000 ns (500 µs)
Disk seek                                   10,000,000 ns (10 ms)
Read 1 MB sequentially from the network     10,000,000 ns (10 ms)
Read 1 MB sequentially from disk            30,000,000 ns (30 ms)
Send packet CA -> Netherlands -> CA         150,000,000 ns (150 ms)

Conclusions:
* memory is fast; disk is slow
* avoid disk seeks if possible
* simple compression algorithms are fast
* compress data before sending it over the internet if possible
* Data centers are usually in different regions, and it takes time to send data between them

16
Q

Back of envelope: Availability Numbers

A

High availability is the ability of a system to be continuously operational for a desirably long period of time. It is measured as a percentage, with 100% meaning a service that has zero downtime.

Most services fall between 99% and 100%.

A service level agreement (SLA) is a commonly used term for service providers. It is an agreement between you (the service provider) and your customer that formally defines the level of uptime your service will deliver.
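
A quick sanity check for converting an availability percentage into downtime, using downtime = (1 - availability) x period:

MINUTES_PER_YEAR = 365 * 24 * 60

for nines in ("99%", "99.9%", "99.99%"):
    availability = float(nines.rstrip("%")) / 100
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{nines} uptime -> ~{downtime_min:,.0f} minutes of downtime per year")

# 99%    -> ~5,256 minutes (~3.65 days)
# 99.9%  -> ~526 minutes   (~8.8 hours)
# 99.99% -> ~53 minutes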

17
Q

Questions to ask at the beginning of the Systems Design Interview:

A
  • What specific features are we going to build? Which are most important?
  • How many users does the product have?
  • How fast does the company anticipate scaling up? What is the anticipated scale in 3 months, 6 months, a year?
  • What is the company’s tech stack? What existing services might you leverage to simplify the design?
  • Is this a mobile app, web app, or both?
  • What is the traffic volume?
18
Q

Framework for Systems Design Interviews (4 steps)

A

Step 1 - Understand the problem and establish design scope
* Ask clarifying questions
* See previous card (Questions to ask…)

Step 2 - Propose high-level design and get buy-in
* Collaborate with the interviewer
* Draw up an initial blueprint, ask for feedback, and treat the interviewer as a teammate
* Draw box diagrams with key components on the whiteboard or paper (clients, APIs, web servers, data stores, cache, CDN, message queue, etc.)
* Do back-of-the-envelope calcs to evaluate whether your blueprint fits the scale constraints. Think out loud, and ask whether BOTE calcs are necessary before diving into them.

If possible, go through a few concrete use cases. This may also help you find some edge cases.

Step 3 - Design Deep Dive
At this point you should have already:
1) Agreed on the overall goals and feature scope
2) Sketched out a high-level blueprint for overall design
3) Obtained feedback from interviewer on high-level design
4) Had some initial ideas about areas to focus on in the deep dive, based on the interviewer’s feedback

  • Work with the interviewer to identify and prioritize components in the architecture
  • Every interview is different, so try to gauge where they want you to focus (more high-level, bottlenecks, system performance, etc.)
  • Be mindful of time management
  • Don’t get into unnecessary details

Step 4 - Wrap up
* Never say nothing can be improved
* Look for bottlenecks or discuss potential improvements
* Give a recap of your design
* Error cases (server failure, network loss, etc.)
* Operation issues - How do you monitor metrics and error logs? How to roll out the system?
* How to handle the next scale curve
* Propose other refinements you need if you have time

Various DO’s and DON’Ts

Do:
* Always ask for clarification
* Understand the requirements of the problem
* There is no right answer. Understand the reqs!
* Let the interviewer know what you’re thinking. Communicate!
* Suggest multiple approaches if possible
* Once you agree with your interviewer on the blueprint, go into details on each component. Design most critical ones first
* Bounce ideas off the interviewer.
* Never give up!

Don’t:
* Don’t be unprepared for typical interview questions
* Don’t jump into a solution without clarifying the reqs
* Don’t go into too much detail on a single component in the beginning. High-level first, then drill down
* If you get stuck, don’t hesitate to ask for hints
* Don’t think in silence, communicate!
* Don’t assume the interview is done once you give the design. Ask for feedback early and often

Time allocation on each step:

Step 1 - Understand the problem and scope: 5-7 mins
Step 2 - Propose high-level design and get buy-in: 10-15 mins
Step 3 - Design deep dive: 15-20 mins
Step 4 - Wrap up: 3 mins

19
Q

Rate Limiter

A

Benefits of using one:
* Prevent resource starvation caused by Denial of Service (DoS) attacks
* Reduce cost
* Prevent servers from being overloaded

20
Q

Algos for Rate Limiting

A
  • Token bucket
  • Leaking bucket
  • Fixed window counter
  • Sliding window log
  • Sliding window counter
21
Q

Token bucket algorithm

A

Pros:
* Easy to implement
* Memory efficient
* Token bucket allows a burst of traffic for short periods. A request can go through as long as there are tokens left.

Cons:
* Two parameters in the algo are bucket size and token refill rate. However, it might be challenging to tune them properly.
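
A minimal token bucket sketch (the parameter values here are illustrative):

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity         # bucket size (max burst)
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1             # each request consumes one token
            return True
        return False                     # bucket empty: rate limited

bucket = TokenBucket(capacity=4, refill_rate=2)  # burst of 4, 2 req/s sustained
print([bucket.allow() for _ in range(6)])        # first 4 pass, last 2 fail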

22
Q

Leaking bucket algorithm

A

Similar to the token bucket, except requests are processed at a fixed rate.

Usually implemented with a first-in-first-out (FIFO) queue.

Params:
* Bucket size
* Outflow rate: how many requests can be processed per unit of time (usually per second)
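
A minimal leaking bucket sketch: a bounded FIFO queue of pending requests, drained ("leaked") by a worker at the fixed outflow rate:

from collections import deque

class LeakingBucket:
    def __init__(self, bucket_size: int):
        self.bucket_size = bucket_size
        self.queue = deque()

    def allow(self, request) -> bool:
        if len(self.queue) >= self.bucket_size:
            return False            # bucket full: request dropped
        self.queue.append(request)  # queued for fixed-rate processing
        return True

    def leak(self):
        # Called at the fixed outflow rate, e.g. by a timer tick.
        if self.queue:
            return self.queue.popleft()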

Pros:
* Memory efficient given the limited queue size
* Requests are processed at a fixed rate, so it is suitable for use cases where a stable outflow rate is needed

Cons:
* A burst of traffic fills up the queue with old requests, and if they are not processed in time, recent requests will be rate limited
* There are two parameters in the algorithm; it might not be easy to tune them properly

23
Q

Fixed window counter algo

A
  • The algo divides the timeline into fixed-size time windows and assigns a counter to each window
  • Each request increments the counter by one
  • Once the counter reaches the pre-defined threshold, new requests are dropped until a new time window starts (sketch below)
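
A minimal fixed window counter sketch (window size and limit are illustrative; old windows' counters are never cleaned up here):

import time
from collections import defaultdict

WINDOW_SECONDS = 60
LIMIT = 100
counters = defaultdict(int)

def allow(client_id: str) -> bool:
    window = int(time.time() // WINDOW_SECONDS)   # index of the current window
    key = (client_id, window)
    if counters[key] >= LIMIT:
        return False       # threshold reached: drop until the next window
    counters[key] += 1
    return True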

Pros:
* Memory efficient
* Easy to understand
* Resetting available quota at the end of a unit time window fits certain use cases

Cons:
* A spike in traffic at the edges of a window could cause more requests than the allowed quota to go through

24
Q

Sliding window log algo

A

Fixes the edge-of-window spike issue in the fixed window algo.

  • The algo keeps track of request timestamps. Timestamp data is typically kept in a cache, such as Redis sorted sets
  • When a new request comes in, remove all outdated timestamps (older than the start of the current time window)
  • Add the timestamp of the new request to the log
  • If the log size is the same as or lower than the allowed count, the request is accepted; otherwise, it is dropped (sketch below)
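
A minimal in-process sketch using a deque as the timestamp log (production systems typically keep these logs in Redis sorted sets, as noted above):

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
LIMIT = 100
logs = defaultdict(deque)

def allow(client_id: str) -> bool:
    now = time.time()
    log = logs[client_id]
    while log and log[0] <= now - WINDOW_SECONDS:
        log.popleft()            # remove timestamps outside the window
    log.append(now)              # log this request (even if it gets rejected)
    return len(log) <= LIMIT     # accept only if within the allowed count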

Pros:
* Rate limiting implemented by this algo is very accurate. In any rolling window, requests will not exceed the rate limit

Cons:
* The algo consumes a lot of memory because even if a request is rejected, its timestamp might still be stored in memory

25
Q

Sliding window counter algo

A

A hybrid of the fixed window counter and sliding window log approaches.

Use a mathematical formula to estimate the number of requests in a “rolling minute”: the current minute’s request count plus a percentage of the previous minute’s request count (worked example below).
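
Worked example with hypothetical numbers (limit 100/min, 84 requests in the previous minute, 36 so far in the current minute, 25% of the current minute elapsed):

prev_count, curr_count, elapsed_fraction = 84, 36, 0.25

# 75% of the rolling minute still overlaps the previous window.
estimate = curr_count + prev_count * (1 - elapsed_fraction)
print(estimate)   # 36 + 84 * 0.75 = 99.0 -> just under the limit, allowed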

Pros:
* It smooths out spikes in traffic because the rate is based on the average rate of the previous window
* Memory efficient

Cons:
* It only works for a not-so-strict look-back window; it is an approximation of the actual rate because it assumes requests in the previous window are evenly distributed. However, some evidence suggests this problem is minimal in real cases.

26
Q

High level middleware Rate Limiter

A
  • The client sends a request to the rate limiting middleware
  • The middleware fetches the counter from the corresponding bucket in Redis and checks whether the limit has been reached
  • If reached, the request is rejected
  • If not, the request is sent on to the API servers; meanwhile, the system increments the counter and saves it back to Redis (sketch below)
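
A minimal sketch of the counter check, assuming the redis-py client and a reachable Redis server (a production version would set the expiry atomically, e.g. via a Lua script):

import redis

r = redis.Redis()
WINDOW_SECONDS = 60
LIMIT = 100

def is_allowed(client_id: str) -> bool:
    key = f"rate:{client_id}"
    count = r.incr(key)                 # atomically increment the counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)   # start the window on the first request
    return count <= LIMIT
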
27
Q

How are rate limiting rules created? Where are they stored?

A

Usually written in config files and saved on disk

28
Q

Exceeding the rate limit

A

The API returns HTTP response code 429 (Too Many Requests) to the client.

Depending on the use case, we may enqueue the rate-limited requests to be processed later, for example when processing orders.

29
Q

How does a client know whether it is being throttled? How does it know the number of allowed remaining requests?

A

The answer lies in the HTTP response headers:

X-Ratelimit-Remaining: The remaining number of allowed requests within the window.

X-Ratelimit-Limit: Indicates how many calls the client can make per time window.

X-Ratelimit-Retry-After: The number of seconds to wait until you can make a request again without being throttled.

When the user has sent too many requests, the 429 code and the X-Ratelimit-Retry-After header are sent back to the client.
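
For illustration, a throttled response might look like this (values are hypothetical, and exact header names vary between APIs):

HTTP/1.1 429 Too Many Requests
X-Ratelimit-Limit: 100
X-Ratelimit-Remaining: 0
X-Ratelimit-Retry-After: 42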

30
Q

Overall Rate limiter Operations (Detailed Design)

A
  • Rules are stored on disk
  • Workers (dedicated servers) frequently pull rules from disk and store them in the cache
  • When a client sends a request to the server, it is first sent to the rate limiter middleware
  • The middleware loads rules from the cache
  • It also fetches counters and the last request timestamp from the Redis cache
  • Based on these, the middleware decides:
    • If the request is not rate limited, forward it to the API
    • Else, return 429 to the client
    • Limited requests can either be dropped or forwarded to a message queue for later processing
31
Q

Challenges to building Rate Limiter in a distributed environment

A
  • Race conditions
  • Synchronization issues

Race conditions can be resolved with locks; however, locks will significantly slow down the system.

Two other strategies are used:
* Lua script
* Sorted sets data structure in Redis

Synchronization issues:
* Sticky sessions allow a client to always send traffic to the same rate limiter; this is neither scalable nor flexible, so it is a poor solution
* A better approach is to use centralized data stores like Redis

32
Q

Performance optimization for Rate Limiter

A
  • Multi-data center setup to reduce latency for users located far from a data center
  • Synchronize data with an eventual consistency model
33
Q

301 Redirect vs 302 Redirect

A

301:
Marks the requested (shortened) URL as “permanently” moved to the long URL. This means that:
- the browser caches the response
- subsequent requests to this URL will not be sent to the URL-shortening service, but redirected directly to the long-URL server

302:
“Temporarily” moved to the long URL; all subsequent requests will still be sent to the URL-shortening service first.

Tradeoffs:
- If the priority is to reduce server load, 301 is best
- If the priority is better analytics, 302, as it allows us to track click rate and source of click more easily (see the sketch below)
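
A minimal redirect-endpoint sketch using Python's http.server; the long URL and port are placeholders:

from http.server import BaseHTTPRequestHandler, HTTPServer

LONG_URL = "https://example.com/some/very/long/path"  # hypothetical target

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)    # use 301 instead to let browsers cache it
        self.send_header("Location", LONG_URL)
        self.end_headers()

HTTPServer(("localhost", 8080), Redirector).serve_forever()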

34
Q

Well known hash functions

A

Hash function   Hash value (hexadecimal)
CRC32           5cb54054
MD5             5a62509a84df9ee03fe1230b9df8b84e
SHA-1           0eeae7916c06853901d9ccbefbfcaf4de57ed85b
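
These can be reproduced with the standard library (the input URL below is hypothetical; hash values depend entirely on the input):

import hashlib
import zlib

url = b"https://example.com/some/long/url"

print(f"{zlib.crc32(url):08x}")        # CRC32 -> 8 hex chars
print(hashlib.md5(url).hexdigest())    # MD5   -> 32 hex chars
print(hashlib.sha1(url).hexdigest())   # SHA-1 -> 40 hex chars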
