System Design Examples Flashcards

1
Q

How should you track API calls for a rate limiter?

A
  1. Use a sliding window counter to track requests
  2. Store the counters in Redis/Memcached: the key is user_id + api_key, and the value is a list of key-value pairs, one per normalized time window, each mapping the window’s start time to its request count
  3. Set an expiry on the value so that the cache evicts it once the time window has passed
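
The steps above can be sketched as follows. This is a minimal illustration rather than a production design: a plain in-memory dict stands in for Redis/Memcached, and the class name `SlidingWindowLimiter`, its parameters, and the bucket size are all assumptions made for the example.

```python
import time
from collections import defaultdict


class SlidingWindowLimiter:
    """Sliding window counter; a dict stands in for Redis/Memcached."""

    def __init__(self, limit, window_seconds, bucket_seconds):
        self.limit = limit                    # assumed: max requests per window
        self.window_seconds = window_seconds  # assumed: length of the window
        self.bucket_seconds = bucket_seconds  # assumed: size of a normalized bucket
        # key -> {bucket_start_time: request_count}
        self.store = defaultdict(dict)

    def allow(self, user_id, api_key, now=None):
        """Return True and count the request if it fits in the window."""
        now = time.time() if now is None else now
        buckets = self.store[f"{user_id}:{api_key}"]

        # Drop buckets that have slid out of the window. A real cache
        # would handle this through key expiry instead.
        cutoff = now - self.window_seconds
        for start in [s for s in buckets if s <= cutoff]:
            del buckets[start]

        if sum(buckets.values()) >= self.limit:
            return False

        # Normalize the timestamp to the start of its bucket.
        bucket = int(now // self.bucket_seconds) * self.bucket_seconds
        buckets[bucket] = buckets.get(bucket, 0) + 1
        return True
```

Larger buckets use less memory per key but make the window boundary coarser, since a whole bucket is dropped at once.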
2
Q

Where can you add caching for an API rate limiter?

A

Use a “write-back cache”: update all counters and timestamps in the cache only

  • Then write to permanent storage at fixed intervals
  • This keeps the latency that the rate limiter adds to the user’s requests to a minimum
  • Reads always hit the cache first, which is especially useful once a user has hit their maximum limit, because the rate limiter then only reads the counters and no longer updates them

Least Recently Used (LRU) is a reasonable cache eviction policy for this system
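
A minimal sketch of the write-back idea, assuming a dict-like permanent store. The class `WriteBackCounterCache` and its method names are hypothetical, and LRU eviction is left out to keep the example short.

```python
class WriteBackCounterCache:
    """Counters are updated in the cache and flushed to storage later."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # assumed: dict-like permanent storage
        self.cache = {}
        self.dirty = set()  # keys changed since the last flush

    def increment(self, key):
        # Load the counter on first use, then update the cached copy only.
        if key not in self.cache:
            self.cache[key] = self.backing_store.get(key, 0)
        self.cache[key] += 1
        self.dirty.add(key)
        return self.cache[key]

    def read(self, key):
        # Reads hit the cache first and fall back to storage on a miss.
        if key not in self.cache:
            self.cache[key] = self.backing_store.get(key, 0)
        return self.cache[key]

    def flush(self):
        """Write changed counters to storage; call this at fixed intervals."""
        for key in self.dirty:
            self.backing_store[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed
```

The trade-off is that counters updated since the last flush are lost if the cache node fails, which is usually acceptable for rate limiting.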

3
Q

Should you rate limit by IP or User ID?

A

Hybrid

For authenticated endpoints, limiting by IP alone is not desirable because multiple users can share a single public IP (as in an internet cafe), so one heavy user could get everyone behind that IP throttled

However, if we rate limit only by user ID, a malicious actor could exhaust the login API’s limit for a user by repeatedly entering wrong credentials, thus locking out the legitimate user
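
One possible way to combine the two is to choose the counter key by request type. The sketch below is an assumption about how a hybrid scheme might look, not something the card prescribes; the function `rate_limit_key` and its parameters are hypothetical.

```python
def rate_limit_key(client_ip, user_id=None, login_target=None):
    """Pick the counter a request is charged against."""
    if user_id is not None:
        # Authenticated request: limit per user, so users sharing one
        # public IP do not consume each other's quota.
        return f"user:{user_id}"
    if login_target is not None:
        # Login attempt: limit per (IP, account) pair, so failed attempts
        # from an attacker's IP do not lock out the real user elsewhere.
        return f"login:{client_ip}:{login_target}"
    # Any other anonymous request: fall back to the IP.
    return f"ip:{client_ip}"
```

With this keying, an attacker at one IP and the legitimate user at another are counted separately on the login endpoint.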

4
Q

How would you implement a search index for Twitter search?

A
  1. You will likely need to partition (shard) the search index
  2. Sharding by the tweet itself, rather than by search term, gives a more even data distribution and protects against “hot” search terms
  3. The trade-off is that every query must go to all the shards, and the results must then be aggregated
  4. To protect against the failure of an individual index server, build a reverse index that maps every TweetID to its index server; this lets you rebuild that server’s index after a failure
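
The four points above can be illustrated with a small in-process sketch. Everything here is an assumption for the example: integer tweet IDs, a modulo hash, four shards, whitespace tokenization, and the names `IndexShard`, `index_tweet`, `search`, and `rebuild_shard`.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count


class IndexShard:
    """One index server: maps each word to the tweet IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, tweet_id, text):
        for word in text.lower().split():
            self.postings[word].add(tweet_id)

    def search(self, word):
        return self.postings.get(word.lower(), set())


shards = [IndexShard() for _ in range(NUM_SHARDS)]
tweet_to_shard = {}  # reverse index: TweetID -> shard number


def index_tweet(tweet_id, text):
    # Shard by the tweet itself, so a hot word is spread over all shards.
    shard_no = tweet_id % NUM_SHARDS
    shards[shard_no].add(tweet_id, text)
    tweet_to_shard[tweet_id] = shard_no


def search(word):
    # Scatter the query to every shard, then aggregate the results.
    results = set()
    for shard in shards:
        results |= shard.search(word)
    return sorted(results)


def rebuild_shard(shard_no, tweet_store):
    # Use the reverse index to find the tweets the failed shard held.
    fresh = IndexShard()
    for tweet_id, owner in tweet_to_shard.items():
        if owner == shard_no:
            fresh.add(tweet_id, tweet_store[tweet_id])
    shards[shard_no] = fresh
```

Here `tweet_store` stands for whatever system holds the full tweet text, passed in as a dict for simplicity.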
5
Q

Do you need to add caching to Twitter Search, and if so why and how?

A

The search index tells you which rows are relevant, but you will still need to go fetch the full records of those rows afterwards

To deal with hot tweets, we can introduce a cache (such as Memcached) in front of our database

Before hitting the backend database, application servers can quickly check whether the cache has that tweet
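
The check-cache-then-database flow might look like the sketch below. It assumes a dict-like database and an in-process cache with LRU eviction; the class `TweetCache` and the `db_reads` counter are hypothetical and exist only to make the behavior visible.

```python
from collections import OrderedDict


class TweetCache:
    """Cache in front of the tweet database, with LRU eviction."""

    def __init__(self, capacity, database):
        self.capacity = capacity    # assumed: max number of cached tweets
        self.database = database    # assumed: dict-like TweetID -> tweet
        self.entries = OrderedDict()
        self.db_reads = 0           # counts how often the database is hit

    def get(self, tweet_id):
        if tweet_id in self.entries:
            # Cache hit: mark the tweet as most recently used.
            self.entries.move_to_end(tweet_id)
            return self.entries[tweet_id]

        # Cache miss: fetch from the database and remember the result.
        self.db_reads += 1
        tweet = self.database[tweet_id]
        self.entries[tweet_id] = tweet
        if len(self.entries) > self.capacity:
            # Evict the least recently used tweet.
            self.entries.popitem(last=False)
        return tweet
```

A hot tweet stays near the recently used end of the cache, so repeated searches that return it do not reach the database.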

6
Q

Where would you add load balancing in Twitter Search?

A
  • Between clients and application servers
  • Between application servers and database servers
