System Architecture Flashcards
(46 cards)
Load Balancing
Static:
- Round robin
- Weighted round robin
- IP hash
Dynamic:
- Least connections
- Weighted least connections (assign different weights to each server)
- Weighted response time
- Resource-based (CPU, memory)
Global server load balancing (CDN)
Estimate set membership
Bloom filter
- Elements are added to the filter by hashing them multiple times and setting the bits at the resulting indices in a bit array.
- To check if an element is present, it’s hashed in the same way, and all corresponding bits in the array are checked.
- If all the checked bits are set, the element might be in the set (potential false positive); if any bit is not set, the element is definitely not in the set (no false negatives).
Estimate set cardinality
HyperLogLog
- HyperLogLog estimates the number of unique elements in a dataset by observing the longest run of leading zeros in the binary representation of hashed elements.
- It uses multiple independent hash functions and divides the hashed values into streams to reduce variance and improve accuracy.
- The final cardinality estimate is obtained by combining the estimates from each stream using a harmonic mean, with bias correction applied for small and large cardinalities.
Estimate counters
Count-min sketch
- It uses multiple hash functions to map each item to a set of counters in a 2D array. When an item’s count is incremented, the corresponding counters in each row are increased.
- To estimate the count of an item, the sketch hashes the item using the same functions and retrieves the values of the corresponding counters.
- The minimum of these retrieved counter values provides an estimate of the item’s frequency. This estimate is guaranteed to be greater than or equal to the true frequency (no underestimation), but may overestimate it.
Consistent hashing
- Distributes data across a dynamic number of servers (nodes) with minimal disruption. Unlike traditional hashing, when a node is added or removed, only a small fraction of the keys need to be remapped.
- Uses a hash ring. Both the keys and the servers are hashed onto a circular space (the “ring”). Each key is assigned to the next server in the ring in a clockwise direction.
- Improves scalability and fault tolerance. By minimizing data movement during scaling events and node failures, consistent hashing reduces the load on the system and improves its resilience.
Design bit.ly
- Read / write patterns strongly favor reads. Highly cacheable.
- Expiration time on URLs
- Validate URL is valid on creation (using a library like is-url)
- base62 encoding (not 64)
- 302 (temporary redirect) to enable analytics and give us more control. 301s cache in browsers and bypass the server.
- Redis caching for heavy reads (1,000x faster than SSD reads, millions of reads per second) – cache invalidation can be complex
- CDN / edge computing: Points of Presence (PoPs). Can avoid having to back to origin server. Cache invalidation is still a problem. Potentially higher costs. Debugging becomes more challenging.
- Break out service to reads and writes to scale separately.
- Redis for incrementing ID, could use counter batching to reduce overhead if needed.
Common functional requirements
- Authentication / authorization
- Observability and alerting
- The system should be well-tested and easy to deploy (CI/CD pipelines).
- The system should be fault-tolerant.
- The system should have regular backups.
Hash indexing
Useful of exact match queries. O(1) average case lookup. Supported in PostgreSQL.
Data access times
- Memory: 100 nanoseconds (0.0001 ms)
- SSD: 0.1 ms
- HDD: 10 ms
CDN
Content delivery network
- Points of Presence (PoPs) geographically distributed around the world
- Can avoid having to back to origin server.
Potential problems:
- Cache invalidation is still a problem.
- Potentially higher costs.
- Debugging becomes more challenging.
DB resilience (psotgres)
- Replicas (hot backup, synchronous in same region, async across regions)
- DB backups
Design LeetCode
- Monaco online editor library for syntax highlighting / autocomplete
- Security and isolation when running user code (containers vs serverless). Containers, need to manage these explicitly. Not as secure as VMs. Need to take care to configure and secure the containers to prevent users from breaking out and accessing the host system. Enforce resource limits.
- Read Only Filesystem
- CPU and Memory Bounds
- Explicit Timeout
- Limit Network Access
- No System Calls (Seccomp)
- Serverless: cold start times are a problem, may have resource limits that can negatively impact performance.
- Live leaderboards / competetions!
- Only 4k problems, very small scale
- Leaderboard: redis sorted set, client polls every 5 seconds for updates (likely don’t need Websockets)
- Auto scaling (ECS: elastic container service)
- Horizontal scaling with queues: buffer submissions during peak times. Enables retries if a container fails.
REST API returning a list
Make sure you paginate! ?page=1&size=100
Secure running user code
- Read-only filesystem: write output to a temp dir
- CPU and memory bounds
- Explicit timeout: kill process if it exceeds timeout
- Limit network access: disable it completely from the container
- No system calls (seccomp - secure computing) - a mode of the kernel
Design a leaderboard:
- global top 10
- friends top 10
- relative position global
- relative position friends
Auto scaling containers
ECS Elastic Container Service
Video / audio concepts
- Codecs: encoders and decoders. Compress images, audio and video.
- Video container: MPEG, etc. A video container is a file format that stores video data (frames, audio) and metadata. A container might house information like video transcripts as well.
- Bitrate: The bitrate of a video is the number of bits transmitted over a period of time, typically measured in kilobytes per second (kbps) or megabytes per second (mbps).
- Manifest files: Manifest files are text-based documents that give details about video streams. There’s typically 2 types of manifest files: primary and media files.
Design Youtube
- Presigned URLs -> AWS S3
- Store different video formats as segments. Segments must be playable on their own. They are created at i-frame (key frame) boundaries.
- Users watch videos using adaptive bitrate streaming. Client does more work to determine the bitrate it can handle.
- Tools like ffmpeg are used to segment and encode
- A data pipeline DAG is used to process videos efficiently (highly parallel). Can use something like Temporal here for orchestration. Can use S3 to pass temporary data between stages of the pipeline.
- Resumable uploads: split video into chunks on the client side. Store metadata and track status of uploads per chunk. Use S3 event notifications to track chunk upload completions.
- CDN: for video files and manifests
- Speed up uploads: “pipeline” the process by uploading segments of the video and start processing them immediately while other segments are still uploading.
Design Ticketmaster
- Availability for viewing. Consistency for booking.
- Entities: Event, User, Performer, Venue, Ticket, Booking
- Booking endpoints: 2 endpoints, (1) reserve tickets, (2) book tickets. ACID properties necessary.
- Booking service: uses Stripe for payment processing
- Reservations: don’t lock in DB, status with expiration is okay, distributed locking with Redis is better
- High demand events?: SSE for real-time seat updates help. Virtual waiting queue is best.
- Search indexing: Index + SQL query optimization «_space;Full-text indexes (via extensions) in the DB == Elasticsearch
- Search caching: Redis (keys based on search queries) «_space;Query result caching and edge caching (CDNs)
Circuit Breaker implementations
Hystrix is a circuit breaker that stops cascading failures, realtime monitoring for configuration changes, concurrency aware request caching and automated batching through request collapsing. If microservice failed to return the default response, wait until it’s recovered.
Row-level locking vs Optimistic Concurrency Control (OCC)
Redis Sentinel Architecture
Redis distributed lock
- SETNX (set if not exists) with TTL
- Redlock algo (set lock on multiple nodes with quorum)
- Libraries: Redsync (Redlock impl in golang), Redisson
Lucene architecture