System Design Flashcards
(101 cards)
7 steps of systems design problems
- Requirements clarification
- Back-of-the-envelope estimation
- System interface definition
- Defining data model
- High-level design
- Detailed design
- Identifying and resolving bottlenecks
Step 1: Requirements Clarification
- Determine exact scope
- Define end goals of system
- Clarify which parts of system to focus on (e.g. back-end vs. front-end)
Step 2: Back-of-the-envelope estimation
- Estimate and quantify scale of system
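A rough estimate usually comes down to a few multiplications. The sketch below is only an illustration: every input (10M daily users, 2 writes per user, a 100:1 read/write ratio, 1 KB per write, 5 years of retention) is a made-up assumption, not a figure from these cards.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(daily_active_users, writes_per_user_per_day, read_write_ratio,
             bytes_per_write, retention_years):
    """Return rough traffic and storage figures from hypothetical inputs."""
    writes_per_day = daily_active_users * writes_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    storage_bytes = writes_per_day * bytes_per_write * 365 * retention_years
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "storage_tb": storage_bytes / 1e12,
    }


# All numbers below are illustrative assumptions.
figures = estimate(
    daily_active_users=10_000_000,
    writes_per_user_per_day=2,
    read_write_ratio=100,
    bytes_per_write=1_000,
    retention_years=5,
)
print(figures)  # roughly 231 writes/s, 23,000 reads/s, 36.5 TB
```

Order of magnitude is what matters here; the point is to see whether the system needs one machine or hundreds.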
Step 3: System interface definition
- Define what APIs are expected from the system
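As an illustration of what an interface definition might look like, here is a sketch for a hypothetical URL-shortening service. The service, the method names (create_url, get_url, delete_url), and the parameters are all assumptions chosen for the example; they do not come from these cards.

```python
import hashlib
from typing import Dict, Optional, Protocol


class UrlShortenerApi(Protocol):
    """Hypothetical interface: the operations clients are allowed to call."""

    def create_url(self, api_key: str, original_url: str,
                   custom_alias: Optional[str] = None) -> str: ...

    def get_url(self, short_key: str) -> Optional[str]: ...

    def delete_url(self, api_key: str, short_key: str) -> bool: ...


class InMemoryUrlShortener:
    """Toy implementation used only to exercise the interface."""

    def __init__(self) -> None:
        self._urls: Dict[str, str] = {}

    def create_url(self, api_key, original_url, custom_alias=None):
        # Use the caller's alias if given, else a short hash of the URL.
        key = custom_alias or hashlib.sha256(original_url.encode()).hexdigest()[:7]
        self._urls[key] = original_url
        return key

    def get_url(self, short_key):
        return self._urls.get(short_key)

    def delete_url(self, api_key, short_key):
        # True if the key existed and was removed.
        return self._urls.pop(short_key, None) is not None
```

Writing the signatures down early makes it easy to confirm with the interviewer that the requirements were understood.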
Step 4: Define data model
- How will data flow between components?
- How will different entities interact with each other?
- How will we partition and manage data? Specific choices:
* Which database system should we use? NoSQL vs SQL?
* What kind of block storage should we use to store files? (e.g. multimedia)
Step 5: High-level design
- Draw a block diagram representing core components to solve the actual problem from end-to-end.
- Possibly describe the system verbally or type out in some kind of list format
Step 6: Detailed design
- Dig deeper into 2-3 major components, guided by interviewer feedback.
- Consider tradeoffs between different approaches.
Step 7: Identifying and resolving bottlenecks
- Identify any single points of failure and discuss mitigation
- Discuss redundancy and backup plans for data and services
- Discuss performance monitoring
key characteristics of distributed systems
- Scalability
- Reliability
- Availability
- Efficiency
- Serviceability or Manageability
Scalability
capability of a system, process, or network to grow and manage increased demand
Horizontal vs vertical scaling
- Horizontal is easier to scale dynamically by adding machines
- Vertical scaling has a hard upper limit (the capacity of a single server) and may involve downtime
- Horizontal scaling examples: Cassandra, MongoDB
- Vertical scaling examples: MySQL
Reliability
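- Probability that a system operates without failure over a given period
- A reliable distributed system keeps delivering its services even when one or several components fail
- Reliability is availability over time: a reliable system is available, but an available system is not necessarily reliable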
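A small calculation shows why redundancy helps a distributed system survive component failures. It is a simplification that assumes replicas fail independently and that one surviving replica is enough to keep serving; real failures are often correlated.

```python
def prob_service_survives(replica_failure_prob, replicas):
    """Probability that at least one replica is still up.

    Assumes independent failures (an idealisation).
    """
    return 1 - replica_failure_prob ** replicas


# Hypothetical numbers: each replica has a 10% chance of failing in the period.
for n in (1, 2, 3):
    print(n, prob_service_survives(0.1, n))
```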
Availability
- Percentage of time a system remains operational over a specific period
* Accounts for maintainability and repair
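One common way to put a number on availability is the steady-state formula MTBF / (MTBF + MTTR). The terms MTBF (mean time between failures) and MTTR (mean time to repair) are not used in these cards; they are brought in here as an assumption to show how repair time feeds into availability.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up, given failure and repair times."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1 - avail) * HOURS_PER_YEAR


# Hypothetical system: fails every 999 hours on average, takes 1 hour to repair.
a = availability(mtbf_hours=999, mttr_hours=1)
print(a, downtime_hours_per_year(a))  # about 0.999 and about 8.76 hours
```

Halving the repair time raises availability without making failures any rarer, which is why maintainability counts toward it.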
Efficiency
- Latency / response time to requests (correlates with the number of messages sent)
- Throughput / bandwidth (correlates with the size of messages)
Serviceability / manageability
- Ease to operate and maintain
- Simplicity and speed of repair (as time to repair increases, availability decreases)
- Considerations: ease of diagnostics, ease of updates
- Automated fault detection
Load balancer
- component to spread traffic across a cluster of servers
- improves responsiveness and availability
- crucial to horizontal scaling
- Runs health checks on servers
- Routes each request only to a server that is currently healthy
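The health-check step can be sketched as a filter applied before any selection method runs. The is_healthy callable is a placeholder assumption; in practice it might be a periodic HTTP or TCP probe.

```python
import random


def pick_healthy(servers, is_healthy, rng=random):
    """Return one server chosen from those that pass the health check."""
    healthy = [s for s in servers if is_healthy(s)]
    if not healthy:
        raise RuntimeError("no healthy servers available")
    return rng.choice(healthy)


# Hypothetical status table standing in for real health probes.
status = {"app-1": True, "app-2": False, "app-3": True}
print(pick_healthy(list(status), lambda s: status[s]))
```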
Load balancing placements
- Between client and web server
- between web servers and internal layer (app servers)
- between internal layer and database
Load balancer: least connection method
- directs traffic to server with fewest connections
* good for large number of persistent connections
Load balancer: least response time method
- directs traffic to the server with the lowest response time
Load balancer: least bandwidth method
- selects the server that is currently serving the least amount of traffic
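The least connection, least response time, and least bandwidth methods all have the same shape: pick the server with the minimum value of some metric. A minimal sketch, assuming the load balancer already tracks these metrics per server (the ServerStats fields are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    active_connections: int
    avg_response_ms: float
    mbps_served: float


def least_connection(servers):
    return min(servers, key=lambda s: s.active_connections)


def least_response_time(servers):
    return min(servers, key=lambda s: s.avg_response_ms)


def least_bandwidth(servers):
    return min(servers, key=lambda s: s.mbps_served)


# Made-up snapshot of three servers.
pool = [
    ServerStats("a", active_connections=2, avg_response_ms=80.0, mbps_served=50.0),
    ServerStats("b", active_connections=10, avg_response_ms=20.0, mbps_served=70.0),
    ServerStats("c", active_connections=5, avg_response_ms=40.0, mbps_served=5.0),
]
print(least_connection(pool).name)     # a
print(least_response_time(pool).name)  # b
print(least_bandwidth(pool).name)      # c
```

The same pool yields three different answers, which shows that the choice of metric matters.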
Load balancer: round robin method
- cycles through available servers and sends each request to the next server
- good for servers of equal specs and few persistent connections
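Round robin is simple enough to sketch in a few lines. The class name and server names below are illustrative only.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed repeating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def next_server(self):
        return next(self._cycle)


rr = RoundRobin(["a", "b", "c"])
print([rr.next_server() for _ in range(5)])  # ['a', 'b', 'c', 'a', 'b']
```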
Load balancer: weighted round robin method
- round robin but with weights on different servers based upon processing capacity
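A naive way to sketch weighted round robin is to repeat each server in the rotation as many times as its weight. This is an assumption-laden simplification: it sends bursts to the heavier server, whereas production load balancers typically interleave the picks more smoothly. The weights are hypothetical integers.

```python
import itertools


def weighted_round_robin(weights):
    """Yield server names forever, each appearing in proportion to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


# "big" is assumed to have three times the capacity of "small".
picker = weighted_round_robin({"big": 3, "small": 1})
print(list(itertools.islice(picker, 8)))
```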
Load balancer: IP hash method
- client IP address is hashed and servers are each assigned blocks of hashes, so a given client consistently reaches the same server
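The IP hash method can be sketched by mapping the client address into a fixed hash space and splitting that space into equal contiguous blocks, one per server. The choice of SHA-256 and a 32-bit hash space is an assumption for the example; any stable hash would do.

```python
import hashlib

HASH_SPACE = 2 ** 32  # hashes fall in [0, 2**32)


def server_for_ip(client_ip, servers):
    """Map a client IP to the server that owns its block of the hash space."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")
    # Each server owns a contiguous block of HASH_SPACE / len(servers) hashes.
    index = h * len(servers) // HASH_SPACE
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
print(server_for_ip("203.0.113.7", servers))
print(server_for_ip("203.0.113.7", servers))  # same server both times
```

Note that with this simple scheme, adding or removing a server moves the block boundaries and remaps many clients.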
Load balancer redundancy
- can be a single point of failure
- can add more LBs to form a cluster of active/passive instances
- clustered LBs monitor each other and passive takes over if active fails
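The active/passive takeover can be sketched as a heartbeat timeout: the passive instance promotes itself when it has not heard from the active one for too long. The class, the 3-second timeout, and the explicit timestamps are illustrative assumptions; real setups rely on dedicated failover tooling.

```python
class PassiveLoadBalancer:
    """Toy model of the standby instance in an active/passive pair."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = None
        self.role = "passive"

    def on_heartbeat(self, now):
        """Record a heartbeat received from the active instance."""
        self.last_heartbeat = now

    def check(self, now):
        """Promote to active if the last heartbeat is older than the timeout."""
        silent_too_long = (
            self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_s
        )
        if self.role == "passive" and silent_too_long:
            self.role = "active"
        return self.role


standby = PassiveLoadBalancer(timeout_s=3.0)
standby.on_heartbeat(now=0.0)
print(standby.check(now=2.0))  # passive
print(standby.check(now=3.5))  # active
```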