System Design Flashcards

Flashcards in System Design Deck (24):
1

System Design

Steps

  1. Scope the system
    • Analyze use cases
    • Analyze constraints/capacity estimation
  2. Sketch architecture
  3. Identify bottlenecks and single points of failure
  4. Analyze scalability

2

Constraints/Capacity Estimation

  • Estimate at least two quantities
    • Amount of traffic
    • Amount of data
  • Can start with per-month figures, then convert them to per-second rates
  • Additional considerations
    • Peak traffic
    • Throughput
    • Geographic distribution of users
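The per-month to per-second conversion is plain arithmetic. The sketch below assumes a 30-day month and uses hypothetical inputs: 2.5 billion requests per month and about 1 KB of new data per request. It also assumes a peak of about 3x the average, which is an illustrative rule of thumb and not from the card.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month: 2,592,000 s

def per_second(monthly_total: float) -> float:
    """Convert a per-month total into an average per-second rate."""
    return monthly_total / SECONDS_PER_MONTH

# Hypothetical inputs, for illustration only.
requests_per_month = 2_500_000_000
bytes_per_request = 1_000   # assumed: ~1 KB of new data stored per request
peak_multiplier = 3         # assumed: peak traffic ~3x the average

avg_rps = per_second(requests_per_month)
peak_rps = avg_rps * peak_multiplier
storage_per_month_gb = requests_per_month * bytes_per_request / 1e9

print(f"average: {avg_rps:,.0f} req/s")
print(f"peak:    {peak_rps:,.0f} req/s")
print(f"storage: {storage_per_month_gb:,.0f} GB/month")
```

A useful shortcut: one request per second is roughly 2.6 million requests per month.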

3

Use Cases

  • Determine what the system should do

Examples

  • What the service should do
  • UI, API, or both
  • Analytics
  • User base
  • Geographic location
  • Peak traffic/time
  • High availability
  • Sessions

4

Internet Statistics

  • 500M tweets per day
  • Internet users: 40% of the world (3.7B)
  • 3.5B Google searches per day
  • 1.3T searches per year
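These figures are easier to reason about as per-second rates. The daily and yearly search numbers can also be cross-checked against each other. The quick sketch below uses only the card's numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = 500_000_000
searches_per_day = 3_500_000_000

tweets_per_second = tweets_per_day / SECONDS_PER_DAY      # ~5,800 per second
searches_per_second = searches_per_day / SECONDS_PER_DAY  # ~40,500 per second
searches_per_year = searches_per_day * 365                # ~1.28T, consistent with "1.3T a year"

print(f"tweets/s:    {tweets_per_second:,.0f}")
print(f"searches/s:  {searches_per_second:,.0f}")
print(f"searches/yr: {searches_per_year / 1e12:.1f}T")
```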

5

Abstract Design

  • Detail the basic components you will need in the system
  • At minimum, you will probably need
    • Application service layer
      • serves the requests
    • Data storage layer
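Here is a minimal sketch of the two layers. It assumes an in-memory dict stands in for the data storage layer. The class and method names (`StorageLayer`, `ApplicationService`, `handle_get`, `handle_put`) are hypothetical.

```python
from typing import Optional

class StorageLayer:
    """Data storage layer: persists and retrieves records (here, a plain dict)."""
    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        return self._rows.get(key)

    def write(self, key: str, value: str) -> None:
        self._rows[key] = value


class ApplicationService:
    """Application service layer: serves requests and delegates persistence."""
    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage

    def handle_put(self, key: str, value: str) -> str:
        self._storage.write(key, value)
        return "OK"

    def handle_get(self, key: str) -> str:
        value = self._storage.read(key)
        return value if value is not None else "NOT FOUND"


service = ApplicationService(StorageLayer())
service.handle_put("user:42", "Ada")
print(service.handle_get("user:42"))  # Ada
print(service.handle_get("user:7"))   # NOT FOUND
```

Because the service only talks to the storage layer through `read`/`write`, the dict could be swapped for a real database without touching the request-handling code.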

6

vertical scaling

  • if a single machine is running low on resources (CPU, RAM, disk space, etc.), upgrade it with more
  • there's an upper limit to how far one machine can be upgraded (e.g., CPU clock speeds of ~3 GHz)

7

horizontal scaling

  • the approach used in a data center: many machines working together
  • buy more machines (usually commodity hardware, not state of the art)
  • inbound requests must be distributed across all the servers

8

load

  • essentially traffic: the volume of requests a system must handle

9

load balancer

  • distributes traffic across various servers when using horizontal scaling
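  • the load balancer has a public IP address; the servers behind it (which it monitors) have private addresses
  • sessions are typically saved on individual servers, so a returning user may need to be routed to the same server (sticky sessions)
  • often purchased and deployed in pairs for high availability
  • load balancing can also take place at the DNS level
    • when someone requests abc.com, DNS can return the IP address of a server near them (e.g., one serving their city)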
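Sessions often live on individual servers, so the load balancer may need to send a given user back to the same server every time. Below is a minimal sketch of such sticky routing. It assumes the balancer hashes a session ID to pick one of its private backend addresses. The IP addresses and session IDs are hypothetical.

```python
import hashlib

# Hypothetical private addresses of the servers behind the load balancer.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_backend(session_id: str, backends: list[str] = BACKENDS) -> str:
    """Map a session ID to one backend deterministically (sticky routing)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

first = pick_backend("session-abc")
again = pick_backend("session-abc")
print(first == again)  # True: the same session always lands on the same server
```

With plain modulo hashing, adding or removing a server remaps most sessions. Consistent hashing, or keeping sessions in a shared store, avoids that problem.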

10

RAID

  • Redundant Array of Independent Disks
  • combines multiple physical disks into a single logical disk for redundancy, performance, or both
  • different versions
    • RAID0
    • RAID1
    • RAID5
    • RAID10
  • Assume you have multiple hard drives
    • RAID0
      • Have two hard drives
      • "Stripe" data across disks: write a little to one disk, then switch to the other while the first is still saving
      • Can roughly double read/write speed, but provides no redundancy: if either disk fails, data is lost
    • RAID1
      • Mirror between disks
      • Every write to one disk is also made to the other
    • RAID10
      • Requires at least 4 hard drives
      • Combination of RAID1 and RAID0: data is striped across mirrored pairs of disks
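Below is a toy model of striping (RAID0) and mirroring (RAID1). It assumes each "disk" is a Python list of byte chunks, and the 4-byte stripe size is arbitrary. Real RAID works at the block-device level, in hardware or in the OS.

```python
STRIPE_SIZE = 4  # bytes per stripe unit; arbitrary for illustration

def raid0_write(data: bytes, num_disks: int = 2) -> list[list[bytes]]:
    """RAID0: split data into stripe units and spread them round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i in range(0, len(data), STRIPE_SIZE):
        unit = data[i:i + STRIPE_SIZE]
        disks[(i // STRIPE_SIZE) % num_disks].append(unit)
    return disks

def raid0_read(disks: list[list[bytes]]) -> bytes:
    """Reassemble by taking one stripe unit from each disk in turn."""
    out = []
    for row in range(max(len(d) for d in disks)):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)

def raid1_write(data: bytes, num_disks: int = 2) -> list[bytes]:
    """RAID1: every disk holds a full copy of the data."""
    return [data for _ in range(num_disks)]

striped = raid0_write(b"HELLO RAID WORLD")
print(striped)              # [[b'HELL', b'ID W'], [b'O RA', b'ORLD']]
print(raid0_read(striped))  # b'HELLO RAID WORLD'
print(raid1_write(b"HELLO"))  # [b'HELLO', b'HELLO']
```

Losing either disk in the striped layout loses half the stripe units. In the mirrored layout, each surviving disk still holds the whole file.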

11

memcached

  • very fast in-memory cache
  • simple key-value store (essentially a giant hash table)
  • widely used around the internet
  • sometimes memcached runs on the application servers themselves and serves as shared storage (e.g., for session data) that all of them can access

Steps

  1. Request comes in from client
  2. Check memcached
  3. If the key is there (cache hit), send back the result
  4. If not (cache miss), query the database, write the result to memcached, and send it back
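These steps describe the cache-aside pattern. The minimal sketch below assumes a plain dict stands in for memcached and a hypothetical `query_database` function stands in for the database. A real deployment would use a memcached client library and usually set an expiry on each key.

```python
from typing import Callable

cache: dict[str, str] = {}  # stand-in for memcached

def query_database(key: str) -> str:
    """Hypothetical (slow) database lookup."""
    return f"row-for-{key}"

def get(key: str, db: Callable[[str], str] = query_database) -> str:
    if key in cache:           # 2. check the cache first
        return cache[key]      # 3. cache hit: send back the result
    value = db(key)            # 4. cache miss: query the database...
    cache[key] = value         #    ...write the result to the cache...
    return value               #    ...and send it back

# Usage: count database calls to see the cache at work.
calls: list[str] = []

def counting_db(key: str) -> str:
    calls.append(key)
    return query_database(key)

get("user:1", counting_db)  # miss: goes to the database
get("user:1", counting_db)  # hit: served from the cache
print(len(calls))  # 1
```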

12

multi-tiered architecture

  • also called an n-tier architecture
  • client-server architecture in which presentation, application processing, and data management are separated into distinct tiers

13

database partitioning

  • process of splitting a large table into different smaller tables based on some criterion
  • ex: database A holds records A-M, database B holds records N-Z
  • if queries only need a subset of the full table, partitioning can speed them up
  • with partitioning, queries can be directed to databases based on high-level information (user, geographic location)
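Routing a query to the right partition can be as simple as inspecting the key. Below is a minimal sketch of the card's A-M / N-Z example. It assumes records are keyed by a name, and the partition names `db_a` and `db_b` are hypothetical.

```python
PARTITIONS = {
    "db_a": ("A", "M"),  # hypothetical database holding records A-M
    "db_b": ("N", "Z"),  # hypothetical database holding records N-Z
}

def partition_for(name: str) -> str:
    """Return the partition whose letter range contains the name's first letter."""
    first = name[:1].upper()
    for db, (low, high) in PARTITIONS.items():
        if low <= first <= high:
            return db
    raise ValueError(f"no partition for {name!r}")

print(partition_for("alice"))   # db_a
print(partition_for("Nguyen"))  # db_b
```

Letter ranges are easy to reason about, but they can be uneven because some initials are far more common than others. Hashing the key is a common alternative when even distribution matters more than range queries.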

14

high availability

  • high uptime
  • achieved by setting up servers or load balancers in pairs that continually send each other heartbeats (packets that simply indicate the sender is working)
  • if one stops sending heartbeats, the other takes over, preventing downtime
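Below is a minimal sketch of heartbeat-based failover in an active/passive pair. It assumes the current time is passed in explicitly so the logic is easy to test, and it uses an arbitrary 3-second timeout. The class name `HAPair` and the node names are hypothetical.

```python
HEARTBEAT_TIMEOUT = 3.0  # assumed: seconds of silence before the peer is presumed down

class HAPair:
    """Tracks an active/passive pair; the passive node takes over if heartbeats stop."""
    def __init__(self, active: str, passive: str, now: float) -> None:
        self.active = active
        self.passive = passive
        self.last_heartbeat = now

    def receive_heartbeat(self, now: float) -> None:
        """Called whenever the active node's heartbeat packet arrives."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Fail over if the active node has been silent too long; return the active node."""
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active, self.passive = self.passive, self.active
            self.last_heartbeat = now  # give the new active node a fresh window
        return self.active

pair = HAPair("lb-1", "lb-2", now=0.0)
pair.receive_heartbeat(now=1.0)
print(pair.check(now=2.0))  # lb-1: heartbeat is recent
print(pair.check(now=5.0))  # lb-2: no heartbeat for 4 s, so lb-2 takes over
```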

15

round robin

  • algorithm used to balance load
  • sends each request to the next server in the list
  • wraps around to the first server after reaching the end of the list
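Below is a minimal sketch of round-robin selection using an index that wraps around with modulo. The server names are hypothetical.

```python
class RoundRobin:
    """Hand out servers in order, wrapping around at the end of the list."""
    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = servers
        self._next = 0

    def pick(self) -> str:
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)  # wrap around at the end
        return server

lb = RoundRobin(["app-1", "app-2", "app-3"])
print([lb.pick() for _ in range(5)])  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Plain round robin ignores how busy each server is. Weighted round robin or least-connections are common variants when servers differ in capacity.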

16

web server

  • responds to HTTP requests with static content (HTML, images, etc.)
  • no server-side programming (no business logic)
  • does not generate HTML dynamically
  • ex: GoDaddy

17

application server

  • servers that handle business logic
  • will generally have a framework installed, like Django or Express
  • ex: Heroku

18

bandwidth

  • a rate
  • maximum amount of data that can theoretically move through a system per unit of time (usually a second)
  • measured in Mbit/s
  • ex: a 500 Mbit/s connection
  • i.e., the width of the highway or the size of the pipe

19

throughput

  • a rate
  • similar to bandwidth, but it's the actual amount of data transmitted per unit of time (never more than the bandwidth)
  • measured in Mbit/s

20

latency

  • the amount of time it takes for one thing to happen (a delay)
  • generally, the time required for the smallest packet to travel from point A to point B
  • measured in units of time
  • ex: amount of time required for client's request to reach server
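Bandwidth, throughput, and latency combine in a common back-of-the-envelope estimate: transfer time ≈ latency + data size / rate. This is a simplification that ignores protocol overhead. The sketch uses hypothetical numbers: a 100 MB file, 50 ms latency, and a 500 Mbit/s link assumed to deliver 400 Mbit/s of actual throughput.

```python
def transfer_time_s(size_bytes: float, rate_mbit_s: float, latency_s: float) -> float:
    """Approximate delivery time: one-way latency plus time to push the bits through."""
    bits = size_bytes * 8
    return latency_s + bits / (rate_mbit_s * 1_000_000)

size = 100 * 1_000_000  # 100 MB (decimal megabytes)
bandwidth = 500         # Mbit/s: theoretical maximum of the link
throughput = 400        # Mbit/s: what actually gets through (assumed)
latency = 0.050         # 50 ms

print(f"best case (bandwidth): {transfer_time_s(size, bandwidth, latency):.2f} s")
print(f"actual (throughput):   {transfer_time_s(size, throughput, latency):.2f} s")
```

For small requests the latency term dominates, while for large transfers the throughput term does.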

21

redundancy

  • storing data (or running components) in more than one place so that a single failure doesn't cause data loss or downtime

22

scalability

  • database scalability
    • read/write ratio
    • number of objects
    • size of each object
    • relationships between objects
    • database type (NoSQL vs. relational)

23

load and distribution

24

load balancer

  • typically purchased and organized in pairs