high level overviews, storage, tables Flashcards
Code-Deployment System high level overview
Our system can very simply be divided into 2 clear subsystems:
- Build System that builds code into binaries
- Deployment System that deploys binaries to our machines across the world
Code-Deployment System storage
We’ll use blob storage (Google Cloud Storage or S3) to store our code binaries. Blob storage makes sense here, because binaries are literally blobs of data.
Code-Deployment System table
- jobs table
- id: pk, auto-inc integer
- created_at: timestamp
- commit_sha: string
- name: string, the pointer to the job’s eventual binary in blob storage
- status: string, QUEUED, RUNNING, SUCCEEDED, FAILED
- table for replication status of blobs
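A rough sketch of the jobs table as DDL, run through Python's sqlite3 just so it's self-contained and checkable; the column names follow the card above, while the types, constraints, and the choice of SQLite are assumptions:

```python
import sqlite3

# In-memory DB purely to sanity-check the schema; a real system would use Postgres/MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TIMESTAMP NOT NULL,
    commit_sha TEXT NOT NULL,
    name       TEXT NOT NULL,  -- pointer to the job's eventual binary in blob storage
    status     TEXT NOT NULL DEFAULT 'QUEUED'
               CHECK (status IN ('QUEUED', 'RUNNING', 'SUCCEEDED', 'FAILED'))
);
""")
```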
AlgoExpert high level overview
We can divide our system into 3 core components:
- Static UI content
- Accessing and interacting with questions (question completion status, saving solutions, etc.)
- Ability to run code
AlgoExpert storage
For the UI static content, we can put public assets like images and JS bundles in a blob store: S3 or GCS. Since we’re catering to a global audience and we care about having a responsive website, we want to use a CDN to serve that content. This is esp important for mobile b/c of the slow connections that phones use.
Static API content, like the list of questions and all solutions, also goes in a blob store for simplicity.
AlgoExpert table
Since this data will have to be queried a lot, a SQL db like Postgres or MySQL seems like a good choice.
Table 1. question_completion_status
- id: pk, auto-inc integer
- user_id
- question_id
- completion_status (enum)
Table 2. user_solutions
- id: pk, auto-inc integer
- user_id
- question_id
- language
- solution
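Sketched as SQLite DDL via Python's sqlite3; the uniqueness constraints and the example enum values for completion_status are assumptions, not taken from the card:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question_completion_status (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id           TEXT NOT NULL,
    question_id       TEXT NOT NULL,
    completion_status TEXT NOT NULL,         -- enum, e.g. 'NOT_COMPLETED' / 'COMPLETED' (assumed values)
    UNIQUE (user_id, question_id)            -- one status per user/question pair (assumption)
);

CREATE TABLE user_solutions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     TEXT NOT NULL,
    question_id TEXT NOT NULL,
    language    TEXT NOT NULL,
    solution    TEXT NOT NULL,
    UNIQUE (user_id, question_id, language)  -- one saved solution per language (assumption)
);
""")
```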
Stockbroker high level overview
- the PlaceTrade API call that clients will make
- the API server(s) handling client API calls
- the system in charge of executing orders for each customer
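A minimal sketch of what the PlaceTrade request/response could look like; the fields mirror the trades table below, but the exact API shape is an assumption:

```python
from dataclasses import dataclass

@dataclass
class PlaceTradeRequest:
    customer_id: str
    stock_ticker: str
    type: str        # "BUY" or "SELL"
    quantity: int    # whole shares only; no fractional shares

@dataclass
class PlaceTradeResponse:
    trade_id: str
    status: str      # starts as "PLACED"
```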
Stockbroker table
- for trades
- id
- customer_id
- stockTicker
- type: string, either BUY or SELL
- quantity: integer (no fractional shares)
- status: string, the status of the trade; starts as PLACED
- reason: string, the human-readable justification of the trade’s status
- created_at: timestamp, the time when the trade was created
- for balances
- id, customer_id, amount, last_modified
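Sketched as SQLite DDL via Python's sqlite3; the trades columns come straight from the card (snake_cased), while the balances column types (and storing amount as a float rather than integer cents) are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id  TEXT NOT NULL,
    stock_ticker TEXT NOT NULL,
    type         TEXT NOT NULL CHECK (type IN ('BUY', 'SELL')),
    quantity     INTEGER NOT NULL,   -- no fractional shares
    status       TEXT NOT NULL DEFAULT 'PLACED',
    reason       TEXT,               -- human-readable justification of the status
    created_at   TIMESTAMP NOT NULL
);

CREATE TABLE balances (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id   TEXT NOT NULL UNIQUE,
    amount        REAL NOT NULL,     -- integer cents would avoid float rounding
    last_modified TIMESTAMP NOT NULL
);
""")
```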
Amazon high level overview
There’s a USER side and a WAREHOUSE side.
Within a region, user and warehouse requests will get round-robin-load-balanced to respective sets of API servers, and data will be written to and read from a SQL database for that region.
We’ll go with a SQL db because all of the data is, by nature, structured and lends itself well to a relational model.
Amazon table
6 SQL tables
1. items (name, description, price, etc)
2. carts
3. orders
4. aggregated stock (all of the item stocks on Amazon that are relevant to users)
5. warehouse orders
6. warehouse stock (must have physicalStock and availableStock)
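A sketch of just the two stock tables, since the physicalStock vs. availableStock split is the subtle part; the column names, key choices, and the exact meaning of availableStock (physical stock not yet promised to pending warehouse orders) are assumptions layered on top of the card:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- User-facing stock: aggregated across the region's warehouses.
CREATE TABLE aggregated_stock (
    item_id TEXT PRIMARY KEY,
    stock   INTEGER NOT NULL
);

-- Warehouse-facing stock; available_stock <= physical_stock because some physical
-- stock may already be assigned to pending warehouse orders (assumed semantics).
CREATE TABLE warehouse_stock (
    item_id         TEXT NOT NULL,
    warehouse_id    TEXT NOT NULL,
    physical_stock  INTEGER NOT NULL,
    available_stock INTEGER NOT NULL,
    PRIMARY KEY (item_id, warehouse_id)
);
""")
```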
FB news feed high level overview
- 2 API calls, CreatePost and GetNewsFeed
- the feed creation and storage strategy, and then tying everything together
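A rough sketch of the two calls' request shapes; the field names and the pagination parameters are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CreatePostRequest:
    user_id: str
    post_content: str            # text of the post; media handling omitted

@dataclass
class GetNewsFeedRequest:
    user_id: str
    page_size: int = 50          # assumed pagination
    page_token: Optional[str] = None
```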
FB news feed storage
We can have one main relational database to store most of our system’s data, including posts and users. This database will have very large tables.
Google Drive high level overview
- we’ll need to support the following operations (sketched below):
- files: upload, download, delete, rename, move
- folders: create, get, rename, delete, move
- design a storage solution for:
- entity (files and folders) metadata
- file content
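The operations above, sketched as a service interface; the method names and signatures are assumptions, just to make the API surface concrete:

```python
from typing import Protocol

class DriveService(Protocol):
    # File operations
    def upload_file(self, parent_id: str, name: str, content: bytes) -> str: ...
    def download_file(self, file_id: str) -> bytes: ...
    def delete_file(self, file_id: str) -> None: ...
    def rename_file(self, file_id: str, new_name: str) -> None: ...
    def move_file(self, file_id: str, new_parent_id: str) -> None: ...

    # Folder operations
    def create_folder(self, parent_id: str, name: str) -> str: ...
    def get_folder(self, folder_id: str) -> dict: ...
    def rename_folder(self, folder_id: str, new_name: str) -> None: ...
    def delete_folder(self, folder_id: str) -> None: ...
    def move_folder(self, folder_id: str, new_parent_id: str) -> None: ...
```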
Google Drive storage
To store entity info, we’ll use a key-value store. Since we need high availability and data replication, we can use something like etcd, ZooKeeper, or Google Cloud Spanner (as a K-V store), all of which give us those guarantees as well as strong consistency (as opposed to DynamoDB, for instance, which would give us only eventual consistency).
To store file chunks, GCS.
To store blob reference counts, SQL table.
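The file-content side relies on splitting each file into chunks and addressing each chunk by the hash of its content, so identical chunks are stored once in GCS and the SQL reference counts track how many files point at each blob. A minimal sketch of the chunk-and-hash step; the chunk size and hash function are assumptions:

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB; the real chunk size is an implementation detail

def chunk_and_hash(path: str) -> list[str]:
    """Split a file into fixed-size chunks and return the content hash of each chunk.

    Each chunk would be uploaded to blob storage (GCS) under its hash, and the
    ordered list of hashes stored on the file's metadata entry.
    """
    hashes = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes
```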
Google Drive table
For files and folders.
Both have:
id, is_folder (t/f), name, owner_id, parent_id
Difference: files have blobs (an array of blob hashes); folders have children (an array of child IDs).
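Sketched as the record we’d keep per entity in the K-V store; the field names follow the card, and the Optional parent_id for the root folder is an assumption:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    id: str
    is_folder: bool
    name: str
    owner_id: str
    parent_id: Optional[str]                            # None for the root folder (assumption)
    blobs: list[str] = field(default_factory=list)      # files only: hashes of content chunks
    children: list[str] = field(default_factory=list)   # folders only: child entity IDs
```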
Netflix high level overview
- Storage (Video Content, Static Content, and User Metadata)
- General Client-Server Interaction (i.e., the life of a query)
- Video Content Delivery
- User-Activity Data Processing
Netflix storage
- Since we’re only dealing with a few hundred terabytes of video content, we can use a simple blob storage solution like S3 or GCS.
- Static content (titles, cast lists, descriptions) goes in a relational db or even in a document store, and we can cache most of it in our API servers.
- User metadata in a classic relational db like Postgres.
- User activity logs in HDFS
Tinder high level overview
- Overview
- Profile Creation
- Deck Generation
- Swiping
- Maybe super-liking and undoing
Tinder storage
Most of the data that we expect to store (profiles, decks, swipes, matches) is structured, so we’ll use SQL. All of this data will be stored in regional dbs, located based on user hot spots. We’ll have asynchronous replication btwn the regional dbs.
The only exception is users’ profile pics, which go in a global blob store and will be served via CDN.
Tinder table
all SQL.
1. profiles (each row is a profile)
2. users’ decks (each row is one user’s deck)
3. swipes
4. matches
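A sketch of the four tables as SQLite DDL via Python's sqlite3; everything beyond the table names (columns, keys, how a deck is serialized) is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profiles (
    user_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    bio     TEXT                       -- other profile fields omitted
);

CREATE TABLE decks (
    user_id     TEXT PRIMARY KEY,      -- each row is one user's deck
    profile_ids TEXT NOT NULL          -- serialized list of candidate profiles to show
);

CREATE TABLE swipes (
    swiper_id TEXT NOT NULL,
    swipee_id TEXT NOT NULL,
    liked     INTEGER NOT NULL,        -- 1 = like, 0 = pass
    PRIMARY KEY (swiper_id, swipee_id)
);

CREATE TABLE matches (
    user_id_1 TEXT NOT NULL,
    user_id_2 TEXT NOT NULL,
    PRIMARY KEY (user_id_1, user_id_2)
);
""")
```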
Slack high level overview
2 main sections:
- Handling what happens when a Slack app loads.
- Handling real-time messaging as well as cross-device synchronization.
Slack storage
Since our tables will be very large, esp the messages table, we must shard. We can use a “smart” sharding solution: a service backed by a strongly consistent key-value store like etcd or ZooKeeper that maps orgIds to shards.
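A minimal sketch of the shard lookup, with a plain dict standing in for etcd/ZooKeeper; assigning shards by hashing the orgId is just a placeholder here, since the real service could instead pick the least-loaded shard:

```python
import hashlib

# Stand-in for the strongly consistent K-V store (etcd / ZooKeeper in the real design).
# Keys are orgIds, values are shard identifiers.
org_to_shard: dict[str, str] = {}

NUM_SHARDS = 16  # assumed shard count

def shard_for_org(org_id: str) -> str:
    """Return the shard holding all of this org's data, assigning one if none exists yet."""
    shard = org_to_shard.get(org_id)
    if shard is None:
        # Placeholder assignment via a stable hash; a smarter service would
        # consider shard load when picking.
        bucket = int(hashlib.sha256(org_id.encode()).hexdigest(), 16) % NUM_SHARDS
        shard = f"shard-{bucket}"
        org_to_shard[org_id] = shard
    return shard
```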
Slack table
SQL db, since we’re dealing w/ structured data that’s queried frequently. Tables for:
1. channels
2. channel members (each row is a channel-member pair)
3. historical messages
4. latest channel timestamps
5. channel read receipts (lastSeen is updated whenever a user opens a channel)
6. number of unread user mentions
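A sketch of tables 5 and 6, since those two carry the read-state logic; the column layout and the increment/reset semantics for mention counts are assumptions on top of the card:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table 5: lastSeen is updated whenever a user opens a channel.
CREATE TABLE channel_read_receipts (
    channel_id TEXT NOT NULL,
    user_id    TEXT NOT NULL,
    last_seen  TIMESTAMP NOT NULL,
    PRIMARY KEY (channel_id, user_id)
);

-- Table 6: assumed to be incremented when a message mentions the user and
-- reset to 0 when the user opens the channel.
CREATE TABLE unread_user_mentions (
    channel_id    TEXT NOT NULL,
    user_id       TEXT NOT NULL,
    mention_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (channel_id, user_id)
);
""")
```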
Airbnb high level overview
HOST side and RENTER side. We further divide the renter side:
- Browsing listings.
- Getting a single listing.
- Reserving a listing.