System Design Basics Flashcards
Why is sharding/data partitioning used?
Because adding more servers is sometimes the only viable option, in terms of cost and scale, compared with moving to a more powerful server.
What are the two sharding methods? Quickly explain.
- Horizontal: rows of the same table are stored in different servers
- Vertical: tables of features are stored in different servers (ex. Users, Photos, UserLikes)
What is dictionary based sharding?
It is a technique of extracting the sharding logic to a lookup service. This moves the complexity away from the app, which queries the lookup service to know where to store/get data from.
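A minimal sketch of the idea, assuming an in-memory lookup table. The names ShardDirectory, save and load, and the shard names, are hypothetical and only for illustration; a real lookup service would be a separate networked component.

```python
class ShardDirectory:
    """Hypothetical lookup service: records which shard owns each key."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        self.assignments = {}  # key -> shard name

    def assign(self, key, shard_name):
        if shard_name not in self.shard_names:
            raise ValueError(f"unknown shard: {shard_name}")
        self.assignments[key] = shard_name

    def locate(self, key):
        return self.assignments[key]


# Two shards, modeled here as plain dictionaries.
shards = {"shard-a": {}, "shard-b": {}}

directory = ShardDirectory(shards.keys())
directory.assign("user:1", "shard-a")
directory.assign("user:2", "shard-b")


def save(key, value):
    # The app holds no sharding logic; it only asks the directory.
    shards[directory.locate(key)][key] = value


def load(key):
    return shards[directory.locate(key)][key]


save("user:1", {"name": "Ada"})
save("user:2", {"name": "Linus"})
```

Because the mapping lives in one place, a key can be moved to another shard by updating the directory, without changing the app.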
Cite three sharding/partition criteria.
- Key/hash based: hash function applied to a key of the record yields the partition number
- List based (partition per data characteristic): each partition has a list of values (ex: one partition stores users from Norway, Sweden and Finland).
- Round robin: rows are inserted into the partition nodes in turn (ex: using row_id % n)
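The three criteria above can be sketched as three small functions. This is only an illustration: the partition count, the SHA-256 choice, the second country list and all function names are assumptions.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed partition count


def hash_partition(key: str, n: int = NUM_PARTITIONS) -> int:
    """Key/hash based: a stable hash of the key yields the partition number."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n


# List based: each partition owns an explicit list of values.
COUNTRY_PARTITIONS = {
    0: {"Norway", "Sweden", "Finland"},
    1: {"Brazil", "Argentina"},  # assumed second list, for illustration
}


def list_partition(country: str) -> int:
    for partition, countries in COUNTRY_PARTITIONS.items():
        if country in countries:
            return partition
    raise KeyError(f"no partition lists {country!r}")


def round_robin_partition(row_id: int, n: int = NUM_PARTITIONS) -> int:
    """Round robin: consecutive row ids cycle through the partitions."""
    return row_id % n
```

For example, round_robin_partition applied to row ids 0 to 5 yields partitions 0, 1, 2, 3, 0, 1.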
Cite three challenges of sharding.
- Difficulty of Joins and need for denormalization of data
- Loss of referential integrity enforcement
- Need for rebalancing (re-sharding)
What does the locality of reference principle say?
Recently requested data is likely to be requested again.
What is a “distributed cache”?
A cache layer composed of many nodes, each of which stores a piece of the overall cache in memory.
Usually consistent hashing is used to determine which node to query for the data.
What are the three main schemes for cache invalidation during write?
- Write through: write to both the cache and the DB (con: higher write latency)
- Write around: write to the DB and only evict the entry from the cache (con: the next read will be a cache miss)
- Write back: write to the cache, which writes to the DB asynchronously (con: data loss in case of cache failure)
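A rough sketch of the three schemes, with the cache and the DB modeled as plain dictionaries. The class CachedStore and its method names are hypothetical, and the asynchronous write of write back is simulated by an explicit flush() call.

```python
class CachedStore:
    """Toy model: `cache` and `db` are dictionaries standing in for real systems."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.pending = {}  # write-back entries not yet persisted to the DB

    def write_through(self, key, value):
        # Both writes happen before returning, hence the higher write latency.
        self.cache[key] = value
        self.db[key] = value
        self.pending.pop(key, None)

    def write_around(self, key, value):
        # Only the DB is written; the cached copy is evicted.
        self.db[key] = value
        self.cache.pop(key, None)
        self.pending.pop(key, None)

    def write_back(self, key, value):
        # Only the cache is written now; the DB write is deferred.
        self.cache[key] = value
        self.pending[key] = value

    def flush(self):
        # Stands in for the asynchronous write to the DB.
        self.db.update(self.pending)
        self.pending.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.db[key]
        self.cache[key] = value  # populate the cache on a miss
        return value, "miss"
```

If the cache were lost before flush() runs, everything still in pending would be gone, which is the data-loss risk of write back.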
What are indexes used for?
To improve the performance of read operations.
What is the drawback of using indexes?
Write operations become slower, because every write must also update the index.
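A toy illustration of this trade-off, assuming a table held as a list of rows and an index held as a dictionary. The class UsersTable and its methods are hypothetical; real database indexes are typically tree or hash structures on disk.

```python
class UsersTable:
    """Toy table with one index on the email column."""

    def __init__(self):
        self.rows = []
        self.email_index = {}  # email -> position of the row in self.rows

    def insert(self, name, email):
        self.rows.append({"name": name, "email": email})
        # The extra write: the index must be kept in sync on every insert.
        self.email_index[email] = len(self.rows) - 1

    def find_by_email_scan(self, email):
        # Without an index: check every row.
        for row in self.rows:
            if row["email"] == email:
                return row
        return None

    def find_by_email_indexed(self, email):
        # With an index: one dictionary lookup.
        position = self.email_index.get(email)
        return None if position is None else self.rows[position]


table = UsersTable()
table.insert("Ada", "ada@example.com")
table.insert("Linus", "linus@example.com")
```

Both lookups return the same row; the indexed one avoids the scan at the cost of the additional write in insert.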
What is a proxy server?
A proxy server is an intermediary piece of hardware/software that sits between the client and the back-end server.
Give 3 uses for a proxy server.
- Request logging
- Request filtering
- Batching several requests into one
What are queues used for?
To enable asynchronous communication between systems.
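A small sketch using Python's standard queue and threading modules, with a thread standing in for a separate system. The message texts and the None stop signal are assumptions made for the example.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    # The consumer processes messages at its own pace.
    while True:
        message = tasks.get()
        if message is None:  # assumed stop signal
            tasks.task_done()
            break
        results.append(message.upper())
        tasks.task_done()


consumer = threading.Thread(target=worker)
consumer.start()

# The producer enqueues work and moves on without waiting for the result.
for message in ["resize photo", "send email"]:
    tasks.put(message)

tasks.put(None)
tasks.join()      # wait until every queued item has been processed
consumer.join()
```

The producer and the consumer only share the queue, so either side can be slow or briefly unavailable without blocking the other.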
What is redundancy?
Redundancy means duplicating critical data or services in order to increase the reliability of the system.
What does the CAP theorem state?
The CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two of the following three guarantees (CAP): Consistency, Availability and Partition tolerance.
How is Consistency achieved in a distributed system?
Reads are not allowed until all nodes are updated.
How is Availability achieved in distributed systems?
Data is replicated across multiple servers.
What does partition tolerance mean?
It means that a system continues to work despite message loss or partial failure.
How is Partition Tolerance achieved in distributed systems?
Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
What is Consistent Hashing?
It is a technique for hashing that minimizes the impact of adding/removing buckets in the hash table.
How does Consistent Hashing work?
It works by placing the buckets in the hash value space (imagine a circle from 0 to N, with the buckets positioned on that circle).
When we compute hash(key), the result is a position on the circle. The bucket used is the next bucket found by following the circle from that position.
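The circle described above can be sketched with a sorted list of bucket positions. This is a minimal version under stated assumptions: the class HashRing, the 32-bit ring size and the SHA-256 hash are illustrative choices, and virtual nodes (often used to even out the load) are left out.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on a circle from 0 to 2**32 - 1."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, buckets=()):
        self._positions = []  # sorted bucket positions on the circle
        self._owners = {}     # position -> bucket name
        for bucket in buckets:
            self.add(bucket)

    def add(self, bucket: str) -> None:
        position = ring_position(bucket)
        if position in self._owners:
            raise ValueError(f"position collision for {bucket!r}")
        bisect.insort(self._positions, position)
        self._owners[position] = bucket

    def remove(self, bucket: str) -> None:
        position = ring_position(bucket)
        self._positions.remove(position)
        del self._owners[position]

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("the ring has no buckets")
        # First bucket at or after the key's position...
        index = bisect.bisect_left(self._positions, ring_position(key))
        # ...wrapping around to the start of the circle if needed.
        if index == len(self._positions):
            index = 0
        return self._owners[self._positions[index]]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("user:42")
```

Adding a bucket only takes over the keys that fall between it and the previous bucket on the circle; every other key keeps its bucket, which is the minimized impact mentioned above.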
What is the disadvantage of Ajax polling?
The client has to poll the server at a fixed rate, and many of the responses will be empty, creating unnecessary HTTP overhead.
What is WebSocket?
It is a communications protocol that supports full-duplex communication between the browser and the web server.
What are Server Sent Events (SSE)?
They are a technology over HTTP that keeps a long-running connection to the server open, allowing the server to send a stream of messages back to the client.
It does not allow the client to send messages (simplex).
The server responds with the MIME type “text/event-stream”.
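As an illustration of what travels over that stream: each message is a block of "field: value" lines ended by a blank line. The helper below is a hypothetical sketch that only builds the text of one message; it does not open a connection.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Build one text/event-stream message from a payload and an optional event name."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload is sent as several consecutive data lines.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


message = format_sse("price=42", event="update")
```

Here message is the string "event: update\ndata: price=42\n\n", which the server would write to the open response.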