System Design Flashcards

Question

how can you prevent cascading failures?

Answer 1

overprovisioning - ensure that all the other systems can handle the load of one system when it fails and traffic gets redistributed
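
A rough way to check the headroom this answer describes: divide the total load across the nodes that survive and compare it with what one node can carry. This is only a sketch; the function names and the requests-per-second figures are made up for illustration, and it assumes traffic is spread evenly.

```python
def load_per_node_after_failure(total_load: float, nodes: int, failed: int = 1) -> float:
    """Load each surviving node carries once `failed` nodes drop out (even spread assumed)."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes to take the traffic")
    return total_load / survivors


def survives_failure(total_load: float, nodes: int, capacity_per_node: float, failed: int = 1) -> bool:
    """True if the surviving nodes can absorb the redistributed load."""
    return load_per_node_after_failure(total_load, nodes, failed) <= capacity_per_node


# Hypothetical cluster: 4 nodes, each able to serve 1000 requests/second.
print(survives_failure(total_load=2700, nodes=4, capacity_per_node=1000))  # 900 per survivor
print(survives_failure(total_load=3200, nodes=4, capacity_per_node=1000))  # ~1067 per survivor
```

In the second case every node looks healthy at 80% utilization, but losing one pushes the rest past capacity, which is how a single failure starts to cascade.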

Answer 2

about five (5.256)

Answer 3

a Service Level Agreement. What your users can expect from you. For example, five nines of uptime, or p95 sub-second latency
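
To get a feel for what an uptime target like "five nines" allows, the arithmetic can be sketched as below. The helper name is hypothetical, and a 365-day year is assumed.

```python
# Downtime budget implied by an availability target (illustrative only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an uptime percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% uptime -> {downtime_minutes_per_year(target):.3f} min/year")
```

Five nines (99.999%) works out to about 5.256 minutes of downtime per year.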

Answer 4

Hadoop Distributed File System - an open source, self-managed distributed file system for big data storage

Answer 5

Single-master designs favor consistency and partition tolerance. Although in principle availability is what's given up, in practice modern NoSQL databases have highly redundant master nodes that can quickly replace themselves in the event of failure.

Answer 6

In the Hadoop Distributed File System, the name node coordinates how files are broken into blocks, and where those blocks are stored. In high availability settings, multiple name nodes may be present for failover.

Answer 7

it can grow dynamically. an array needs to be resized because it is stored sequentially in memory.

Answer 8

stacks and queues

Answer 9

in a doubly linked list you have pointers going in both directions, and you keep track of both head and tail, so you can move in either direction
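
A minimal sketch of the structure described here, assuming nothing beyond the standard library. The class and method names are illustrative, not from any particular library.

```python
from typing import Any, Iterator, Optional


class Node:
    """One list element with links in both directions."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class DoublyLinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def push_front(self, value: Any) -> Node:
        """Insert at the head in constant time."""
        node = Node(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        else:
            self.tail = node  # list was empty
        self.head = node
        return node

    def push_back(self, value: Any) -> Node:
        """Insert at the tail in constant time."""
        node = Node(value)
        node.prev = self.tail
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node  # list was empty
        self.tail = node
        return node

    def remove(self, node: Node) -> None:
        """Unlink a known node in constant time; no traversal needed."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def forward(self) -> Iterator[Any]:
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

    def backward(self) -> Iterator[Any]:
        node = self.tail
        while node is not None:
            yield node.value
            node = node.prev


items = DoublyLinkedList()
middle = items.push_back("b")
items.push_front("a")
items.push_back("c")
print(list(items.forward()), list(items.backward()))
```

Because each node knows both neighbours, removing it or moving it to either end only touches a fixed number of pointers.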

Answer 10

doubly linked list, since you can move things to the back/front of the list, as well as delete from the back/front in constant time

Answer 11

if you insert every element in order, then you're creating essentially a linked list, since you're only adding items to one side the whole time. the tree is not balanced.
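
The effect is easy to see by measuring tree height. This sketch uses a plain, unbalanced binary search tree written for illustration (the names are hypothetical) and compares sorted insertion with shuffled insertion.

```python
import random


class TreeNode:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert into an unbalanced BST (iterative, so deep trees don't hit the recursion limit)."""
    if root is None:
        return TreeNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = TreeNode(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = TreeNode(key)
                return root
            node = node.right


def height(root) -> int:
    """Number of levels in the tree, counted level by level."""
    levels = 0
    current = [root] if root is not None else []
    while current:
        levels += 1
        current = [c for n in current for c in (n.left, n.right) if c is not None]
    return levels


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


keys = list(range(1000))
sorted_height = height(build(keys))      # every node hangs off the right side

shuffled = keys[:]
random.Random(0).shuffle(shuffled)
shuffled_height = height(build(shuffled))

print(sorted_height, shuffled_height)
```

With sorted input the height equals the number of keys, so lookups degrade to a linear scan; the shuffled tree stays far shallower.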

Answer 12

AVL Tree, Red-Black Tree, Splay Tree, B-Tree, 2-3 Tree

Answer 13

merge sort

Answer 14

term frequency, inverse document frequency

Answer 15

term frequency / document frequency
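
A toy calculation of the ratio on this card, assuming whitespace tokenization and a three-document corpus invented for the example. A log-scaled inverse document frequency, which is common in practice, is included alongside for comparison.

```python
import math
from collections import Counter

# Hypothetical corpus, only for illustration.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog sat on the log",
    "d3": "the cats and the dogs",
}
tokenized = {name: text.split() for name, text in docs.items()}


def term_frequency(term: str, doc_name: str) -> int:
    """How many times the term occurs in one document."""
    return Counter(tokenized[doc_name])[term]


def document_frequency(term: str) -> int:
    """How many documents contain the term at least once."""
    return sum(1 for tokens in tokenized.values() if term in tokens)


def tf_over_df(term: str, doc_name: str) -> float:
    """The plain ratio from the card: term frequency / document frequency."""
    df = document_frequency(term)
    return term_frequency(term, doc_name) / df if df else 0.0


def tf_idf_log(term: str, doc_name: str) -> float:
    """Variant with a log-scaled inverse document frequency."""
    df = document_frequency(term)
    if df == 0:
        return 0.0
    return term_frequency(term, doc_name) * math.log(len(tokenized) / df)


for term in ("the", "cat"):
    print(term, tf_over_df(term, "d1"), tf_idf_log(term, "d1"))
```

"the" occurs twice in d1 but appears in every document, so it scores lower than "cat", which occurs once but only in d1.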

Answer 16

it decouples producers and consumers

Answer 17

distributed processing of large amounts of data

Answer 18

yes, with spark streaming connecting to something like kinesis or kafka

Answer 19

yes, it has libraries for doing that

Answer 20

online transaction processing. exposing your data to the outside world, using it, normal database use

Answer 21

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two different types of data processing systems used in organizations for distinct purposes. OLTP systems are designed for transactional processing and real-time data operations. OLAP systems are designed for analytical processing and complex data analysis

Answer 22

blob storage. just storing blobs of data like files. for example on s3.

Answer 23

amazon dynamodb, google bigtable, microsoft cosmosdb / table storage

Answer 24

kubernetes

Answer 25

it is a distributed key value store used by kubernetes to store the cluster state

Answer 26

EMR - elastic map reduce

Answer 27

combining your own data center or servers or private cloud with a public cloud

Answer 28

HTTP PUT: Update/replace existing resource with new representation. HTTP POST: Submit data for processing or create new resource. HTTP PATCH: Partially update existing resource.
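
The difference between the three verbs can be sketched with an in-memory store. This is not an HTTP server; the `ResourceStore` class is hypothetical, and PATCH is modelled as a simple field merge, which is one common convention rather than the only one.

```python
import itertools


class ResourceStore:
    """In-memory stand-in for a collection of resources behind an HTTP API."""

    def __init__(self) -> None:
        self._items = {}
        self._ids = itertools.count(1)

    def post(self, data: dict) -> int:
        """POST: the server creates a new resource and chooses its id."""
        new_id = next(self._ids)
        self._items[new_id] = dict(data)
        return new_id

    def put(self, item_id: int, data: dict) -> None:
        """PUT: replace the whole representation; fields not sent are gone."""
        self._items[item_id] = dict(data)

    def patch(self, item_id: int, changes: dict) -> None:
        """PATCH: change only the fields that were sent."""
        self._items[item_id].update(changes)

    def get(self, item_id: int) -> dict:
        return dict(self._items[item_id])


store = ResourceStore()
user_id = store.post({"name": "Ada", "email": "ada@example.com"})

store.patch(user_id, {"email": "ada@example.org"})
after_patch = store.get(user_id)   # name kept, email changed

store.put(user_id, {"name": "Ada Lovelace"})
after_put = store.get(user_id)     # email dropped: PUT replaced everything

print(after_patch, after_put)
```

Repeating the same PUT leaves the resource in the same state, while repeating the POST would create a second resource.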

Answer 29

The CAP theorem states that in a distributed system, it is impossible to simultaneously guarantee consistency, availability, and partition tolerance. System designers must make trade-offs between these three properties.

Answer 30

Key factors to consider during system design include scalability, availability, reliability, performance, security, maintainability, and cost.

Answer 31

Availability refers to the proportion of time that a system remains operational and accessible to users. It is often measured as a percentage of uptime.

Answer 32

Reliability refers to the ability of a system to consistently perform its intended function without failure or downtime over a specified period.

Answer 33

Performance refers to the speed, throughput, and responsiveness of a system, usually measured in terms of latency, throughput, and concurrency.

Answer 34

Designing a highly available system involves techniques like redundancy, load balancing, failover mechanisms, and distributed architecture to minimize single points of failure and maximize uptime

Answer 35

A relational database is a type of database that organizes data into tables with rows and columns, and establishes relationships between these tables using keys.

Answer 36

Normalization is the process of organizing data in a database to minimize redundancy and dependency. It involves dividing larger tables into smaller, well-structured tables to improve data integrity and efficiency

Answer 37

ACID stands for Atomicity, Consistency, Isolation, and Durability. It is a set of properties that ensure reliable processing and integrity of database transactions

Answer 38

A NoSQL (Not only SQL) database is a type of database that provides a flexible schema and allows for storage and retrieval of unstructured or semi-structured data. It is often used for big data and real-time applications

Answer 39

Atomicity guarantees that a transaction is treated as a single, indivisible unit of work. It ensures that all changes within a transaction are committed or none of them are. If any part of the transaction fails, the entire transaction is rolled back
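
A small demonstration of all-or-nothing behaviour using Python's built-in `sqlite3` module. The `accounts` table, the names, and the balances are invented for the example; the CHECK constraint is just a convenient way to make the second statement fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src as one transaction; returns False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # both updates apply
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is rolled back
print(ok, failed, balances(conn))
```

The failed transfer leaves no trace: bob is not credited even though that statement ran successfully before the debit failed.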

Answer 40

Consistency ensures that a transaction brings the database from one valid state to another. It enforces any predefined rules or constraints on the data, maintaining data integrity throughout the transaction

Answer 41

Isolation ensures that concurrent transactions do not interfere with each other. It allows transactions to execute as if they were the only ones running, preventing interference such as dirty reads, non-repeatable reads, and phantom reads

Answer 42

Durability guarantees that once a transaction is committed, its changes are permanently saved and will survive any subsequent failures, such as system crashes or power outages. The changes become a permanent part of the database

Answer 43

A document database stores and retrieves semi-structured data in flexible, self-describing formats such as JSON or XML documents. Example: MongoDB, Couchbase, Elasticsearch.

Answer 44

A key-value database stores and retrieves data as a collection of key-value pairs, providing fast and simple storage and retrieval operations. Example: Redis, Amazon DynamoDB, Apache Cassandra.

Answer 45

A columnar database stores data in columns rather than rows, optimizing for efficient read operations and analytics. Example: Apache HBase, Vertica. Apache Parquet uses the same column-oriented layout, but it is a file format rather than a database.

Answer 46

A graph database models data as nodes, edges, and properties, making it suitable for representing and traversing complex relationships. Example: Neo4j, Amazon Neptune, JanusGraph.

Answer 47

A time-series database specializes in storing and analyzing time-stamped data points, making it ideal for data with a temporal component. Example: InfluxDB, Prometheus, TimescaleDB.

Answer 48

301 Redirect: Permanent redirect indicating a permanent move. 302 Redirect: Temporary redirect indicating a temporary move.

Answer 49

tens of thousands per second

Answer 50

p99 < 50ms

Answer 51

over 250 million reviews

Answer 52

One of the properties of consistent hashing is monotonicity, which says that when the number of shards is increased, keys move only from old shards to new shards (no unnecessary rearrangement)
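
The monotonicity property can be checked with a small hash ring. This is a sketch under assumptions: the `HashRing` class, the shard names, the use of SHA-256, and the 100 virtual nodes per shard are all illustrative choices.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes: int = 100) -> None:
        # Each shard owns several points on the ring to even out the distribution.
        self._ring = sorted(
            (ring_position(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        """A key belongs to the first shard point clockwise from its own position."""
        index = bisect.bisect_right(self._points, ring_position(key)) % len(self._points)
        return self._ring[index][1]


keys = [f"key-{i}" for i in range(10_000)]
before = HashRing(["shard-a", "shard-b", "shard-c"])
after = HashRing(["shard-a", "shard-b", "shard-c", "shard-d"])

moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
destinations = {after.shard_for(k) for k in moved}

print(len(moved), destinations)
```

Every key that moves lands on the new shard, and no key is shuffled between the old shards. With a naive `hash(key) % shard_count` scheme, most keys would change shards when the count goes from three to four.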

Answer 53

we have more than billions of events per day, streaming at more than 100 MB per second, and adding up to more than 6 TB per day.

(86 cards)