Designing Data-Intensive Applications Flashcards
What are the three main goals of a data-intensive application?
Reliability, scalability, and maintainability.
What is reliability in the context of a data-intensive application?
The ability of the application to continue functioning correctly, even in the face of errors, failures, or unexpected inputs.
What is scalability in the context of a data-intensive application?
The ability of the application to handle increased load or demand, by adding more resources or by distributing the workload across multiple machines.
What is maintainability in the context of a data-intensive application?
The ease with which the application can be modified, updated, or fixed over time, without introducing errors or breaking existing functionality.
What are the two main types of storage systems?
Disk-based storage and memory-based storage.
What is disk-based storage?
Disk-based storage keeps data on hard disk drives (HDDs) or solid-state drives (SSDs), so the data survives restarts and power loss.
What is memory-based storage?
Memory-based storage keeps data in RAM, which is much faster to access than disk but more expensive per byte and volatile: the contents are lost on power loss unless they are also written to disk or replicated to other machines.
What are some common disk-based storage systems?
Relational databases, file systems, and key-value stores.
What are some common memory-based storage systems?
In-memory databases, distributed caches, and message brokers that keep their queues in memory.
What are some trade-offs between disk-based and memory-based storage?
Memory-based storage has much lower access latency but costs more per byte and loses its contents on power failure unless the data is also persisted or replicated. Disk-based storage is slower but cheaper per byte and durable.
What is a B-tree?
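A B-tree is a self-balancing tree index that keeps keys in sorted order and stores them in fixed-size pages, each of which refers to child pages covering a sub-range of keys. Lookups, inserts, and deletes take O(log n) page accesses, and the sorted order makes range scans efficient, which is why B-trees are the standard index in disk-based databases.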
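A lookup in such a tree can be sketched as follows. This is a simplified illustration, not a full B-tree: the `Node` class, the hand-built three-leaf tree, and the choice to keep values only in leaf pages are assumptions made for the example, and insertion with page splitting is left out.

```python
from bisect import bisect_left, bisect_right


class Node:
    """One page of the tree. Internal pages have children; leaves have values."""

    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted keys (separators in internal pages)
        self.children = children  # None for a leaf page
        self.values = values      # only set on leaf pages


def search(node, key):
    """Walk from the root down to a leaf, following one child per level."""
    while node.children is not None:
        # Child i covers keys below keys[i]; the last child covers the rest.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


# Hypothetical tree: the root's separators 100 and 200 split the key space.
root = Node(
    keys=[100, 200],
    children=[
        Node(keys=[10, 50], values=["a", "b"]),
        Node(keys=[100, 150], values=["c", "d"]),
        Node(keys=[200, 250], values=["e", "f"]),
    ],
)
```

Calling `search(root, 150)` visits the root and then exactly one leaf, which is the property that keeps the number of page reads logarithmic in the number of keys.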
What is a hash index?
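A hash index is an index that uses a hash function to map each key to the location of its value in memory or on disk. It makes lookups by exact key very fast, but because hashing does not preserve key order, it does not support efficient range queries.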
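One common way to build a hash index is to pair an append-only data file with an in-memory hash map from each key to the byte offset of its latest record. The sketch below illustrates that idea under stated assumptions: the class name `HashIndexedLog` and the `key,value` line format are invented for the example, keys must not contain commas or newlines, values must not contain newlines, and compaction and crash recovery are omitted.

```python
import os
import tempfile


class HashIndexedLog:
    """Append-only log file plus an in-memory dict of key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # the hash index: key -> offset of the latest record
        open(self.path, "ab").close()  # create the file if it is missing

    def put(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # the new record starts at the end
            f.write(record)
        self.offsets[key] = offset  # a newer write replaces the older offset

    def get(self, key):
        offset = self.offsets.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record, no scan needed
            line = f.readline().decode("utf-8").rstrip("\n")
        _, value = line.split(",", 1)
        return value


if __name__ == "__main__":
    db = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "data.log"))
    db.put("user:1", "alice")
    db.put("user:1", "carol")  # the old record stays in the file, unindexed
    print(db.get("user:1"))
```

Every read costs one dictionary lookup and one seek, regardless of how large the file grows. The cost is that all keys must fit in memory and there is no way to scan keys in order.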
What is column-oriented storage?
Column-oriented storage is a method of storing data where each column of a table is stored separately, instead of storing entire rows of a table together.
What are some advantages of column-oriented storage?
Column-oriented storage is more efficient for analytic queries that scan many rows but touch only a few columns, such as aggregations, because the database reads only the columns the query needs instead of entire rows. It is also more space-efficient, because the values within a single column tend to be similar or repetitive and therefore compress well.
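The difference between the two layouts can be sketched with plain Python lists. The table contents and all function names here are made up for illustration, and run-length encoding stands in for the several compression schemes a real column store might use.

```python
# A tiny, hypothetical sales table in row-oriented form.
rows = [
    {"product": "apple", "region": "EU", "quantity": 3},
    {"product": "pear", "region": "EU", "quantity": 5},
    {"product": "plum", "region": "EU", "quantity": 2},
    {"product": "apple", "region": "US", "quantity": 7},
    {"product": "pear", "region": "US", "quantity": 1},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_quantity_row_store(rows):
    # Every row, with all of its fields, has to be visited.
    return sum(row["quantity"] for row in rows)


def total_quantity_column_store(columns):
    # Only the single column the query needs is read.
    return sum(columns["quantity"])


def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [(value, count) for value, count in encoded]


columns = to_columns(rows)
```

Here `run_length_encode(columns["region"])` shrinks five values to two pairs, which hints at why repetitive columns compress well when they are stored together.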
What is encoding?
Encoding is the process of representing data in a format that can be stored, transmitted, or processed by a computer.
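As a small illustration, the same record can be encoded as text or as a fixed binary layout. This sketch uses Python's standard `json` and `struct` modules. The record fields and the layout `"<Id"` (a little-endian unsigned 32-bit integer followed by a 64-bit float) are assumptions chosen for the example, not a format taken from these flashcards.

```python
import json
import struct

record = {"user_id": 1337, "score": 98.5}

BINARY_LAYOUT = "<Id"  # little-endian: uint32 user_id, float64 score


def encode_text(rec):
    # Self-describing: the field names travel with every record.
    return json.dumps(rec).encode("utf-8")


def decode_text(data):
    return json.loads(data.decode("utf-8"))


def encode_binary(rec):
    # Compact, but the reader must already know the field order and types.
    return struct.pack(BINARY_LAYOUT, rec["user_id"], rec["score"])


def decode_binary(data):
    user_id, score = struct.unpack(BINARY_LAYOUT, data)
    return {"user_id": user_id, "score": score}
```

The binary form is always 12 bytes, while the JSON form is longer because it repeats the field names. That is the usual trade-off between compactness and self-description.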
What is schema evolution?
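Schema evolution is the process of changing the structure or format of stored data over time while keeping old and new versions compatible: newer code can still read data written by older code (backward compatibility), and older code can still read data written by newer code (forward compatibility).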
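Both directions of compatibility can be sketched with two hypothetical versions of a user record, where version 2 adds an optional `email` field. JSON is used only to keep the example short, and the function names are invented for illustration. Schema-based binary formats apply the same two rules: ignore fields you do not know, and supply a default for fields that are missing.

```python
import json


def encode_v1(name):
    """Old writer: knows only the 'name' field."""
    return json.dumps({"name": name})


def encode_v2(name, email):
    """New writer: adds an 'email' field."""
    return json.dumps({"name": name, "email": email})


def decode_v1(data):
    """Old reader: ignores fields it does not know about."""
    fields = json.loads(data)
    return {"name": fields["name"]}


def decode_v2(data):
    """New reader: falls back to a default when 'email' is missing."""
    fields = json.loads(data)
    return {"name": fields["name"], "email": fields.get("email")}
```

Here `decode_v1(encode_v2(...))` exercises the case where old code reads new data, and `decode_v2(encode_v1(...))` the case where new code reads old data. Adding a field with no default, or removing a required one, would break one of the two.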
What is replication?
Replication is the process of copying data from one node in a distributed system to one or more other nodes, in order to improve availability, performance, and/or durability.
What are some common replication topologies?
Single-leader (master-slave) replication: one node, the leader, accepts all writes and sends the changes to the other nodes, the followers, which serve reads. Multi-leader (multi-master) replication: several nodes accept writes and replicate them to each other, so conflicting writes must be resolved. Leaderless replication: any replica can accept writes directly, and clients read from and write to several replicas at once.
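The first of these, where a single node accepts writes and ships them to the others, can be sketched in a few lines. The `Leader` and `Follower` classes are hypothetical, everything runs in one process, and an explicit `replicate()` call stands in for the network, so this models only the flow of writes and not failures or failover.

```python
class Follower:
    """Read-only copy that applies the leader's log in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, log):
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


class Leader:
    """The only node that accepts writes; it records them in a log."""

    def __init__(self, followers):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self):
        # Ship any entries each follower has not seen yet.
        for follower in self.followers:
            follower.apply(self.log)
```

Between `write()` and `replicate()` a follower still returns the old value. That gap is replication lag, which asynchronous replication accepts in exchange for faster writes.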
What is a replica?
A replica is a node that holds a copy of the data, kept up to date by applying the changes made on other nodes.
What is a quorum?
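A quorum is the minimum number of replicas that must acknowledge a read or write for the operation to count as successful. With n replicas, w write acknowledgements, and r read responses, choosing w + r > n ensures that every read overlaps with the most recent successful write on at least one replica.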
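For n replicas, with w replicas acknowledging each write and r replicas answering each read, the usual quorum condition is w + r > n. The sketch below checks that condition and then confirms it by brute force, enumerating every possible write set and read set. The function names are made up for the example.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """The quorum condition: any w replicas and any r replicas must intersect."""
    return w + r > n


def every_read_sees_a_write(n, w, r):
    """Brute-force check that each read set shares a replica with each write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With n = 3, both functions return True for w = 2, r = 2 and False for w = 1, r = 1, since in the second case a write can land on one replica while the read asks a different one.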
What is the CAP theorem?
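The CAP theorem states that a distributed system cannot guarantee Consistency, Availability, and Partition tolerance all at once. Since network partitions cannot be ruled out in practice, the real choice arises when one occurs: the system must either reject some requests to stay consistent or keep serving requests and risk returning stale data.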