Scalability (L12) Flashcards
(18 cards)
What is Scalability in distributed systems?
Scalability is a system’s ability to handle growing amounts of work or users without suffering a drop in performance. A scalable system can expand its resources (like CPU, memory, or network capacity) to keep up with increasing demand.
What are the three dimensions of scalability mentioned in the lecture?
● Size Scalability: Can the system handle more users or data?
● Geographical Scalability: Can the system operate efficiently across large distances?
● Administrative Scalability: Can multiple organizations or teams manage their own parts without conflict?
What are the common root causes for scalability problems in centralized solutions?
● CPU limitations: One machine can only do so much.
● Storage and I/O bottlenecks: Disk or database speed becomes a constraint.
● Network congestion: Too many users talking to one central server overloads the network.
Name three fundamental techniques for scaling distributed systems.
● Hide communication latencies: Use async requests or local caching to avoid waiting.
● Replication and Caching: Store copies of data closer to users.
● Modularization and Decomposition: Break the system into smaller, independent parts that can scale separately.
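As a minimal sketch of hiding communication latency with async requests: the `fetch` function below is a hypothetical stand-in for a remote call (it only sleeps), and the request names and delays are made up for illustration.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # Hypothetical remote call; asyncio.sleep simulates network latency.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_all(requests: dict) -> list:
    # Issue every request at once instead of waiting for each one in turn.
    return await asyncio.gather(*(fetch(n, d) for n, d in requests.items()))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all({"users": 0.2, "orders": 0.2, "stock": 0.2}))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three waits overlap, the total time should be close to the slowest single call rather than the sum of all three.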
What is Modularization, and what are its general design principles?
Modularization means breaking a system into well-defined parts (modules), each handling one responsibility. Key principles:
● Explicit Interfaces: Make dependencies visible.
● Low Coupling: Keep modules loosely connected.
● Small Interfaces: Don’t expose more than necessary.
● High Cohesion: Each module should do one thing well.
Differentiate between “Scaling Up” and “Scaling Out”.
● Scaling Up (Vertical): Upgrade a single machine (e.g., more CPU or RAM).
● Scaling Out (Horizontal): Add more machines or servers to handle the load.
What are the three further “Scaling Out” dimensions discussed?
● Functional Decomposition: Split the app into smaller pieces (e.g., microservices).
● Partitioning (Sharding): Divide the data across servers.
● Duplication: Clone services and balance the load between them.
What is Functional Decomposition in practice?
It’s the process of breaking a monolithic application into independent, focused services (like microservices), which communicate through APIs.
What is Partitioning (Sharding)?
Sharding means splitting a dataset into smaller chunks, stored on different nodes, so no single node is overloaded.
What are two common methods for Partitioning?
● Range Partitioning: Data is split based on sorted key ranges (e.g., A–F, G–M…).
● Hash Partitioning: A hash of each key selects the shard, which spreads data roughly evenly across shards.
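The two methods can be sketched as small routing functions. The shard boundaries (A–F, G–M, N–S, T–Z), the function names, and the use of SHA-256 are assumptions for illustration, not details from the lecture.

```python
import bisect
import hashlib

# Hypothetical layout: shard 0 = A-F, shard 1 = G-M, shard 2 = N-S, shard 3 = T-Z.
# Each entry is the first letter that belongs to the next shard.
RANGE_BOUNDARIES = ["G", "N", "T"]


def range_shard(key: str) -> int:
    # Count how many boundaries are <= the key's first letter.
    return bisect.bisect_right(RANGE_BOUNDARIES, key[0].upper())


def hash_shard(key: str, num_shards: int) -> int:
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for name in ["Alice", "George", "Nina", "Zoe"]:
        print(name, range_shard(name), hash_shard(name, 4))
```

Range partitioning keeps neighbouring keys together, which helps range scans but can create hot shards; hash partitioning trades that locality for a more even spread.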
What is a Distributed Hash Table (DHT)?
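A DHT is a peer-to-peer structure that maps keys to nodes using consistent hashing. In ring-based designs such as Chord, nodes are placed on a ring and each one stores the portion of the key space that falls to it. DHTs scale well and tolerate node changes, because a node joining or leaving only moves the keys in its neighbourhood.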
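A minimal single-process sketch of the ring idea follows. It is not a real DHT (there is no networking or routing), and the `HashRing` class, the node names, and the choice of SHA-256 are assumptions for illustration.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    # Map any string to a point on the ring (a 64-bit integer here).
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = _position(node)
        bisect.insort(self._points, pos)
        self._owner[pos] = node

    def remove(self, node: str) -> None:
        pos = _position(node)
        self._points.remove(pos)
        del self._owner[pos]

    def lookup(self, key: str) -> str:
        # Assumes at least one node. The key belongs to the first node at or
        # after its position, wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._points, _position(key))
        return self._owner[self._points[idx % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("user:42"))
```

When a node is added, the only keys that change owner are the ones the new node takes over; all other keys stay where they were.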
What is Duplication (Multi-tenancy) in the context of scalability?
Duplication means running multiple identical instances of a service to increase capacity. A load balancer distributes incoming requests. Platforms like AWS Auto Scaling automate this based on real-time demand.
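As a rough sketch of how a load balancer might spread requests over identical instances, here is a round-robin policy. The `RoundRobinBalancer` class and the instance names are hypothetical; real balancers also handle health checks and uneven load.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    def __init__(self, instances):
        self._instances = list(instances)
        self._cycle = itertools.cycle(self._instances)

    def route(self, request):
        # Pick the next instance in turn; the request content is ignored.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
counts = Counter(balancer.route(f"req-{i}") for i in range(9))
print(counts)
```

With nine requests and three instances, each instance receives three requests.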
What is a Metric in system performance monitoring?
A metric is a numeric value (like requests per second) tracked over time. Metrics often include labels (e.g., service="auth") to filter or group them in dashboards.
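To make the idea of labelled metrics concrete, here is a small in-memory sketch. The `record` and `select` helpers, the metric name, and the sample values are made up for illustration and do not mirror any particular monitoring library.

```python
import time
from collections import defaultdict

# Series key: (metric name, sorted label pairs) -> list of (timestamp, value).
samples = defaultdict(list)


def record(name, value, **labels):
    key = (name, tuple(sorted(labels.items())))
    samples[key].append((time.time(), value))


def select(name, **labels):
    # Return every series of this metric whose labels include the given ones.
    wanted = set(labels.items())
    return {
        key: points
        for key, points in samples.items()
        if key[0] == name and wanted <= set(key[1])
    }


record("requests_per_second", 120.0, service="auth")
record("requests_per_second", 80.0, service="billing")
record("requests_per_second", 135.0, service="auth")

auth = select("requests_per_second", service="auth")
print(auth)
```

Filtering by `service="auth"` returns one series with two samples, while selecting without labels returns both services.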
What are Quality of Service (QoS) attributes?
QoS attributes define how well a system performs from the user’s perspective. Examples:
● Response Time
● Throughput
● Availability
● Reliability
● Security
● Scalability
● Extensibility
What is a Service-Level Indicator (SLI)?
An SLI is a quantitative measurement of how a service behaves. For example, “the percentage of requests answered in under 200 ms” is an SLI; a measured value might be 99.9%.
What is a Service-Level Objective (SLO)?
An SLO defines a target for an SLI: the level of performance you aim to maintain. For example, “Uptime should be at least 99.99%”.
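The relationship between the two can be sketched in a few lines: the SLI is computed from observations, and the SLO is the threshold it is compared against. The 200 ms threshold, the 99.9% target, the function names, and the sample latencies are assumptions for illustration.

```python
def latency_sli(latencies_ms, threshold_ms=200.0):
    # SLI: fraction of requests answered faster than the threshold.
    if not latencies_ms:
        raise ValueError("no requests observed")
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return fast / len(latencies_ms)


def meets_slo(sli, target=0.999):
    # SLO: the SLI must be at or above the target.
    return sli >= target


# Made-up sample: 997 fast requests and 3 slow ones.
latencies = [120.0] * 997 + [450.0] * 3
sli = latency_sli(latencies)
print(f"SLI = {sli:.3%}, SLO met: {meets_slo(sli)}")
```

Here 997 of 1000 requests are fast, so the measured SLI falls short of a 99.9% objective.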
What is the Utilization Law in performance analysis?
It relates how busy a resource is to the work it completes. Formula:
● U = B / T, where B is busy time and T is total observation time
● Or U = X × S, where X is throughput and S is the mean service time per completion
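The two forms describe the same quantity, which a short derivation makes visible. Here C (the number of completions during the observation period) and the sample numbers are assumptions for illustration, not figures from the lecture.

```latex
\begin{align*}
U &= \frac{B}{T} = \frac{C}{T} \cdot \frac{B}{C} = X \cdot S,
  \qquad X = \frac{C}{T}, \quad S = \frac{B}{C} \\[4pt]
% Illustrative numbers: T = 60 s, B = 45 s, C = 900 completions
X &= \frac{900}{60\,\mathrm{s}} = 15\,\mathrm{s^{-1}}, \qquad
S = \frac{45\,\mathrm{s}}{900} = 0.05\,\mathrm{s} \\[4pt]
U &= 15\,\mathrm{s^{-1}} \times 0.05\,\mathrm{s} = 0.75 = \frac{45\,\mathrm{s}}{60\,\mathrm{s}}
\end{align*}
```

Both routes give a utilization of 75%.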
How is Throughput (X) calculated?
Throughput is how much work gets done per unit of time. Formula: X = C / T, where C is the number of completions and T is the total time.
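As a small sketch of measuring throughput from a log, the function below counts completions inside an observation window and divides by the window length. The function name and the list of completion timestamps are hypothetical.

```python
def throughput(completion_times_s, window_start_s, window_length_s):
    # X = C / T: count completions inside the window, divide by its length.
    if window_length_s <= 0:
        raise ValueError("window length must be positive")
    window_end_s = window_start_s + window_length_s
    completed = sum(
        1 for t in completion_times_s if window_start_s <= t < window_end_s
    )
    return completed / window_length_s


# Made-up completion timestamps in seconds; the last one is outside the window.
completions = [0.4, 1.1, 1.9, 2.5, 3.2, 4.8, 5.1, 7.7, 9.9, 10.3]
x = throughput(completions, 0.0, 10.0)
print(f"X = {x} completions per second")
```

Nine of the ten completions fall inside the 10-second window, so C = 9 and T = 10 s.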