System Design Basic Concepts Flashcards

Question

Three examples of resources you can add to a machine to scale vertically

Answer 1

CPU, memory, storage

Answer 2

Replication, sharding

Answer 3

When you copy data from the primary DB to multiple secondary read-only nodes.

Answer 4

When you split data into smaller datasets (shards) and distribute them across multiple nodes.

Answer 5

When data in a DB is sharded by column, so different columns (or groups of columns) are stored on different nodes.

Answer 6

You hash the shard key to decide which shard each row goes to.
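A minimal sketch of hash-based sharding, assuming a fixed number of shards and using MD5 from Python's `hashlib` as a stable hash (the function name `shard_for` and the user IDs are hypothetical). Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import hashlib

def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard index by hashing it (hypothetical helper)."""
    # Stable hash: the same key maps to the same shard on every machine and run.
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an integer and take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards

# Route some hypothetical user IDs across 4 shards.
for user_id in ["alice", "bob", "carol"]:
    print(user_id, "-> shard", shard_for(user_id, 4))
```

One design caveat with plain modulo: changing `num_shards` remaps most keys, which is why consistent hashing is often used when shards are added or removed.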

Answer 7

Transmission Control Protocol / Internet Protocol

Answer 8

A model for how data is transmitted around the internet.

Answer 9

Application layer (HTTP)
Transport layer (TCP)
Internet layer (IP)
Link / network access layer (e.g. Ethernet, Wi-Fi on a LAN)

Answer 10

Rules for how data packets are routed across networks.

Answer 11

Rules for handling network unreliability so that data arrives reliably and in order.

Answer 12

User Datagram Protocol

Answer 13

A connectionless transport protocol with no delivery or ordering guarantees. It's faster and lighter-weight than TCP because it's OK with dropping packets.

Answer 14

When you're OK with dropping packets, like for live video streaming.
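To make UDP's connectionless model concrete, here is a minimal sketch using Python's standard `socket` module on the loopback interface. The payload `b"frame-1"` is made up. On loopback the datagram will arrive, but nothing in UDP guarantees that on a real network.

```python
import socket

# Receiver bound to an OS-chosen free port on localhost.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(1.0)
addr = receiver.getsockname()

# UDP is connectionless: the sender just fires a datagram at an address.
# There is no handshake, no acknowledgement, and no retransmission.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-1", addr)

data, _ = receiver.recvfrom(1024)
print(data)

sender.close()
receiver.close()
```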

Answer 15

Secure Sockets Layer

Answer 16

Transport Layer Security

Answer 17

SSL is deprecated in favor of TLS.

Answer 18

It encrypts the connection between the client and server, and authenticates the server using certificates.

Answer 19

It's HTTP secured by TLS (formerly SSL).

Answer 20

Domain Name System

Answer 21

It resolves domain names to IP addresses.

Answer 22

A server relaying traffic to/from the client, and applying logic to that traffic.

Answer 23

A server that sits in front of back-end servers, accepts requests from clients, and forwards them to those servers.

Answer 24

Proxy (forward proxy) = used by the client
Reverse proxy = used by the server

Answer 25

Load balancer

Answer 26

Term for deciding how components share information

Answer 27

A system integration strategy where the database is the primary means of sharing information between components.

Answer 28

Representational State Transfer

Answer 29

1: Client and server should be separate and independent
2: Data is represented as resources managed by the server
3: Interfaces support a common set of operations
4: Operations are stateless -- the client leaves no context on the server

Answer 30

An API that follows REST principles.

Answer 31

1: Client and server are separate and independent
2: HTML, CSS, JS, images, etc. are resources managed by the server
3: GET, POST, PUT, DELETE are an interface with a common set of operations
4: Client leaves no context on the server

Answer 32

Remote Procedure Call

Answer 33

It invokes a routine in a process on some other machine, as if it were a local function call.

Answer 34

Interface that handles RPCs by serializing/deserializing the arguments and results and forwarding the call to the remote machine.
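A toy sketch of what a client stub and server dispatcher do, assuming JSON as the wire format and a direct function call standing in for the network hop. All names (`add_stub`, `server_handle`, `transport`, `PROCEDURES`) are hypothetical; real frameworks such as gRPC generate stubs from an IDL instead.

```python
import json

# --- "Server" side: the real procedure plus a dispatcher ---
def add(a: int, b: int) -> int:
    return a + b

PROCEDURES = {"add": add}

def server_handle(request_bytes: bytes) -> bytes:
    request = json.loads(request_bytes)                          # deserialize call
    result = PROCEDURES[request["method"]](*request["params"])   # dispatch to routine
    return json.dumps({"result": result}).encode()               # serialize reply

# --- "Client" side: a stub that looks like a local function ---
def transport(payload: bytes) -> bytes:
    # Stand-in for the network; a real stub would send bytes over a socket or HTTP.
    return server_handle(payload)

def add_stub(a: int, b: int) -> int:
    payload = json.dumps({"method": "add", "params": [a, b]}).encode()  # serialize call
    reply = json.loads(transport(payload))                              # deserialize reply
    return reply["result"]

print(add_stub(2, 3))  # 5
```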

Answer 35

Interface Definition Language

Answer 36

It defines how components communicate via RPCs.

Answer 37

Protocol buffers (protos)

Answer 38

A group of processes running on different machines and communicating through a network

Answer 39

Network unreliability
Data inconsistency

Answer 40

Network faults
Network congestion
Network partitions

Answer 41

When machines are unable to communicate with each other for some reason

Answer 42

When there is too much traffic on the network

Answer 43

When the network is split into two groups of machines that can't communicate with each other

Answer 44

When simultaneous read requests to different nodes are guaranteed to return the same data

Answer 45

When nodes are guaranteed to *eventually* have the same data, albeit not immediately and simultaneously.

Answer 46

Consistency, Availability, Partition Tolerance
(Consistency here really means *strong* consistency.)

Answer 47

You can't have all three of strong consistency, availability, and partition tolerance; when a network partition happens, you have to choose between consistency and availability.

Answer 48

It can't tolerate a partition, so it's basically a single-node system.

Answer 49

It doesn't have strong consistency, so a partition could cause stale data in nodes.

Answer 50

It doesn't guarantee availability: nodes may be shut down (or refuse requests) during a partition to keep the data consistent.

Answer 51

This is a measure of availability: the system is up 99.999% of the time, which allows only about 5.26 minutes of downtime per year.
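The downtime budget behind "N nines" is simple arithmetic. This sketch assumes a 365-day year (525,600 minutes); the helper name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year

def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} min/year")
# three nines: 525.60 min/year
# four nines: 52.56 min/year
# five nines: 5.26 min/year
```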

Answer 52

A group of nodes

Answer 53

When nodes send periodic messages to the coordination service to indicate normal operation
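A minimal sketch of how a coordination service might use heartbeats to detect failed nodes, assuming a fixed timeout. The class `HeartbeatMonitor` and its methods are hypothetical; timestamps are passed in explicitly so the example is deterministic.

```python
from __future__ import annotations
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat from each node (hypothetical coordination service)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node_id: str, now: float | None = None) -> None:
        # Record when this node last reported normal operation.
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def dead_nodes(self, now: float | None = None) -> list[str]:
        # A node is presumed dead if it has been silent longer than the timeout.
        now = time.monotonic() if now is None else now
        return [n for n, t in self.last_seen.items() if now - t > self.timeout_s]

monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.heartbeat("node-a", now=100.0)
monitor.heartbeat("node-b", now=101.0)
print(monitor.dead_nodes(now=103.5))  # ['node-a']
```

A node flagged this way could then trigger failover, e.g. electing a new primary.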

Answer 54

When nodes are automatically added or removed based on traffic

Answer 55

When you have a primary node, and it fails, one of the backup nodes is automatically selected to be the new primary.

Answer 56

First layer of servers a request reaches

Answer 57

Servers the client doesn't directly communicate with

Answer 58

A stateless server that responds to requests from clients (front-end server)

Answer 59

When a web server acts as a single point of access to multiple services, and presents an interface to clients that hides those details

Answer 60

REST => used to communicate with clients
RPC => used to communicate with back-end servers

Answer 61

When a web server has a maximum capacity over some time window, and rejects requests exceeding that capacity
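A minimal fixed-window rate limiter sketch, assuming per-client limits and in-memory counters. The class name and parameters are hypothetical, and a real limiter would also expire old windows and share state across web server instances.

```python
from __future__ import annotations
import time

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_s`-second window."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        # (client_id, window index) -> request count; old windows are never cleaned up here.
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_s)  # which time window this request falls in
        key = (client_id, window)
        count = self.counts.get(key, 0)
        if count >= self.limit:
            return False  # over capacity: reject (e.g. respond with HTTP 429)
        self.counts[key] = count + 1
        return True

limiter = FixedWindowRateLimiter(limit=2, window_s=1.0)
print([limiter.allow("alice", now=t) for t in (0.1, 0.2, 0.3, 1.1)])
# [True, True, False, True]
```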

Answer 62

When a web server discards or re-routes requests that exceed system capacity

Answer 63

When a web server validates a user's identity

Answer 64

When a web server determines whether a user has permission to access a particular resource

Answer 65

When a downstream server (a dependency of the server being analyzed) isn't able to keep up with the volume of requests it's receiving.

Answer 66

When a downstream server that can’t keep up with its load tells the web server “stop sending me stuff”, so the web server can slow down or load shed.
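An in-process analogy for backpressure, assuming the downstream service's capacity is modeled as a bounded `queue.Queue`. When the queue is full, `put_nowait` raises `queue.Full`, which plays the role of the "stop sending me stuff" signal; the `send` helper is hypothetical.

```python
import queue

# Bounded inbox standing in for the downstream service's capacity.
downstream_inbox = queue.Queue(maxsize=2)

def send(request: str) -> str:
    try:
        downstream_inbox.put_nowait(request)  # raises queue.Full when at capacity
        return "accepted"
    except queue.Full:
        # Backpressure signal received: shed this request instead of piling on.
        return "shed"

results = [send(f"req-{i}") for i in range(4)]
print(results)  # ['accepted', 'accepted', 'shed', 'shed']
```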

Answer 67

Distributes incoming network traffic across a group of servers.
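One of the simplest distribution strategies is round robin. This sketch assumes a static server list; the class name and IP addresses are hypothetical, and real load balancers also consider health checks and server load.

```python
from __future__ import annotations
import itertools

class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(5)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']
```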

Answer 68

It can go in front of the web servers (which run as multiple instances), and between the web servers and the back-end servers.

Answer 69

Layer-4 and Layer-7

Answer 70

Layer-4 only uses network and transport layer data (IP addresses, ports) to make routing decisions. Layer-7 also uses HTTP request data (URL path, headers, cookies).

Answer 71

Temporarily stores a subset of data on a high-speed medium, to improve performance and reduce resource usage.

Answer 72

It is way more expensive and it is volatile.

Answer 73

It can go in any number of places because it’s such a generic concept – you would attach it to a particular service that communicates (directly or indirectly) with a DB, so the service doesn’t always have to communicate with the DB.

Answer 74

Cache located on the same machine as the server.

Answer 75

Co-Located Cache

Answer 76

When data is found (or not found, respectively) in the cache

Answer 77

The strategy for how cache entries are marked as invalid, to be removed.

Answer 78

In a system with multiple caches, those caches are consistent if they have the same values for the same entries.

Answer 79

How you decide what to remove when the cache is full.

Answer 80

LRU (Least Recently Used)
LFU (Least Frequently Used)
FIFO (First In First Out)
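LRU eviction can be sketched with Python's `collections.OrderedDict`, which keeps keys in order and supports `move_to_end` and `popitem(last=False)`. The `LRUCache` class is a hypothetical illustration; note that returning `None` on a miss can't distinguish a stored `None`.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._data:
            return None                 # cache miss
        self._data.move_to_end(key)     # mark as most recently used
        return self._data[key]          # cache hit

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touching "a" makes "b" the least recently used
cache.put("c", 3)   # cache is over capacity, so "b" is evicted
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```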

Answer 81

How you decide how to handle writes to the cache and the DB

Answer 82

Write-through (write to both cache and DB)
Write-back (write only to cache, separate service syncs to DB)
Write-around (write only to DB, populate cache on cache miss)
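A toy sketch contrasting the three write policies, assuming the cache and DB are both plain dicts and that `flush()` stands in for the separate sync service. The class and method names are hypothetical; the write-around path also drops any cached copy so reads don't return stale data, which is an assumption of this sketch.

```python
class WritePolicyDemo:
    """Toy cache + DB (both dicts) illustrating three write policies."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.dirty = set()  # keys written to the cache but not yet to the DB

    def write_through(self, key, value):
        self.cache[key] = value
        self.db[key] = value          # write both synchronously

    def write_back(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)           # DB is updated later by flush()

    def flush(self):
        # Stand-in for the separate service that syncs dirty entries to the DB.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def write_around(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)     # drop any stale cached copy

    def read(self, key):
        if key in self.cache:
            return self.cache[key]    # cache hit
        value = self.db.get(key)      # cache miss: read the DB...
        if value is not None:
            self.cache[key] = value   # ...and populate the cache
        return value

s = WritePolicyDemo()
s.write_back("x", 1)
print("x" in s.db)   # False (not flushed yet)
s.flush()
print(s.db["x"])     # 1
s.write_around("y", 2)
print("y" in s.cache, s.read("y"), "y" in s.cache)  # False 2 True
```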

Answer 83

Large volumes of unstructured data, like videos and images

Answer 84

The DB stores metadata about the blobs, and the paths used to access them by-key.

Answer 85

Content Delivery Network

Answer 86

It's a set of geographically-distributed servers where each server copies content from the origin server and distributes it to users.

Answer 87

To get data geographically closer to users by putting content in a "zone" physically close to the user.

Answer 88

The web server tells the client to use a specific CDN server.

Answer 89

For large files, like videos.

Answer 90

Logical grouping of READ methods in a Read API

Answer 91

That they are often wholly separate interfaces and services.

Answer 92

Handles one write that triggers multiple writes in multiple destinations.

Answer 93

Push model: Propagate new items as they're created
Pull model: Fan-out happens at regular intervals, or on-demand
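A minimal sketch of the two fan-out models for a social feed, assuming an in-memory follow graph. All names (`followers`, `following`, `publish_push`, `read_feed_pull`) and the data are hypothetical. Push does the work at write time; pull does it at read time.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"]}             # author -> followers
following = {"bob": ["alice"], "carol": ["alice"]}  # user -> authors they follow

posts = defaultdict(list)  # author -> their posts (source of truth)
feeds = defaultdict(list)  # user -> precomputed feed (push model)

def publish_push(author, post):
    posts[author].append(post)
    for f in followers.get(author, []):
        feeds[f].append(post)  # fan out at write time

def read_feed_push(user):
    return feeds[user]         # cheap read: feed is already built

def read_feed_pull(user):
    # Fan out at read time: gather posts from everyone the user follows.
    return [p for author in following.get(user, []) for p in posts[author]]

publish_push("alice", "hello")
print(read_feed_push("bob"), read_feed_pull("carol"))  # ['hello'] ['hello']
```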

Answer 94

Globally-unique identifier

Answer 95

It's a strategy for unique IDs where you just generate enormous random numbers and rely on a collision being astronomically unlikely.
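Python's standard `uuid.uuid4()` is one concrete instance of this approach: a version-4 UUID carries 122 random bits, so generating IDs independently on many servers is practical without coordination.

```python
import uuid

# Generate IDs with no coordination at all; uniqueness rests on randomness.
ids = {uuid.uuid4() for _ in range(1000)}
print(len(ids))
```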

Answer 96

A strategy for unique IDs where you use the timestamp and server ID as the prefix of the ID, then append a suffix (a sequence number) that is unique within that server for that timestamp, so the whole ID is guaranteed to be unique.
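A sketch of this scheme, loosely modeled on the Twitter Snowflake bit layout (millisecond timestamp, then 10 bits of server ID, then a 12-bit sequence). The class name, bit widths, and use of the Unix epoch are assumptions for illustration, and the sketch assumes the server's clock never moves backwards.

```python
import threading
import time

class TimeServerSequenceIdGenerator:
    """Hypothetical Snowflake-style generator: [timestamp ms | server ID | sequence]."""
    SERVER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, server_id: int):
        if not 0 <= server_id < (1 << self.SERVER_BITS):
            raise ValueError("server_id out of range")
        self.server_id = server_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now_ms == self.last_ms:
                # Same millisecond: bump the per-server sequence number.
                self.sequence = (self.sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond; wait for the next one.
                    while now_ms <= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return ((now_ms << (self.SERVER_BITS + self.SEQUENCE_BITS))
                    | (self.server_id << self.SEQUENCE_BITS)
                    | self.sequence)

gen = TimeServerSequenceIdGenerator(server_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids), len(set(ids)))  # True 5
```

Because the timestamp occupies the high bits, IDs from one server also sort roughly by creation time.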

Answer 97

A service that generates an ID value guaranteed to be unique, often distributed, with servers syncing IDs with each other.

Answer 98

Stores data from multiple services for analysis.

Answer 99

Stores data in its original raw format – a cheap dumping ground for data uploads.

Answer 100

Map: Send a subset of data to a node that maps input to output
Shuffle: Redistribute outputs to get them grouped by-key on same node
Reduce: Apply reduce function to keyed group of outputs
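The three phases can be sketched in a single process with the classic word-count example. The two input "documents" stand in for data splits sent to separate nodes; all names here are hypothetical, and a real framework would run each phase across many machines.

```python
from collections import defaultdict

documents = ["the cat sat", "the dog sat"]  # toy input, one split per "node"

def map_phase(doc):
    # Map: emit (key, value) pairs from one input split.
    return [(word, 1) for word in doc.split()]

def shuffle_phase(mapped_outputs):
    # Shuffle: group all values by key, as if routing each key to one reducer node.
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle_phase([map_phase(d) for d in documents]))
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```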

Answer 101

Functional testing verifies the correctness of a system on various inputs/outputs
Non-functional testing verifies properties other than correctness

Answer 102

Any three of:
Regression test
A/B test
Load test
Stress test
Endurance test
Security test

Answer 103

Tests the interactions between units of software within a group

Answer 104

Tests the entire system using an external client interacting with the system interface


(132 cards)