System Design Basic Concepts Flashcards

(132 cards)

1
Q

Four main things to care about with system design

A

Performance, Scalability, Availability, Reliability

2
Q

What do we call the hardware that runs a system?

A

Machine

3
Q

What do we call the hardware a client uses?

A

Device

4
Q

What’s the difference between a “service” and a “server”?

A

A server is a running instance of a binary; a service is a piece of functionality it provides. One server can provide many services.

5
Q

What does API stand for?

A

Application Programming Interface

6
Q

What is an API?

A

It defines how programs interact with each other.

7
Q

What is a good mental test for scalability?

A

How well would the system behave if the load on it (traffic, data volume) increased tenfold?

8
Q

In what context would you use an Entity-Relationship Diagram?

A

Designing a database schema

9
Q

What do you call the set of rules used for an Entity-Relationship Diagram?

A

Unified Modeling Language (UML) is one common notation; ER diagrams are also drawn in Chen or crow’s-foot notation.

10
Q

What does UML stand for?

A

Unified Modeling Language

11
Q

What is monolithic design?

A

When all software is built and deployed as a single unit

12
Q

What are “microservices”?

A

When software is structured as a collection of independent services that communicate with each other

13
Q

What is “loose coupling” vs. “tight coupling”?

A

Loose coupling is when different components and services have minimal dependencies on each other.

Tight coupling is when they’re highly dependent on each other.

14
Q

What is “high cohesion” vs. “low cohesion”?

A

High cohesion is when the logic, methods and classes of a single service are functionally related.

Low cohesion is when a service does many loosely related things and has a vague role.

15
Q

Why is high cohesion good? Four things.

A

It’s easier to maintain, deploy, test and understand.

16
Q

What are the two models of interservice interaction?

A

Orchestration and Choreography

17
Q

What is orchestration?

A

A model of interservice interaction where one service is the “orchestrator” and manages communication between the other services.

18
Q

What is choreography?

A

A model of interservice interaction where an event stream holds events, and each service may produce events or subscribe to (listen for) certain events.

19
Q

What is persistence?

A

Once data is written to a DB, it is stably stored on non-volatile storage, so it survives process restarts and power loss.

20
Q

Volatile vs. non-volatile memory

A

Volatile memory, like RAM, is erased when powered off. Non-volatile memory, like a hard drive or SSD, keeps its data even when powered off.

21
Q

What does NoSQL stand for?

A

Not Only SQL

22
Q

What is NoSQL?

A

Catch-all term for DBs that don’t store data in relational tables, such as key-value, document or graph stores.

23
Q

What is a normalized database vs. a de-normalized one?

A

In a normalized database, data is isolated and non-redundant.

A de-normalized database is one where data from one table is redundantly copied into another, usually so reads can avoid joins. An example is having a FrequencyCap table and also a FrequencyCap field on Campaign.

24
Q

Vertical vs. Horizontal Scaling

A

Vertical scaling: Increasing the resources of a single machine

Horizontal scaling: Adding more machines

25
Three examples of resources you can add to scale vertically
CPU, memory, storage
26
Two examples of ways to horizontally scale a database
Replication, sharding
27
Replication (in the context of a database)
When you copy data from the primary DB to multiple secondary read-only nodes.
28
Sharding (in the context of a database)
When you split data into smaller datasets (shards) and distribute them across multiple machines.
29
Vertical partitioning (in the context of a database)
When a table is split by column, so different groups of columns are stored separately (possibly on different machines).
30
How do you avoid imbalanced data when sharding a database? That is, one shard having a lot more data than the others.
Hash the shard key, so records spread evenly across shards instead of clustering in a few key ranges.
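A minimal Python sketch of hash-based shard selection, assuming string keys and a fixed shard count; `shard_for` and `NUM_SHARDS` are hypothetical names. It uses a stable hash (MD5 from `hashlib`, for distribution only, not security) because Python's built-in `hash()` of a string is randomized per process by default.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical number of shards


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Keys with a shared prefix and sequential suffixes still spread across all shards.
    counts = Counter(shard_for(f"user-{i}") for i in range(10_000))
    print(sorted(counts.items()))
```

One trade-off of taking the hash modulo the shard count: changing `NUM_SHARDS` moves most keys to a different shard, which is why schemes like consistent hashing exist.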
31
What does TCP/IP stand for?
Transmission Control Protocol / Internet Protocol
32
What is TCP/IP?
A model for how data is transmitted around the internet.
33
What are the four layers of TCP/IP?
Application layer (HTTP)
Transport layer (TCP)
Internet layer (IP)
Link layer, also called the network access layer (e.g., Ethernet or Wi-Fi on a LAN)
34
Internet Protocol
Rules for how data packets are addressed and routed across networks.
35
Transmission Control Protocol
Rules for delivering data reliably over an unreliable network: TCP uses sequence numbers, acknowledgments and retransmissions so data arrives complete and in order.
36
What does UDP stand for?
User Datagram Protocol
37
What is UDP?
A connectionless transport protocol that sends packets (datagrams) without acknowledgments or retransmissions. It has less overhead than TCP, but packets may be dropped or arrive out of order.
38
In what situation might you want to use UDP over TCP?
When you're OK with occasionally dropped packets in exchange for lower latency, like live video calls or online games
39
What does SSL stand for?
Secure Sockets Layer
40
What does TLS stand for?
Transport Layer Security
41
What's the relationship between SSL and TLS?
SSL is deprecated in favor of TLS.
42
What is SSL?
A protocol that encrypts connections between the client and server; it has been superseded by TLS.
43
What is HTTPS?
It's HTTP sent over a connection encrypted with TLS (formerly SSL).
44
What does DNS stand for?
Domain Name System
45
What does DNS do?
It resolves domain names to IP addresses.
46
Proxy
A server that relays traffic on behalf of clients and can apply logic to that traffic (e.g., filtering or caching).
47
Reverse Proxy
A server that accepts requests from clients and forwards them to one or more back-end servers, on behalf of those servers.
48
What's the difference between a proxy and a reverse proxy?
Proxy = used by client
Reverse Proxy = used by server
49
What is the most common use case for a reverse proxy?
Load balancer
50
System Integration
Term for deciding how components share information
51
Database Integration
A system integration strategy where the database is the primary means of sharing information between components.
52
What does REST stand for?
Representational State Transfer
53
What are four core REST principles?
1: Client and server should be separate and independent
2: Data is represented as resources managed by the server
3: Interfaces support a common set of operations
4: Operations are stateless: the client leaves no context on the server
54
RESTful API
An API that follows REST standards
55
Explain how HTTP follows REST standards
1: Client and server are separate and independent
2: HTML, CSS, JS, images, etc. are resources managed by the server
3: GET, POST, PUT, DELETE are an interface with a common set of operations
4: Client leaves no context on the server
56
What does RPC stand for?
Remote Procedure Call
57
What is an RPC?
A call that invokes a routine in a process on another machine, made to look like a local function call
58
Stub
Code that makes an RPC look like a local call by serializing the arguments, sending the call to the remote process and deserializing the result
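A minimal in-process sketch of what a client stub and a server-side dispatcher do, assuming JSON as the wire format and a plain function call standing in for the network; `CalculatorStub`, `fake_network_send` and `handle_request` are hypothetical names. Real RPC frameworks such as gRPC generate stubs from an IDL and use a binary format like protocol buffers.

```python
import json
from typing import Any, Callable, Dict


def add(a: int, b: int) -> int:
    """The real implementation, living on the server."""
    return a + b


# Server side: methods the server exposes, looked up by name.
SERVER_METHODS: Dict[str, Callable[..., Any]] = {"add": add}


def handle_request(payload: bytes) -> bytes:
    """Server-side stub: deserialize the request, call the method, serialize the result."""
    request = json.loads(payload)
    result = SERVER_METHODS[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def fake_network_send(payload: bytes) -> bytes:
    """Hypothetical stand-in for a network round trip to the server."""
    return handle_request(payload)


class CalculatorStub:
    """Client-side stub: makes the remote call look like a local method call."""

    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self._send = send

    def add(self, a: int, b: int) -> int:
        payload = json.dumps({"method": "add", "args": [a, b]}).encode("utf-8")
        response = json.loads(self._send(payload))
        return response["result"]


if __name__ == "__main__":
    stub = CalculatorStub(fake_network_send)
    print(stub.add(2, 3))  # 5
```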
59
What does IDL stand for?
Interface Definition Language
60
What is an IDL for?
It defines the interfaces (messages and methods) that components use to communicate via RPCs, independently of any programming language.
61
What's a Google example of an IDL?
Protocol buffers (protos)
62
Distributed System
A group of processes running on different machines and communicating through a network
63
What are the two key problems you have to solve with a distributed system?
Network unreliability
Data inconsistency
64
What are three causes of network unreliability?
Network faults
Network congestion
Network partitions
65
Network fault
When machines are unable to communicate with each other for some reason
66
Network congestion
When there is too much traffic on the network, so packets are delayed or dropped
67
Network partition
When the network is split into two groups of machines that can't communicate with each other
68
Strong consistency
When every read, on any node, is guaranteed to return the most recent write, so simultaneous reads to different nodes return the same data
69
Eventual consistency
When nodes are guaranteed to *eventually* have the same data once writes stop arriving, though not immediately or simultaneously.
70
What does CAP stand for?
Consistency, Availability, Partition tolerance. Consistency here really means *strong* consistency.
71
What is the CAP Theorem?
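A distributed system can't guarantee all three of strong consistency, availability and partition tolerance at once. Since network partitions can't be ruled out, in practice a system must choose between consistency and availability during a partition.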
72
Problem with a CA system
It can't tolerate a partition, so it's basically a single-node system.
73
Problem with an AP system
It doesn't have strong consistency, so a partition could cause stale data in nodes.
74
Problem with a CP system
It doesn't guarantee availability: during a partition, some nodes may refuse requests rather than risk returning inconsistent data.
75
Five 9s
A measure of availability: the system is up 99.999% of the time, which allows only about 5.26 minutes of downtime per year
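The downtime budget behind a number of 9s is simple arithmetic. This sketch assumes a 365-day year, and `downtime_minutes_per_year` is a hypothetical helper name. For five 9s it gives 0.001% of 525,600 minutes, about 5.26 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability fraction (e.g. 0.99999)."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
        print(f"{nines} nines: {downtime_minutes_per_year(availability):.2f} minutes/year")
```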
76
Cluster
A group of nodes
77
Heartbeat
When nodes send periodic messages to the coordination service to indicate normal operation
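A minimal sketch of the coordination-service side of heartbeats, assuming a node is considered failed once no heartbeat arrives within a fixed timeout; `HeartbeatMonitor` and its methods are hypothetical names. The clock is injectable so the example can simulate time passing.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Coordination-service side: remembers when each node last sent a heartbeat."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node_id: str) -> None:
        """Called whenever a node's periodic 'I'm alive' message arrives."""
        self.last_seen[node_id] = self.clock()

    def dead_nodes(self) -> List[str]:
        """Nodes whose most recent heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


if __name__ == "__main__":
    fake_now = [0.0]  # simulated clock, in seconds
    monitor = HeartbeatMonitor(timeout_s=3.0, clock=lambda: fake_now[0])
    monitor.record_heartbeat("node-a")
    monitor.record_heartbeat("node-b")
    fake_now[0] = 2.0
    monitor.record_heartbeat("node-a")  # node-b has gone silent
    fake_now[0] = 4.0
    print(monitor.dead_nodes())  # ['node-b']
```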
78
Autoscaling
When nodes are automatically added or removed based on traffic
79
Leader Election
When you have a primary node, and it fails, one of the backup nodes is automatically selected to be the new primary.
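A minimal sketch of only the selection step, assuming the simple rule used by the bully algorithm (the live node with the highest ID wins); `elect_leader` and `handle_failure` are hypothetical names. A real protocol, such as Raft or a lock held in a coordination service like ZooKeeper or etcd, also has to make every node agree on the result, which this sketch does not attempt.

```python
from typing import Iterable, Optional, Set


def elect_leader(live_node_ids: Iterable[int]) -> Optional[int]:
    """Selection rule only: the live node with the highest ID wins (None if none are live)."""
    return max(live_node_ids, default=None)


def handle_failure(current_leader: int, live_node_ids: Set[int]) -> Optional[int]:
    """Keep the current leader while it is live; otherwise promote one of the backups."""
    if current_leader in live_node_ids:
        return current_leader
    return elect_leader(live_node_ids)


if __name__ == "__main__":
    # Node 3 was the primary; failure detection now reports only nodes 1 and 2 as live.
    print(handle_failure(current_leader=3, live_node_ids={1, 2}))  # 2
```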
80
Where should you maintain the metadata for a distributed system?
Database
81
Front-end server
First layer of servers a request reaches
82
Back-end server
Servers the client doesn't directly communicate with
83
Web server
A stateless server that responds to requests from clients (front-end server)
84
API Gateway
When a web server acts as a single point of access to multiple services, and presents an interface to clients that hides those details
85
How does a web server relate to RPC/REST?
REST => used to communicate with clients
RPC => used to communicate with back-end servers
86
Throttling
When a web server caps how many requests it accepts within some time window, and rejects requests that exceed that cap
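A minimal fixed-window throttle matching the card above, assuming a single global limit rather than one per client; `FixedWindowThrottle` is a hypothetical name, and the clock is injectable for testing.

```python
import time
from typing import Callable


class FixedWindowThrottle:
    """Accepts at most `limit` requests per `window_s` seconds and rejects the rest."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # The current window has ended: start a new one with a fresh count.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over capacity for this window, so the request is rejected


if __name__ == "__main__":
    throttle = FixedWindowThrottle(limit=3, window_s=1.0)
    print([throttle.allow() for _ in range(5)])
```

A per-client limit would keep one such counter per client key. Fixed windows can let through up to twice the limit around a window boundary, which token-bucket and sliding-window limiters smooth out.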
87
Load shedding
When a web server discards or re-routes requests that exceed system capacity
88
Authentication
When a web server validates a user's identity
89
Authorization
When a web server determines whether a user has permission to access a particular resource
90
Degraded dependency
When a downstream server (a dependency of the server being analyzed) can't keep up with the load sent to it.
91
Pushback
When a downstream server that can’t handle its load tells the web server “stop sending me stuff”, so that the web server can shed load.
92
Load Balancer
Distributes incoming network traffic across a group of servers.
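Round robin is one common way a load balancer can distribute traffic; the card doesn't name a strategy, so this is an assumption, and the server names are hypothetical. Production load balancers also health-check servers and offer other strategies such as least connections.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands requests to servers in a fixed rotation, so each gets an equal share."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._rotation: Iterator[str] = cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["backend-1", "backend-2", "backend-3"])
    print([balancer.pick() for _ in range(4)])
    # ['backend-1', 'backend-2', 'backend-3', 'backend-1']
```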
93
What are two places in the system a load balancer might go?
In front of the web servers, to spread client traffic across their instances, and between the web servers and the back-end servers.
94
Two types of load balancers
Layer-4 and Layer-7
95
Layer-4 vs. Layer-7 load balancer
Layer-4 only uses network and transport layer data (IP addresses, ports) to make routing decisions. Layer-7 can also use HTTP request data (URL path, headers, cookies).
96
Cache
Temporarily stores a subset of data on a high-speed medium, to improve performance and reduce resource usage.
97
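How much faster is RAM than an SSD?
About 4x for sequential reads (roughly 250 µs vs. 1 ms per MB in commonly cited latency figures); for small random reads the gap is far larger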
98
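How much faster is RAM than an HDD?
About 80x for sequential reads (roughly 250 µs vs. 20 ms per MB in commonly cited latency figures); for random reads the gap is far larger because of disk seeks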
99
Why not always use RAM instead of SSD/HDD?
It costs far more per GB, and it is volatile.
100
Where in a system would you put a cache?
It can go in any number of places because it’s such a generic concept – you would attach it to a particular service that communicates (directly or indirectly) with a DB, so the service doesn’t always have to communicate with the DB.
101
Local Cache
Cache located on the same machine as the server.
102
What is a Local Cache also known as?
Co-Located Cache
103
Cache Hit / Miss
When data is found (or not found, respectively) in the cache
104
Cache Invalidation
The strategy for how cache entries are marked as invalid, to be removed.
105
Most common strategy for cache invalidation
TTL (time to live): each entry expires a fixed time after it is written
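A minimal TTL cache sketch, assuming every entry gets the same fixed lifetime and expired entries are removed lazily when read; `TTLCache` is a hypothetical name and the clock is injectable for testing. Returning `None` stands for a cache miss, so this sketch can't cache `None` values.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Cache whose entries become invalid `ttl_s` seconds after being written."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._data: Dict[Any, Tuple[Any, float]] = {}  # key -> (value, expiry time)

    def put(self, key: Any, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl_s)

    def get(self, key: Any) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: invalidate it and treat the lookup as a miss
            return None
        return value  # cache hit


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(ttl_s=10.0, clock=lambda: fake_now[0])
    cache.put("user:1", "Ada")
    fake_now[0] = 5.0
    print(cache.get("user:1"))  # Ada (hit)
    fake_now[0] = 10.0
    print(cache.get("user:1"))  # None (expired, so a miss)
```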
106
Cache consistency
In a system with multiple caches, those caches are consistent if they have the same values for the same entries.
107
Cache replacement policy
How you decide what to remove when the cache is full.
108
Three most popular cache replacement policies
Least Recently Used (LRU)
Least Frequently Used (LFU)
First In, First Out (FIFO)
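A common way to implement LRU replacement in Python is an `OrderedDict` kept in recency order; `LRUCache` is a hypothetical name. For memoizing function calls, the standard library already provides `functools.lru_cache`.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()  # oldest use first

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None  # cache miss
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is now the most recently used
    cache.put("c", 3)  # evicts "b"
    print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```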
109
Cache write policy
How you decide how to handle writes to the cache and the DB
110
What are the three cache write policies?
Write-through: write to both the cache and the DB
Write-back: write only to the cache; a separate service syncs changes to the DB later
Write-around: write only to the DB; populate the cache on a cache miss
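A toy sketch contrasting the three write policies, with plain dicts standing in for the cache and the DB; `CachedStore` and `flush` are hypothetical names, and `flush` plays the role of the separate service that syncs write-back data to the DB. As a design choice, the write-around branch also drops any cached copy so reads don't return stale data.

```python
from typing import Any, Dict, Optional, Set

POLICIES = ("write-through", "write-back", "write-around")


class CachedStore:
    """Toy cache in front of a toy 'DB' (both plain dicts) to contrast write policies."""

    def __init__(self, policy: str) -> None:
        if policy not in POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.cache: Dict[str, Any] = {}
        self.db: Dict[str, Any] = {}
        self.dirty: Set[str] = set()  # write-back: keys not yet synced to the DB

    def write(self, key: str, value: Any) -> None:
        if self.policy == "write-through":
            self.cache[key] = value  # update both in the same operation
            self.db[key] = value
        elif self.policy == "write-back":
            self.cache[key] = value  # fast write to the cache only...
            self.dirty.add(key)      # ...remembering that the DB is now behind
        else:  # write-around
            self.db[key] = value       # bypass the cache entirely
            self.cache.pop(key, None)  # drop any stale cached copy (a design choice)

    def read(self, key: str) -> Optional[Any]:
        if key in self.cache:
            return self.cache[key]  # cache hit
        value = self.db.get(key)    # cache miss: fall back to the DB
        if value is not None:
            self.cache[key] = value  # populate the cache on the miss
        return value

    def flush(self) -> None:
        """Write-back's separate sync step: copy dirty cache entries to the DB."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

With write-back, a crash before `flush` runs loses the unsynced writes, which is the usual price for its faster writes.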
111
Blob Storage
Storage for large volumes of unstructured data (blobs, or binary large objects), like videos and images
112
If you have blob storage, how do you still use the DB?
The DB stores metadata about the blobs, plus the keys or paths used to access them.
113
What does CDN stand for?
Content Delivery Network
114
What is a Content Delivery Network?
It's a set of geographically distributed servers, each of which caches copies of content from the origin server and serves it to nearby users.
115
What is the purpose of a CDN?
To reduce latency by serving content from a "zone" physically close to the user.
116
How does the client know to get data from the CDN?
The web server tells it, e.g., by returning URLs that point to a specific CDN server.
117
What is a common example of when a CDN is used?
For large files, like videos.
118
Facade
Logical grouping of READ methods in a Read API
119
What is the main thing to know about Read vs. Write APIs?
That they are often wholly separate interfaces and services.
120
Fan-out service
A service that takes one write and propagates it as multiple writes to multiple destinations (e.g., a new post copied into each follower's feed).
121
Two models of fan-out service
Push model: Propagate new items as they're created
Pull model: Fan-out happens at regular intervals, or on demand
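A minimal sketch of the two fan-out models for a social feed, assuming an in-memory follower graph; `FOLLOWERS`, `PushFanout` and `PullFanout` are hypothetical names, and the pull model here assembles the feed on demand rather than at regular intervals.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical social graph: author -> users who follow that author.
FOLLOWERS: Dict[str, Set[str]] = {"alice": {"bob", "carol"}, "bob": {"carol"}}


class PushFanout:
    """Push model: each new post is written into every follower's feed immediately."""

    def __init__(self) -> None:
        self.feeds: Dict[str, List[str]] = defaultdict(list)

    def post(self, author: str, text: str) -> None:
        for follower in FOLLOWERS.get(author, set()):
            self.feeds[follower].append(f"{author}: {text}")  # one write per follower

    def read_feed(self, user: str) -> List[str]:
        return list(self.feeds.get(user, []))  # reads are a simple lookup


class PullFanout:
    """Pull model: each post is stored once; a feed is assembled when it is requested."""

    def __init__(self) -> None:
        self.posts: List[Tuple[str, str]] = []  # (author, text) in posting order

    def post(self, author: str, text: str) -> None:
        self.posts.append((author, text))  # a single write, regardless of follower count

    def read_feed(self, user: str) -> List[str]:
        # The fan-out work happens here, at read time.
        return [f"{author}: {text}" for author, text in self.posts
                if user in FOLLOWERS.get(author, set())]
```

Push makes reads cheap but writes expensive for authors with many followers; pull is the reverse.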
122
What does GUID stand for?
Globally unique identifier
123
What is a GUID?
A strategy for unique IDs where you generate enormous (e.g., 128-bit) random numbers and rely on collisions being astronomically unlikely.
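Python's standard `uuid` module generates random (version 4) UUIDs, the same 128-bit format Microsoft calls a GUID. A minimal example, with `order_id` as a hypothetical name:

```python
import uuid

# uuid4() fills 122 of the 128 bits with random data (the other 6 bits mark the
# version and variant), so a collision is possible in principle but vanishingly unlikely.
order_id = uuid.uuid4()
print(order_id)          # different on every run, in the 8-4-4-4-12 hex-digit form
print(order_id.version)  # 4
```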
124
Snowflake
A strategy for unique IDs (originally from Twitter) where the timestamp and server ID form the prefix of the ID, followed by a sequence number that is unique within that server and millisecond, so the whole ID is guaranteed to be unique.
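A minimal Snowflake-style generator, assuming the layout commonly described for Twitter's version: 41 bits of milliseconds since an epoch, 10 bits of server ID and 12 bits of per-millisecond sequence. `SnowflakeGenerator` and its parameters are hypothetical, and the default epoch (the Unix epoch) is an assumption; deployments typically pick a recent custom epoch so the 41 timestamp bits last longer.

```python
import threading
import time
from typing import Callable

SERVER_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_SERVER_ID = (1 << SERVER_ID_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Builds 64-bit IDs as: timestamp (41 bits) | server ID (10 bits) | sequence (12 bits)."""

    def __init__(self, server_id: int, epoch_ms: int = 0,
                 clock_ms: Callable[[], int] = _now_ms) -> None:
        if not 0 <= server_id <= MAX_SERVER_ID:
            raise ValueError("server_id must fit in 10 bits")
        self.server_id = server_id
        self.epoch_ms = epoch_ms
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 sequence numbers used this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - self.epoch_ms
            return ((timestamp << (SERVER_ID_BITS + SEQUENCE_BITS))
                    | (self.server_id << SEQUENCE_BITS)
                    | self.sequence)


if __name__ == "__main__":
    generator = SnowflakeGenerator(server_id=7)
    new_id = generator.next_id()
    # Decode the server ID back out of the middle bits; the second value printed is 7.
    print(new_id, (new_id >> SEQUENCE_BITS) & MAX_SERVER_ID)
```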
125
Unique ID service
A service that generates an ID value guaranteed to be unique, often distributed, with servers syncing IDs with each other.
126
Data warehouse
Stores data from multiple services for analysis.
127
Data lake
Stores data in its original raw format – a cheap dumping ground for data uploads.
128
Steps of Map-Reduce:
Map: Send a subset of the data to each node, which maps its input to (key, value) outputs
Shuffle: Redistribute the outputs so they are grouped by key on the same node
Reduce: Apply the reduce function to each key's group of outputs
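A single-process sketch of the three steps using word count, the classic MapReduce example; the function names are hypothetical, and in a real cluster each chunk would be mapped on a different node and the shuffle would move data over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one input chunk into (key, value) pairs, here (word, 1)."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all mappers' outputs by key (in a cluster, one key -> one reducer)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    chunks = ["the cat sat", "the dog sat", "The end"]  # each would go to a different node
    counts = reduce_phase(shuffle_phase(map_phase(c) for c in chunks))
    print(counts["the"], counts["sat"])  # 3 2
```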
129
Functional vs. Non-Functional Testing
Functional testing verifies the correctness of a system on various inputs/outputs
Non-functional testing verifies properties other than correctness
130
Give three examples of non-functional tests.
Any three of:
Regression test
A/B test
Load test
Stress test
Endurance test
Security test
131
Integration test
Tests the interactions between units of software within a group
132
End-to-end test
Tests the entire system using an external client interacting with the system interface