Priority 1 Flashcards

(59 cards)

1
Q

SSL Certificate

A

A digital certificate granted to a server by a certificate authority. It contains the server’s public key, which is used as part of the TLS handshake process in an HTTPS connection.

An SSL certificate effectively confirms that a public key belongs to the server that claims it. SSL certificates are a crucial defense against man-in-the-middle attacks.

2
Q

Streaming

A

In networking, it usually refers to the act of continuously getting a feed of information from a server by keeping an open connection between the two machines or processes.

3
Q

SHA

A

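Short for “Secure Hash Algorithms”, SHA is a family of cryptographic hash functions standardized by NIST and widely used in the industry. These days, SHA-2 variants such as SHA-256 and the newer SHA-3 are popular choices to use in a system, while SHA-1 is no longer considered secure.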
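
A quick way to see SHA functions in practice is Python's standard `hashlib` module. This is a minimal sketch assuming Python 3.6+ (where SHA-3 was added to `hashlib`); `digest_hex` is a hypothetical helper name.

```python
import hashlib

def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using a SHA-family algorithm from hashlib."""
    hasher = hashlib.new(algorithm)  # e.g. "sha256" (SHA-2) or "sha3_256" (SHA-3)
    hasher.update(data)
    return hasher.hexdigest()

# Both SHA-256 and SHA3-256 produce 256-bit digests (64 hex characters),
# but they are different algorithms and give different outputs for the same input.
print(digest_hex(b"hello"))
print(digest_hex(b"hello", "sha3_256"))
```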

4
Q

YAML

A

A file format mostly used in configuration. Example:

version: 1.0
name: AlgoExpert Configuration
5
Q

Relational Database

A

A type of structured database in which data is stored following a tabular format; often supports powerful querying using SQL.
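
Python's built-in `sqlite3` module is a convenient way to try this out. The sketch below (the `users` table and its columns are made up for illustration) stores rows in a fixed tabular schema and then queries them with SQL.

```python
import sqlite3

# An in-memory SQLite database: a small relational database bundled with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT)")
conn.executemany(
    "INSERT INTO users (name, country) VALUES (?, ?)",
    [("alice", "US"), ("bob", "FR"), ("carol", "US")],
)

# Every row follows the table's columns, and SQL can filter and aggregate across rows.
rows = conn.execute(
    "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('FR', 1), ('US', 2)]
conn.close()
```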

6
Q

SQL Database

A

Any database that supports SQL. This term is often used synonymously with “Relational Database”, though in practice, not every relational database supports SQL.

7
Q

Client-Server Model

A

The paradigm on which most modern systems are built. It consists of clients requesting data or services from servers, and servers providing data or services to clients.

8
Q

Content Delivery Network

A

A CDN is a third-party service that acts like a cache for your servers. Sometimes, web applications can be slow for users in a particular region if your servers are located only in another region. A CDN has servers all around the world, meaning that the latency to a CDN’s servers will almost always be far better than the latency to your servers. A CDN’s servers are often referred to as PoPs (Points of Presence). Two of the most popular CDNs are Cloudflare and Google Cloud CDN.

9
Q

Memory

A

Short for Random Access Memory (RAM). Data stored in memory will be lost when the process that has written that data dies.

10
Q

Monitoring

A

Monitoring is the process of having visibility into a system’s key metrics. It is typically implemented by collecting important events in a system and aggregating them into human-readable charts.

11
Q

Virtual Machine

A

A VM is a form of computer inside of a computer. It is a program that runs on a physical machine and emulates hardware, so that a separate kernel and operating system can run on top of it. VMs are very useful for isolating programs from one another while letting them share the same physical machine.

12
Q

Redundancy

A

The process of replicating parts of a system in an effort to make it more reliable.

13
Q

Socket

A

A kind of file that acts like a stream. Processes can read from and write to sockets and communicate in this manner. Most of the time, sockets are front ends for TCP connections.
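
To make this concrete, here is a minimal sketch using Python's standard `socket` and `threading` modules: one thread acts as a server that reads from its socket and writes back an upper-cased copy, while the main thread acts as a client over a TCP connection on the loopback address. The helper names `run_upper_server` and `echo_once` are hypothetical, and the sketch assumes small messages that arrive in a single `recv` call.

```python
import socket
import threading

def run_upper_server(listener: socket.socket) -> None:
    """Accept one TCP connection, read a message, and write it back upper-cased."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)        # read from the socket like a stream
        conn.sendall(data.upper())    # write the reply back to the same socket

def echo_once(message: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=run_upper_server, args=(listener,))
        server.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        server.join()
    return reply

print(echo_once(b"hello"))  # b'HELLO'
```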

14
Q

DNS

A

Short for Domain Name System, it describes the entities and protocols involved in the translation from domain names to IP addresses. Typically, machines make a DNS query to a well-known entity, which is responsible for returning the IP address (or multiple addresses) of the requested domain name in the response.
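
From a program's point of view, name resolution is usually a single library call. This minimal sketch uses Python's `socket.getaddrinfo`, which asks the operating system's resolver; note that the resolver may answer from a local hosts file rather than a DNS server. `resolve` is a hypothetical helper name.

```python
import socket

def resolve(hostname: str) -> set[str]:
    """Ask the system resolver (which may query DNS) for the IP addresses of hostname."""
    results = socket.getaddrinfo(hostname, None)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results}

print(resolve("localhost"))  # typically includes 127.0.0.1 and/or ::1
```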

15
Q

Polling

A

The act of fetching a resource or piece of data regularly at an interval to make sure your data is not too stale.
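
A polling client boils down to "fetch, wait, repeat". In this minimal sketch, `poll` and the `fetch` callable are hypothetical; in a real client, `fetch` would make a network request to the server.

```python
import itertools
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def poll(fetch: Callable[[], T], interval_seconds: float, max_polls: int) -> List[T]:
    """Call fetch() up to max_polls times, sleeping interval_seconds between calls."""
    results: List[T] = []
    for attempt in range(max_polls):
        results.append(fetch())  # refresh our copy of the data
        if attempt < max_polls - 1:
            time.sleep(interval_seconds)  # wait before polling again
    return results

# Usage with a stand-in data source that returns 1, 2, 3, ...
ticks = itertools.count(1)
print(poll(lambda: next(ticks), interval_seconds=0.05, max_polls=3))  # [1, 2, 3]
```

The interval is the main design knob: a shorter interval keeps data fresher but sends more requests to the server.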

16
Q

Client

A

A machine or process that requests data or service from a server.

Note that a single machine or piece of software can be both a client and a server at the same time. For instance, a single machine could act as a server for end users and as a client for a database.

17
Q

JSON

A

A file format heavily used in APIs and configuration. Stands for JavaScript Object Notation. Example:

{
   "version": 1.0,
   "name": "AlgoExpert Configuration"
}
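
Most languages ship a JSON parser. As a sketch, Python's standard `json` module can load the example above into a dictionary and serialize it back:

```python
import json

raw = """
{
   "version": 1.0,
   "name": "AlgoExpert Configuration"
}
"""

config = json.loads(raw)          # parse JSON text into a Python dict
print(config["name"])             # AlgoExpert Configuration
print(type(config["version"]))    # <class 'float'>: 1.0 is parsed as a number
print(json.dumps(config))         # serialize back to a compact JSON string
```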
18
Q

IP Packet

A

Sometimes more broadly referred to as just a (network) packet, an IP packet is effectively the smallest unit used to describe data being sent over IP, aside from bytes. An IP packet consists of:

  • an IP header, which contains the source and destination IP addresses as well as other information related to the network
  • a payload, which is just the data being sent over the network
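
In practice the operating system's network stack builds packets for you, but packing one by hand shows the header/payload split. This is an illustrative sketch assuming the IPv4 header layout from RFC 791 with no options (20 bytes); the function names are hypothetical, and fields such as identification and flags are simply set to zero.

```python
import socket
import struct

def checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words, then complemented."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF

def build_ipv4_packet(src: str, dst: str, payload: bytes, protocol: int = 6) -> bytes:
    """Build a minimal IPv4 packet: a 20-byte header (no options) followed by the payload."""
    version_ihl = (4 << 4) | 5  # IPv4, header length = 5 x 32-bit words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + len(payload),  # version/IHL, type of service, total length
        0, 0,                               # identification, flags/fragment offset
        64, protocol, 0,                    # TTL, protocol (6 = TCP), checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),  # source and destination addresses
    )
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + payload

def parse_ipv4_packet(packet: bytes) -> tuple[str, str, bytes]:
    """Return (source IP, destination IP, payload) from an IPv4 packet."""
    header_len = (packet[0] & 0x0F) * 4
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    return src, dst, packet[header_len:]

packet = build_ipv4_packet("192.168.0.1", "10.0.0.2", b"hi")
print(parse_ipv4_packet(packet))  # ('192.168.0.1', '10.0.0.2', b'hi')
```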
19
Q

HTTP

A

The HyperText Transfer Protocol is a very common network protocol, traditionally implemented on top of TCP. Clients make HTTP requests, and servers reply with responses.

Requests typically have the following schema:

host: string (example: algoexpert.io)
port: integer (example: 80 or 443)
method: string (example: GET, PUT, POST, DELETE, OPTIONS or PATCH)
headers: pair list (example: "Content-Type" => "application/json")
body: opaque sequence of bytes

Responses typically have the following schema:

status code: integer (example: 200, 401)
headers: pair list (example: "Content-Length" => 1238)
body: opaque sequence of bytes
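
On the wire, an HTTP/1.1 request is plain text: a request line, headers, a blank line, then the body. The sketch below serializes the schema above; the host and port pick which server the TCP connection is opened to, and only the `Host` header appears in the message itself. The `path` parameter, the `/api/items` path, and the helper names are assumptions added for illustration, and real clients should use a proper HTTP library.

```python
def build_http_request(method: str, host: str, path: str,
                       headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers)
    if body:
        all_headers["Content-Length"] = str(len(body))  # tells the server where the body ends
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

def parse_status_code(response: bytes) -> int:
    """Read the status code from a response's status line, e.g. 'HTTP/1.1 200 OK'."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split(b" ")[1])

request = build_http_request(
    "POST", "algoexpert.io", "/api/items",  # hypothetical path
    {"Content-Type": "application/json"},
    b'{"name": "example"}',
)
print(request.decode("ascii"))
print(parse_status_code(b"HTTP/1.1 401 Unauthorized\r\nContent-Length: 0\r\n\r\n"))  # 401
```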
20
Q

Process

A

A program that is currently running on a machine. You should always assume that any process may get terminated at any time in a sufficiently large system.

22
Q

Databases

A

Databases are programs that use either disk or memory to do two core things: record data and query data. In general, they are themselves long-lived servers that interact with the rest of your application through network calls, using protocols built on top of TCP or even HTTP.

Some databases only keep records in memory, and the users of such databases are aware of the fact that those records may be lost forever if the machine or process dies.

For the most part, though, databases need those records to persist, and thus cannot rely on memory alone. This means that you have to write your data to disk. Anything written to disk will survive power loss or network partitions, so that’s what is used to keep permanent records.

Since machines die often in a large scale system, special disk partitions or volumes are used by the database processes, and those volumes can get recovered even if the machine were to go down permanently.

23
Q

Throughput

A

The number of operations that a system can handle properly per unit of time. For instance, the throughput of a server is often measured in requests per second (RPS) or queries per second (QPS).

24
Q

Port

A

In order for multiple programs to listen for new network connections on the same machine without colliding, they pick a port to listen on. A port is an integer between 0 and 65,535 (2^16 ports total).

Typically, ports 0-1023 are reserved for system ports (also called well-known ports) and shouldn’t be used by user-level processes. Certain ports have pre-defined uses, and although you usually won’t be required to have them memorized, they can sometimes come in handy. Below are some examples:

  • 22: Secure Shell
  • 53: DNS lookup
  • 80: HTTP
  • 443: HTTPS
25
Node/Instance/Host
These three terms refer to the same thing most of the time: a virtual or physical machine on which the developer runs processes. Sometimes the word server also refers to this same concept.
26
Spatial Database
A type of database optimized for storing and querying spatial data like locations on a map. Spatial databases rely on spatial indexes like quadtrees to quickly perform spatial queries like finding all locations in the vicinity of a region.
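
To show the idea behind a spatial index, here is a minimal point quadtree sketch: each node covers a rectangle and splits into four quadrants once it holds more than `capacity` points, so a region query can skip whole quadrants that don't overlap the search area. The class names and parameters are hypothetical, and production spatial databases use more elaborate structures and query types.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Rect:
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width
    h: float  # height

    def contains(self, p: Point) -> bool:
        # Half-open bounds so a point on a shared edge belongs to exactly one quadrant.
        return self.x <= p[0] < self.x + self.w and self.y <= p[1] < self.y + self.h

    def intersects(self, other: "Rect") -> bool:
        return not (other.x >= self.x + self.w or other.x + other.w <= self.x
                    or other.y >= self.y + self.h or other.y + other.h <= self.y)

@dataclass
class QuadTree:
    boundary: Rect
    capacity: int = 4      # points stored in a node before it splits
    depth: int = 0
    max_depth: int = 16    # guards against endless splitting on duplicate points
    points: List[Point] = field(default_factory=list)
    children: List["QuadTree"] = field(default_factory=list)

    def insert(self, p: Point) -> bool:
        if not self.boundary.contains(p):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._subdivide()
        return any(child.insert(p) for child in self.children)

    def _subdivide(self) -> None:
        b, hw, hh = self.boundary, self.boundary.w / 2, self.boundary.h / 2
        self.children = [
            QuadTree(Rect(x, y, hw, hh), self.capacity, self.depth + 1, self.max_depth)
            for x, y in [(b.x, b.y), (b.x + hw, b.y), (b.x, b.y + hh), (b.x + hw, b.y + hh)]
        ]
        old_points, self.points = self.points, []
        for q in old_points:  # push existing points down into the new quadrants
            for child in self.children:
                if child.insert(q):
                    break

    def query(self, region: Rect) -> List[Point]:
        """Return all stored points inside region, skipping quadrants that cannot overlap it."""
        if not self.boundary.intersects(region):
            return []
        found = [p for p in self.points if region.contains(p)]
        for child in self.children:
            found.extend(child.query(region))
        return found

tree = QuadTree(Rect(0, 0, 100, 100), capacity=2)
for p in [(10, 10), (20, 20), (80, 80), (55, 45), (12, 18)]:
    tree.insert(p)
print(sorted(tree.query(Rect(0, 0, 25, 25))))  # [(10, 10), (12, 18), (20, 20)]
```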
27
Percentiles
Most often used when describing a latency distribution. If your Xth percentile is 100 milliseconds, it means that X% of the requests have latencies of 100ms or less. Sometimes, SLAs describe their guarantees using these percentiles.
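
One common way to compute a percentile is the nearest-rank method sketched below; other definitions (for example, ones that interpolate between samples) give slightly different values. The function name and sample latencies are made up for illustration.

```python
import math
from typing import Sequence

def percentile(latencies_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank into the sorted samples
    return ordered[rank - 1]

samples = [12, 15, 11, 90, 14, 13, 16, 250, 12, 15]
print(percentile(samples, 50))  # 14
print(percentile(samples, 90))  # 90
print(percentile(samples, 99))  # 250
```

Here the median (p50) is only 14 ms, while the p99 is dominated by a single slow request, which is why tail percentiles are often what SLAs focus on.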
28
IP Address
An address given to each machine connected to the public internet. IPv4 addresses consist of four numbers separated by dots, **a.b.c.d**, where each of the four numbers is between 0 and 255. Special values include:
- **127.0.0.1**: Your own local machine. Also referred to as **localhost**.
- **192.168.x.y**: Your private network (this range is reserved for private networks). For instance, your machine and all machines on your private wifi network will usually have the **192.168** prefix.
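
Python's standard `ipaddress` module knows about these special ranges. This sketch uses a hypothetical `classify` helper; it checks loopback first because Python also treats 127.0.0.0/8 as a non-public range.

```python
import ipaddress

def classify(text: str) -> str:
    """Label an IP address string as loopback, private, or public."""
    addr = ipaddress.ip_address(text)  # raises ValueError for malformed input
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"

for text in ["127.0.0.1", "192.168.1.20", "8.8.8.8"]:
    print(text, classify(text))
# 127.0.0.1 loopback
# 192.168.1.20 private
# 8.8.8.8 public
```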
29
Hashing Function
A function that takes in a specific data type (such as a string or an identifier) and outputs a number. Different inputs may have the same output, but a good hashing function attempts to minimize those hashing collisions (which is equivalent to maximizing uniformity).
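
As an example of a simple non-cryptographic hashing function, here is a sketch of 32-bit FNV-1a, used to map string keys to buckets. FNV-1a is just one choice picked for illustration; it is used instead of Python's built-in `hash()`, which is randomized per process for strings. `bucket_for` is a hypothetical helper.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h

def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to one of num_buckets buckets; distinct keys may still collide."""
    return fnv1a_32(key.encode("utf-8")) % num_buckets

print([bucket_for(f"user:{i}", 4) for i in range(8)])
```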
30
SQL
Structured Query Language. Relational databases are typically queried using a dialect of SQL; for instance, Postgres (PostgreSQL) implements its own SQL dialect.
32
Configuration
A set of parameters or constants that are critical to a system. Configuration is typically written in **JSON** or **YAML** and can be either **static**, meaning that it's hard-coded in and shipped with your system's application code (like frontend code, for instance), or **dynamic**, meaning that it lives outside of your system's application code.
33
Pagination
When a network request potentially warrants a really large response, the relevant API might be designed to return only a single **page** of that response (i.e., a limited portion of the response), accompanied by an identifier or token for the client to request the next page if desired. Pagination is often used when designing **List** endpoints. For instance, an endpoint to list videos on the YouTube Trending page could return a huge list of videos. This wouldn't perform very well on mobile devices due to the lower network speeds and simply wouldn't be optimal, since most users will only ever scroll through the first ten or twenty videos. So, the API could be designed to respond with only the first few videos of that list; in this case, we would say that the API response is **paginated**.
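
A minimal sketch of a paginated List endpoint's logic. The page token here is a hypothetical stringified offset chosen for simplicity; real APIs often use opaque tokens or cursors so clients don't depend on their format.

```python
from typing import List, Optional, Sequence, Tuple

def list_page(items: Sequence[str], page_size: int,
              page_token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of items plus a token for the next page (None when done)."""
    start = int(page_token) if page_token else 0  # hypothetical token: a stringified offset
    end = start + page_size
    page = list(items[start:end])
    next_token = str(end) if end < len(items) else None
    return page, next_token

# A client keeps requesting pages until no next-page token comes back.
videos = [f"video-{i}" for i in range(1, 8)]
token = None
while True:
    page, token = list_page(videos, page_size=3, page_token=token)
    print(page)
    if token is None:
        break
```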
34
CRUD Operations
Stands for Create, Read, Update, Delete Operations. These four operations often serve as the bedrock of a functioning system and therefore find themselves at the core of many APIs. The term CRUD is very likely to come up during an API-design interview.
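
A minimal sketch of the four operations over a hypothetical in-memory store. In HTTP APIs they are commonly mapped to POST (create), GET (read), PUT or PATCH (update), and DELETE (delete).

```python
import itertools
from typing import Dict, Optional

class InMemoryStore:
    """Hypothetical in-memory store exposing the four CRUD operations."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}
        self._ids = itertools.count(1)

    def create(self, record: dict) -> int:
        record_id = next(self._ids)
        self._rows[record_id] = dict(record)  # store a copy
        return record_id

    def read(self, record_id: int) -> Optional[dict]:
        row = self._rows.get(record_id)
        return dict(row) if row is not None else None

    def update(self, record_id: int, changes: dict) -> bool:
        if record_id not in self._rows:
            return False
        self._rows[record_id].update(changes)
        return True

    def delete(self, record_id: int) -> bool:
        return self._rows.pop(record_id, None) is not None

store = InMemoryStore()
video_id = store.create({"title": "Intro to CRUD"})
store.update(video_id, {"title": "CRUD, revisited"})
print(store.read(video_id))                          # {'title': 'CRUD, revisited'}
print(store.delete(video_id), store.read(video_id))  # True None
```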
35
Availability Zone
Sometimes referred to as an AZ, an availability zone designates a group of machines that share one or more central system components (e.g., power source, network connectivity, machine-cooling system). Availability zones are typically located far away from each other such that no natural disaster can realistically bring down two of them at once. This ensures that if you have redundant storage, for instance, with data stored in two availability zones, losing one AZ still leaves you with an operational system that abides by any SLA that it might have.
36
Microservice Architecture
When a system is made up of many small web services that can be compiled and deployed independently. This is usually thought of as the counterpart to a monolithic architecture (a monolith).
37
Non-Relational Database
In contrast with relational databases (SQL databases), a type of database that is free of imposed, tabular-like structure. Non-relational databases are often referred to as NoSQL databases.
38
Peer-To-Peer Network
A collection of machines referred to as peers that divide a workload between themselves to presumably complete the workload faster than would otherwise be possible. Peer-to-peer networks are often used in file-distribution systems.
39
Load Balancer
A type of reverse proxy that distributes traffic across servers. Load balancers can be found in many parts of a system, from the DNS layer all the way to the database layer.
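
One of the simplest distribution strategies is round robin, sketched below. It is only one possible policy (others weigh servers by load or health), and the class name and server addresses are hypothetical.

```python
import itertools
from typing import Sequence

class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotation."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)  # next server in the rotation

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(5)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']
```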
41
NoSQL Database
Any database that is not SQL-compatible is called NoSQL.
42
DoS Attack
Short for "denial-of-service attack", a DoS attack is an attack in which a malicious user tries to bring down or damage a system in order to render it unavailable to users. Much of the time, it consists of flooding the system with traffic. Some DoS attacks are easily preventable with rate limiting, while others can be far trickier to defend against.
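
Rate limiting can be implemented in several ways; the token bucket is one common algorithm and is sketched here as an illustration, not as the only option. Each client (for example, each IP address) would get its own bucket. The `clock` parameter is an assumption added so the behavior can be shown deterministically.

```python
import time
from typing import Callable

class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the request would be rejected

fake_now = [0.0]
bucket = TokenBucket(capacity=2, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0                          # one second later, one token has refilled
print(bucket.allow())                      # True
```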
43
CAP Theorem
Stands for **Consistency**, **Availability**, **Partition tolerance**. In a nutshell, this theorem states that any distributed system can only achieve 2 of these 3 properties. Furthermore, since almost all useful systems do have network-partition tolerance, it's generally boiled down to: Consistency vs. Availability; pick one. One thing to keep in mind is that some levels of consistency are still achievable with high availability, but strong consistency is much harder.
44
Replication
The act of duplicating the data from one database server to others. This is sometimes used to increase the redundancy of your system and to tolerate regional failures, for instance. Other times, you can use replication to move data closer to your clients, thus decreasing the latency of accessing specific data.
45
HTTPS
The HyperText Transfer Protocol Secure is an extension of HTTP that's used for secure communication online. It requires servers to have trusted certificates (usually SSL certificates) and uses the Transport Layer Security (TLS), a security protocol built on top of TCP, to encrypt data communicated between a client and a server.
46
Key-Value Store
A Key-Value Store is a flexible NoSQL database that's often used for caching and dynamic configuration. Popular options include DynamoDB, Etcd, Redis, and ZooKeeper.
47
Cache
A piece of hardware or software that stores data, typically meant to retrieve that data faster than otherwise. Caches are often used to store responses to network requests as well as results of computationally expensive operations. Note that data in a cache can become **stale** if the main source of truth for that data (i.e., the main database behind the cache) gets updated and the cache doesn't.
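
Because a cache usually has limited space, it needs an eviction policy. Least recently used (LRU) is one common choice, assumed here for illustration; this sketch builds it on `collections.OrderedDict` (Python's `functools.lru_cache` offers the same policy for function results). Note that, as described above, nothing here detects staleness if the source of truth changes.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None                # cache miss: caller falls back to the source of truth
        self._data.move_to_end(key)    # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```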
48
Blob Storage
A widely used kind of storage in both small- and large-scale systems. Blob stores don't really count as databases per se, partially because they only allow the user to store and retrieve data based on the name of the blob. This is sort of like a key-value store, but blob stores usually offer different guarantees: they might be slower than KV stores, but values can be megabytes large (or sometimes gigabytes large). People usually use blob storage for things like **large binaries, database snapshots, or images** and other static assets that a website might have. Blob storage is rather complicated to run on premises, and only very large companies like Google and Amazon have built infrastructure that supports it at scale. So, in the context of System Design interviews, you can usually assume that you will be able to use **GCS** or **S3**. These are blob storage services hosted by Google and Amazon respectively, and they cost money depending on how much storage you use and how often you store and retrieve blobs from that storage.
49
Persistent Storage
Usually refers to disk, but in general it is any form of storage that persists if the process in charge of managing it dies.
50
Latency
The time it takes for a certain operation to complete in a system. Most often this measure is a time duration, like milliseconds or seconds. You should know these orders of magnitude:
- **Reading 1 MB from RAM**: 250 μs (0.25 ms)
- **Reading 1 MB from SSD**: 1,000 μs (1 ms)
- **Transfer 1 MB over Network**: 10,000 μs (10 ms)
- **Reading 1 MB from HDD**: 20,000 μs (20 ms)
- **Inter-Continental Round Trip**: 150,000 μs (150 ms)
51
Server
A machine or process that provides data or service for a client, usually by listening for incoming network calls. Note that a single machine or piece of software can be both a client and a server at the same time. For instance, a single machine could act as a server for end users and as a client for a database.
52
High Availability
Used to describe systems that have particularly high levels of availability, typically 5 nines or more; sometimes abbreviated "HA".
53
Rendezvous Hashing
A type of hashing also called **highest random weight** hashing. It allows for minimal redistribution of mappings when a server goes down.
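
A minimal sketch of rendezvous hashing: every server gets a pseudo-random score for each key, and the key goes to the server with the highest score. Removing a server only moves the keys that server was winning. SHA-256 is used here only as a convenient, stable scoring function (an assumption; any well-distributed hash works), and the server names are hypothetical.

```python
import hashlib
from typing import Sequence

def _weight(server: str, key: str) -> int:
    # Stable pseudo-random score for a (server, key) pair.
    digest = hashlib.sha256(f"{server}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def pick_server(key: str, servers: Sequence[str]) -> str:
    """Rendezvous (highest random weight) hashing: choose the server with the top score."""
    return max(servers, key=lambda server: _weight(server, key))

servers = ["server-a", "server-b", "server-c"]
keys = [f"user:{i}" for i in range(1000)]
before = {k: pick_server(k, servers) for k in keys}
after = {k: pick_server(k, ["server-a", "server-c"]) for k in keys}  # server-b goes down
moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on server-b are remapped.
print(all(before[k] == "server-b" for k in moved))  # True
```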
54
Disk
Usually refers to either **HDD (hard-disk drive)** or **SSD (solid-state drive)**. Data written to disk will persist through power failures and general machine crashes. Disk is also referred to as **non-volatile storage**. SSD is far faster than HDD (see latencies of accessing data from SSD and HDD) but also far more expensive from a financial point of view. Because of that, HDD will typically be used for data that's rarely accessed or updated, but that's stored for a long time, and SSD will be used for data that's frequently accessed and updated.
56
Availability
The odds of a particular server or service being up and running at any point in time, usually measured in percentages. A server that has 99% availability will be operational 99% of the time (this would be described as having two nines of availability).
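
Turning a number of nines into allowed downtime is simple arithmetic. This sketch assumes a 365-day year; the function name is hypothetical.

```python
def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per 365-day year (in minutes) allowed at a given availability."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1 - availability_percent / 100)

for label, pct in [("two nines", 99.0), ("three nines", 99.9), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {downtime_minutes_per_year(pct):.1f} minutes/year")
```

Two nines allows roughly 87.6 hours of downtime per year and three nines roughly 8.8 hours, while the five nines often associated with high availability leaves only about 5.3 minutes.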
57
IP
Stands for Internet Protocol. This network protocol outlines how almost all machine-to-machine communications should happen in the world. Other protocols like TCP, UDP and HTTP are built on top of IP.
58
File System
An abstraction over a storage medium that defines how to manage data. While there exist many different types of file systems, most follow a hierarchical structure that consists of directories and files, like the **Unix file system**'s structure.
59
DDoS Attack
Short for "distributed denial-of-service attack", a DDoS attack is a DoS attack in which the traffic flooding the target system comes from many different sources (like thousands of machines), making it much harder to defend against.