Framework Flashcards

(82 cards)

1
Q

What are the main sections of the delivery framework

A
  1. Requirements
  2. Core Entities
  3. API or Interface
  4. Data Flow
  5. High-level Design
  6. Deep Dives
2
Q

Describe the time limit and goal of the requirements section

A

Time limit: 5 mins
Goal: Gain a clear understanding of the system by breaking requirements into functional and non-functional requirements

3
Q

What are functional requirements

A
  • Core features of the system being designed
  • “Users/Clients should be able to…” statements.
  • Requirements should be targeted
  • Prioritize the top 3
4
Q

What are non-functional requirements

A
  • System qualities important to users
  • “The system should be able to…” or “The system should be…” statements
  • Should be stated in the context of the system and quantified where possible, e.g. “The system should have a low latency search, <500ms” instead of “The system should be low latency”
  • Prioritize the top 3-5
5
Q

What are things to consider when creating non-functional requirements

A
  1. CAP Theorem: prioritize consistency or availability
  2. Environment Constraints: Web, mobile, etc.
  3. Scalability: Bursty traffic at certain times of day, read/write ratio
  4. Latency: how quickly does the system need to respond to user requests
  5. Durability: how important is it to not lose data
  6. Security: Data protection, regulations
  7. Fault Tolerance: How does the system handle failures
  8. Compliance: Any legal or regulatory requirements
6
Q

Describe the time limit and goal of the core entities section

A

Time limit: 2 minutes
Goal: a bulleted list of the entities in the system

  • Who are the core actors in the system?
  • What are the nouns or resources necessary to satisfy the functional requirements
  • Use good names for entities
7
Q

Describe the time limit and goal of the API or system interface section

A

Time limit: 5 minutes
Goal: Define the contract between the system and its users

  • REST, GraphQL, or Wire Protocol (Generally use REST unless you are concerned with over-fetching)
  • Generate a list of endpoints and what they would return
8
Q

Describe the time limit and goal of the data flow section

A

Time limit: 5 minutes
Goal: Describe the high level sequence of actions or processes that the system performs on the inputs to produce the desired outputs

  • The data flow output will be a simple list
9
Q

Describe the time limit and goal of the high level design section

A

Time limit: 10-15 minutes
Goal: A drawing of components and how they interact

  • Ensure the architecture satisfies the functional requirements
  • You may be able to go through your API endpoints one-by-one and build up your design
  • While drawing, talk through the process and how the data flows
  • Document relevant columns/fields in the DB
  • Stay focused; this is only the high-level design, complexity can be added later
10
Q

Describe the time limit and goal of the deep dive section

A

Time limit: 10 minutes
Goal: harden the design

  • Ensure the design meets all of the non-functional requirements
  • Address edge cases
  • Identify and address issues and bottlenecks
  • Improve the design based on questions from the interviewer
  • A senior candidate should identify the above cases and lead the discussion
11
Q

What is a core database and what are choices for a core database

A
  • A core database is the data storage for your product
  • Choices are: Relational (SQL), NoSQL, or Blob storage
12
Q

What is a relational database (RDBMS) and when should you use it

A
  • Relational databases store structured, related data and are good at handling transactions
  • This is the default choice for a product design interview
13
Q

What is a NoSQL database and when should you use it

A
  • NoSQL databases are a broad category of databases that are often schema-less
  • Common data models are:
    • key-value
    • document
    • column-family
    • graph
  • Great candidates for
    • Flexible data models
    • Scalability
    • Handling big data and real-time web apps
14
Q

What is a blob storage and when should you use it

A
  • A blob storage is used to store large unstructured blobs of data, e.g. video, images, etc.
  • You should avoid using a blob storage as your primary database
15
Q

What is a search optimized database and when should you use it

A
  • A search optimized database is a database built to make text search fast, typically using an inverted index
  • You should use a search optimized database when you need full-text search
16
Q

In the context of a search optimized database, what is an inverted index

A

An inverted index is a data structure that maps words to documents. This allows you to quickly find the documents that contain the words you are searching for
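
For illustration, a minimal in-memory sketch (plain Python, not how a production search engine stores its index) of mapping each word to the set of documents that contain it:

```python
from collections import defaultdict

# Minimal in-memory inverted index: maps each word to the set of
# document IDs that contain it.
def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
index = build_inverted_index(docs)
print(index["quick"])  # {1, 3} -> documents containing "quick"
print(index["dog"])    # {2, 3}
```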

17
Q

In the context of a search optimized database, what is tokenization

A

Tokenization is the process of breaking a piece of text into individual words. This allows the mapping of words to an inverted index

18
Q

In the context of a search optimized database, what is stemming

A

Stemming is the process of reducing words to their root form. For example, “running” and “runs” would both be reduced to “run”

19
Q

In the context of a search optimized database, what is fuzzy search

A

Fuzzy search is the ability to find words similar to a given search term. This can be done with algorithms like edit distance to find words that might be misspelled
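
For illustration, a minimal sketch of Levenshtein edit distance, one way a fuzzy search can decide that two words are “close enough” (the word pair below is just an example):

```python
# Minimal Levenshtein (edit) distance sketch: the number of single-character
# insertions, deletions, or substitutions needed to turn one word into another.
# A fuzzy search can treat words within a small distance as matches.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

print(edit_distance("running", "runnign"))  # 2 -> likely a typo, treat as a match
```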

20
Q

In the context of a search optimized database, what is scaling

A

Search optimized databases can be scaled horizontally by adding more nodes to a cluster and sharding across those nodes

21
Q

What are some examples of search optimized databases

A
  • Elasticsearch
  • Postgres with a GIN index
  • Redis full-text search
22
Q

In the context of a blob storage, what is durability

A

Durability relates to the chance of data loss during a failure. Blob storages are quite durable

23
Q

In the context of a blob storage, what is scalability

A

Blob storages can be considered infinitely scalable

24
Q

In the context of a blob storage, what is cost

A

Blob storages are cheap, generally an order of magnitude cheaper than NoSQL solutions

25
In the context of a **blob storage**, what is **security**
Blob storages have built-in security features such as encryption at rest and in transit, and access control
26
In the context of a **blob storage**, what is **uploading and downloading from the client**
Blob storage services allow you to upload and download directly from the client. They generally utilize presigned URLs
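
A sketch of generating a presigned upload URL, assuming S3 via boto3; the bucket name and object key are made-up placeholders:

```python
import boto3

# Sketch: generate a presigned URL so the client can upload a file directly
# to blob storage (S3 here) without routing the bytes through our servers.
# "my-upload-bucket" and the key are placeholder names.
s3 = boto3.client("s3")

upload_url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "my-upload-bucket", "Key": "uploads/user-123/avatar.png"},
    ExpiresIn=3600,  # URL is valid for one hour
)

# The backend returns upload_url to the client, which then PUTs the file
# bytes straight to the storage service.
print(upload_url)
```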
27
In the context of a **blob storage**, what is **chunking**
Because files uploaded to and downloaded from blob storage are often large, chunking splits them into smaller pieces so they can be uploaded and downloaded in parallel (and resumed if a transfer fails)
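
A sketch of the chunking idea, assuming a hypothetical `upload_part` helper in place of a real multipart-upload API and a placeholder file name:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB per chunk

def upload_part(part_number: int, data: bytes) -> None:
    # Placeholder: in a real system this would call the storage service's
    # multipart-upload API (e.g. upload one part of an S3 multipart upload).
    print(f"uploaded part {part_number} ({len(data)} bytes)")

def chunked_upload(path: str) -> None:
    # Read the file in fixed-size chunks and upload the parts in parallel.
    with open(path, "rb") as f, ThreadPoolExecutor(max_workers=4) as pool:
        part_number = 1
        while chunk := f.read(CHUNK_SIZE):
            pool.submit(upload_part, part_number, chunk)
            part_number += 1

chunked_upload("large_video.mp4")  # hypothetical local file
```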
28
In the context of a **relational database**, what are **joins**
SQL joins are a way to combine data from multiple tables. Joins can be a performance bottleneck, so it is advisable to minimize them
29
In the context of a **relational database**, what are **indexes**
Indexes are a way of storing data to make it faster to query. Indexes are often implemented using a B-tree or a hash table
30
In the context of a **relational database**, what is a **transaction**
Transactions are a way of grouping multiple operations together so that they either all complete or none of them do
31
What are the **ACID** properties of a **relational database**
- Atomicity: The entire transaction takes place at once or doesn't happen at all
- Consistency: The database must be consistent before and after the transaction
- Isolation: Multiple transactions occur independently without interference
- Durability: The changes of a successful transaction persist even if a system failure occurs
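
A minimal sketch of a transaction using Python's built-in sqlite3, illustrating atomicity (both updates commit together or are rolled back) plus an index from the earlier card:

```python
import sqlite3

# Minimal transaction sketch with SQLite: either both account updates are
# committed, or neither is (atomicity); the index speeds up balance lookups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE INDEX idx_accounts_balance ON accounts(balance)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    conn.commit()    # both updates become durable together
except sqlite3.Error:
    conn.rollback()  # neither update is applied if anything failed

print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70), (2, 80)]
```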
32
What is an **API gateway** and when should you use it
- An API gateway sits in front of your services and routes requests to the appropriate backend services, especially in microservice architectures
- Gateways should be included in almost all product designs as the first point of contact for your clients
- Gateways are typically responsible for things like authentication, rate limiting, and logging
33
What is a **load balancer** and when should you use it
A load balancer is useful in times of heavy traffic. It allows horizontal scaling by routing traffic to different machines to avoid overloading a single machine
34
When should you choose a L4 load balancer or a L7 load balancer
Choose an L4 load balancer when you are doing real-time updates with websockets. Otherwise, choose an L7 load balancer
35
What is a **message queue** and when should you use it
- A queue's function is to smooth load across a system
- A queue should be used to:
  - buffer for bursty traffic (see the sketch below)
  - distribute work across a system
- Be careful not to introduce a queue into a synchronous workload as it will break latency requirements
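
A minimal in-process sketch of the queue/worker idea using Python's standard library; a real system would use SQS, Kafka, etc., but the shape is the same (the bounded queue also illustrates backpressure from a later card):

```python
import queue
import threading
import time

# Sketch: a bounded in-process queue buffers bursty traffic and feeds a
# small worker pool that drains it at its own pace.
jobs: queue.Queue = queue.Queue(maxsize=100)  # bounded -> natural backpressure

def worker(worker_id: int) -> None:
    while True:
        job = jobs.get()
        time.sleep(0.01)        # pretend to do slow work
        print(f"worker {worker_id} processed {job}")
        jobs.task_done()

for i in range(4):              # worker pool draining the queue
    threading.Thread(target=worker, args=(i,), daemon=True).start()

for n in range(50):             # a burst of incoming requests
    jobs.put(f"job-{n}")        # blocks if the queue is full (backpressure)

jobs.join()                     # wait until the burst has been drained
```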
36
In the context of a **message queue**, what is **message ordering**
It is the way in which messages are ordered in the queue. The most popular is FIFO
37
In the context of a **message queue**, what is a **retry mechanism**
A retry mechanism is a queue's ability to redeliver a message a certain number of times before it's considered a failure
38
In the context of a **message queue**, what is a **dead letter queue**
A dead letter queue is a queue used to store messages that cannot be processed. They are useful for debugging and auditing
39
In the context of a **message queue**, what is **scaling with partitions**
Queues can be partitioned across multiple machines; increasing the number of partitions (and the machines backing them) scales the queue
40
In the context of a **message queue**, what is **backpressure**
Backpressure is a means of slowing down the production of new requests (for example, rejecting or delaying writes when the queue is full) to make sure your system is not overwhelmed
41
What are **streams/event sourcing** and when should you use them
- Streams are continuous data flows that are stored and processed for a configurable period of time
- Event sourcing is a technique where application state is stored as a sequence of events, allowing the application state to be reconstructed at any point in time
- Common use cases are:
  - You need to process large amounts of data in real time
  - You need to support complex processing scenarios like event sourcing
  - You need to support multiple consumers reading from the same stream
42
In the context of a **stream**, what is **scaling with partitions**
Partitions can be used to scale streams across multiple servers. Partition keys need to be specified to ensure related events are stored on the same partition
43
In the context of a **stream**, what are **multiple consumer groups**
A stream can be read by multiple different consumers. One consumer might read a stream to populate a dashboard, while another consumer might populate a database for historical analysis
44
In the context of a **stream**, what is **replication**
Streams can replicate data on multiple servers to ensure that the service is fault tolerant
45
In the context of a **stream**, what is **windowing**
Windowing is a way to group events together based on time or count. This is great for aggregate analytics over a certain time, e.g. 15 mins, 1 hr, etc.
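
A minimal sketch of tumbling windows: bucketing timestamped events into fixed 15-minute windows and counting per window (the timestamps are just small example numbers):

```python
from collections import Counter

# Sketch: tumbling-window aggregation over a stream of (timestamp, event)
# pairs, counting events per 15-minute window.
WINDOW_SECONDS = 15 * 60

events = [
    (100, "click"),
    (500, "click"),
    (1000, "purchase"),  # falls into the next 15-minute window
]

counts_per_window: Counter = Counter()
for ts, _event in events:
    window_start = ts - (ts % WINDOW_SECONDS)  # bucket the event by window
    counts_per_window[window_start] += 1

for window_start, count in sorted(counts_per_window.items()):
    print(f"window starting at {window_start}: {count} events")
```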
46
What are some common streaming technologies
- Kafka - Flink - Kinesis - Spark Streaming
47
What is a **distributed lock** and when should you use them
- A distributed lock is a way of locking something across multiple systems or processes for a reasonable amount of time
- A distributed lock is generally implemented using a distributed key-value store
- Common use cases are:
  - E-commerce checkout system
  - Ride-sharing matchmaking
  - Distributed cron job
  - Online auction bidding system
48
In the context of a **distributed lock**, what are **locking mechanisms**
A locking mechanism is how the lock is implemented. This is typically done using a key-value store. One specific example is Redis using Redlock
49
In the context of a **distributed lock**, what is **lock expiry**
A lock expiry is an expiration time on a lock. This is important to make sure a lock doesn't get stuck in a locked state if a process dies or hangs
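
A sketch of a simple single-node Redis lock with an expiry, assuming the redis-py client and a reachable Redis server; the lock key is a made-up example (Redlock extends this idea across multiple nodes):

```python
import uuid
import redis  # assumes the redis-py client and a reachable Redis server

r = redis.Redis()

# Sketch of a single-node Redis lock with an expiry. The random token lets
# us release only a lock we still own.
LOCK_KEY = "lock:checkout:order-42"  # hypothetical resource name
token = str(uuid.uuid4())

# SET key value NX EX 30 -> acquire only if the key doesn't already exist,
# and auto-expire after 30s so a crashed process can't hold it forever.
acquired = r.set(LOCK_KEY, token, nx=True, ex=30)

if acquired:
    try:
        pass  # ... do the critical work here ...
    finally:
        # Release only if we still own the lock (compare-and-delete via Lua).
        release_script = """
        if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
        end
        return 0
        """
        r.eval(release_script, 1, LOCK_KEY, token)
else:
    print("someone else holds the lock")
```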
50
In the context of a **distributed lock**, what is **locking granularity**
A lock can be used to lock a single resource or a group of resources
51
In the context of a **distributed lock**, what are **deadlocks**
A deadlock occurs when two processes are waiting on each other to release a lock. One process holds lock A and needs lock B, while a second process holds lock B and needs lock A. Both are waiting for each other to release their current lock.
52
What are some common distributed locking systems
- Redis
- Zookeeper
53
What are common ways to prevent a deadlock
1. Utilize resource ordering to avoid deadlocks: ensure all processes acquire resources in a predefined global order
2. Use timeouts: if a process cannot acquire a resource in a reasonable amount of time, it aborts its operation
3. Employ a try-lock mechanism: use a non-blocking method that attempts to lock the resource and can try again later if the resource is currently locked
54
What is a **distributed cache** and when should you use it
- A distributed cache is a server or cluster of servers that stores frequently used data to help lower latency
- Common use cases are:
  - Save aggregated metrics
  - Reduce the number of DB queries (see the cache-aside sketch below)
  - Speed up expensive queries
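
A minimal cache-aside sketch for the "reduce the number of DB queries" use case; the in-memory dict and `db_get_user` function are stand-ins for a Redis client and a real database query:

```python
import json

# Cache-aside sketch: check the cache first, fall back to the database on a
# miss, then populate the cache. `cache` is a stand-in for a Redis client.
cache: dict[str, str] = {}          # pretend this is Redis

def db_get_user(user_id: int) -> dict:
    print("hitting the database")   # expensive query in a real system
    return {"id": user_id, "name": "Ada"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    if key in cache:                # cache hit: skip the DB entirely
        return json.loads(cache[key])
    user = db_get_user(user_id)     # cache miss: query the DB
    cache[key] = json.dumps(user)   # populate the cache (set a TTL in Redis)
    return user

get_user(1)  # hits the database
get_user(1)  # served from the cache
```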
55
In the context of a **distributed cache**, what is an **eviction policy**
An eviction policy is a means of removing items from the cache when it is full
Example eviction policies are:
- LRU (Least Recently Used): evicts the least recently accessed item (see the sketch below)
- LFU (Least Frequently Used): evicts the least frequently accessed items
- FIFO (First In First Out): evicts items in the order they were added
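
A minimal LRU eviction sketch using an OrderedDict; real distributed caches implement this internally, this just shows the mechanics:

```python
from collections import OrderedDict

# Minimal LRU eviction sketch: the least recently accessed key is evicted
# once the cache exceeds its capacity.
class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```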
56
In the context of a **distributed cache**, what is a **cache invalidation strategy**
A cache invalidation strategy is a means to ensure that data being stored in cache is accurate and up to date
57
In the context of a **distributed cache**, what is a **cache write strategy**
A cache write strategy is the process by which data is written to the cache
Example strategies are:
- Write-Through Cache: write to the cache and the underlying data store simultaneously
- Write-Around Cache: write to the data store and not to the cache
- Write-Back Cache: write to the cache, then asynchronously write to the data store
58
What are some common cache systems
- Redis
- Memcached
59
What is a **CDN** and when should you use it
A CDN (Content Delivery Network) is a cache that uses distributed servers to deliver content based on a user's geographic region
Common use cases:
- Static assets: images, videos, javascript files
- Dynamic content that is accessed frequently but changes infrequently, e.g. a daily blog post
- Caching API responses to reduce latency
- A social media site might store profile pictures in a CDN to serve them to users globally
60
What are some common CDNs
- Cloudflare
- Akamai
- CloudFront
61
What is the CAP theorem
You can only have 2 out of the 3:
1. Consistency: all nodes/users see the same data at the same time
2. Availability: every request gets a response (successful or not)
3. Partition tolerance: the system works despite network failures between nodes
62
What is a **strongly consistent** system
Once data is written to a system, all subsequent reads will reflect the write
63
What is a **weakly consistent** or **eventually consistent** system
Once data is written to a system, subsequent reads might read the old data. Eventually, the new data will be read
64
What is Change Data Capture (CDC)
Change Data Capture is a process where changes to a database (inserts, updates, deletes) are captured as a stream of change events. The results of CDC can be used for auditing or to keep other systems in sync, such as updating an index in Elasticsearch
65
What are the 4 main **communication protocols**
1. HTTP(S)
2. Server-Sent Events (SSE)
3. Long Polling
4. Websockets
66
Describe the HTTP(S) **communication protocol**
The HTTP(S) protocol is a simple request/response interface (e.g. REST). Each request is stateless, so the API can scale horizontally
67
Describe the long polling **communication protocol**
Long polling is a blend of HTTP(S) and websockets. The client sends a request and the server holds on to it until an update is available. Once the request is fulfilled, the client submits another request.
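
A sketch of the client side of long polling, assuming the `requests` library and a hypothetical notifications endpoint:

```python
import requests  # assumes the requests library; the URL below is hypothetical

# Long-polling client sketch: the server holds each request open until an
# update is available (or a timeout is hit), then the client polls again.
def poll_for_updates(url: str) -> None:
    while True:
        try:
            resp = requests.get(url, timeout=35)  # server holds for up to ~30s
            if resp.status_code == 200:
                print("update received:", resp.json())
            # an empty/204 response would mean "no update yet" -> just poll again
        except requests.Timeout:
            pass  # no update within the window; reissue the request

poll_for_updates("https://example.com/api/notifications/poll")
```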
68
Describe the Server-Sent Events (SSE) **communication protocol**
The Server-Sent Events (SSE) protocol is best for unidirectional communication from the server to the client. The client makes one request and the server can send new data whenever it is available. This is achieved through a long-lived HTTP connection
69
Describe the websockets **communication protocol**
Websockets are best if you need realtime, bidirectional communication between the client and server. Since the client needs to maintain an active connection with the server, this can be troublesome for load balancers. One way to implement websockets is to use a message broker between the client and server. This ensures you don't need long lived connections to every service in your backend
70
In the context of **security** what is **authentication/authorization**
- Authentication: verifying who a user is (are they who they claim to be)
- Authorization: determining whether the user is allowed to view or act on a specific resource
- API gateways generally handle auth
- Auth0 is also a good service to handle auth
71
In the context of **security** what is **encryption**
- Data in transit can be handled by protocol encryption (HTTPS with SSL/TLS)
- Data at rest can be handled by storage encryption
- For sensitive data it may be best to encrypt the data with a key that only the user has, so that no one else can view the data. This is known as End-to-End (E2E) encryption
72
In the context of **security** what is **data protection**
Data protection is the process of ensuring data is protected from unauthorized access, use, or disclosure. Using a rate limiter or throttler is a good idea to hinder data being scraped
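
A minimal token-bucket rate limiter sketch, one common way to throttle scrapers and abusive clients:

```python
import time

# Token-bucket rate limiter sketch: each client gets `capacity` tokens that
# refill at `rate` tokens/second; a request without a token is rejected,
# which slows down scraping and abusive traffic.
class TokenBucket:
    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, rate=1)   # burst of 5, then 1 request/second
print([bucket.allow() for _ in range(7)])  # first 5 True, then False
```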
73
What are the 3 levels of monitoring
1. Infrastructure monitoring
2. Service-level monitoring
3. Application-level monitoring
74
In the context of **monitoring** what is **infrastructure monitoring**
Infrastructure monitoring is monitoring the health and performance of your infrastructure: CPU usage, memory usage, disk usage, and network usage. Tools like Datadog and New Relic are useful
75
In the context of **monitoring** what is **service-level monitoring**
Service-level monitoring is the health and performance of your services: request latency, error rates, and throughput.
76
In the context of **monitoring** what is **application-level monitoring**
Application-level monitoring is the health and performance of your application: the number of users, the number of active sessions, and the number of active connections. This could be key business metrics. Useful tools are Google Analytics and Mixpanel
77
Describe the pattern **Simple DB-backed CRUD service with caching**
- Most common pattern for web-based applications
- Use a load balancer to distribute traffic across multiple instances of your service
78
Describe the pattern **async job worker pool**
- For systems that need to process a lot of data and can tolerate a delay
- Queue options: SQS, Kafka
- Worker options: Lambda, EC2 instances
79
Describe the pattern **two stage architecture**
- A two stage architecture is good for scaling an algorithm with poor performance
- In the first stage, we use a fast algorithm to filter out the vast majority of dissimilar items
- In the second stage, we use a slower algorithm that is more precise (see the sketch below)
- The architecture is common in:
  - Recommendation systems (candidate generators)
  - Search engines (inverted indexes)
  - Route planning (ETA services)
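
A toy sketch of the two-stage idea: a cheap first stage prunes the candidate set, then an expensive second stage ranks only the survivors (both scoring functions are made-up stand-ins for e.g. an index lookup followed by an ML ranking model):

```python
# Two-stage retrieval sketch: a cheap first stage prunes the candidate set,
# then an expensive second stage ranks only the survivors.
items = [{"id": i, "category": "shoes" if i % 2 else "hats", "quality": i % 7}
         for i in range(1_000)]

def cheap_filter(item: dict, query_category: str) -> bool:
    return item["category"] == query_category   # fast, coarse

def expensive_score(item: dict) -> float:
    return item["quality"] * 1.7                # slow, precise (pretend)

# Stage 1: filter ~1,000 items down to a small candidate set.
candidates = [it for it in items if cheap_filter(it, "shoes")]

# Stage 2: run the expensive scorer only on the candidates and take the top 10.
top10 = sorted(candidates, key=expensive_score, reverse=True)[:10]
print([it["id"] for it in top10])
```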
80
Describe the pattern **event-driven architecture**
- Event-Driven Architecture (EDA) is useful in systems where it's crucial to react to changes in real-time
- Core components are: event producers, event routers (brokers), and event consumers
- Event router options: Kafka, AWS EventBridge
81
Describe the pattern **durable job processing**
- Durable job processing is for systems with jobs that might take hours or days to complete
- The common pattern is to use a checkpointing system to periodically save a worker's progress
- Common distributed durable log and workflow options: Kafka, Uber's Cadence, Temporal
82
Describe the pattern **proximity-based services**
- Proximity-based services require you to search for entities by location
- Geospatial indexes are key to querying and retrieving entities based on proximity
- Common geospatial solutions: Postgres PostGIS, the Redis geospatial data type, Elasticsearch with geo-queries
- The architecture typically involves dividing the geographical area into manageable regions, thus reducing your search space (see the sketch below)
- Geospatial indexes are only necessary when you need to index hundreds of thousands or millions of items. Otherwise, it's better to just scan all of the items
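
A toy sketch of the "divide the area into regions" idea using a fixed grid; real systems would use PostGIS, Redis GEO, geohashes, etc., but the principle is the same:

```python
from collections import defaultdict

# Sketch: bucket entities into fixed-size grid cells, then only scan the cell
# containing the query point and its neighbours instead of every entity.
CELL_SIZE = 0.01  # roughly ~1 km of latitude per cell

def cell_of(lat: float, lon: float) -> tuple[int, int]:
    return (int(lat // CELL_SIZE), int(lon // CELL_SIZE))

grid: dict[tuple[int, int], list[str]] = defaultdict(list)

def add_entity(name: str, lat: float, lon: float) -> None:
    grid[cell_of(lat, lon)].append(name)

def nearby(lat: float, lon: float) -> list[str]:
    cx, cy = cell_of(lat, lon)
    results = []
    for dx in (-1, 0, 1):            # query cell plus its 8 neighbours
        for dy in (-1, 0, 1):
            results.extend(grid[(cx + dx, cy + dy)])
    return results

add_entity("coffee shop", 37.7749, -122.4194)
add_entity("book store", 37.7755, -122.4180)
print(nearby(37.7750, -122.4190))  # both entities, without scanning the world
```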