YouTube Flashcards

(45 cards)

1
Q

What are the primary functional requirements for designing a system like YouTube?

A
  • Stream videos
  • Upload videos
  • Search videos according to titles
  • Like and dislike videos
  • Add comments to videos
  • View thumbnails

Functional requirements are the features and functionalities that users will experience in the system.

2
Q

What are the key non-functional requirements for YouTube’s design?

A
  • High availability
  • Scalability
  • Good performance
  • Reliability

Non-functional requirements pertain to the system’s performance expectations.

3
Q

True or False: YouTube requires strong consistency in its design.

A

False

The system does not require all users to receive immediate notifications for newly uploaded content.

4
Q

What is the total number of YouTube users estimated?

A

1.5 billion

This number reflects the total user base of YouTube.

5
Q

What is the average length of a video on YouTube?

A

5 minutes

This average length is used for various calculations in resource estimation.

6
Q

What is the size of an average 5-minute video before processing?

A

600 MB

This is the size of the video as uploaded, before it is encoded.

7
Q

What is the size of an average video after encoding?

A

30 MB

This size results from encoding using different algorithms for different resolutions.

8
Q

Fill in the blank: The formula for total storage requirement is Total storage = Total upload/min × _______.

A

Storage_min

Storage_min is the storage required for each minute of content.

9
Q

How much bandwidth is required for streaming videos if 480 Gbps is needed for uploading?

A

144,000 Gbps

This is based on the upload:view ratio of 1:300, so 480 Gbps × 300 = 144,000 Gbps.

10
Q

What is the estimated number of servers needed at peak load for YouTube?

A

Approximately 8K servers

This estimation is based on the number of requests per second and the server’s response capacity.

11
Q

What are the key building blocks of YouTube’s design?

A
  • Databases
  • Blob storage
  • CDN (Content Delivery Network)
  • Load balancers
  • Servers
  • Encoders and transcoders

These components are crucial for the functionality and performance of the YouTube system.

12
Q

What is the upload:view ratio assumed for YouTube’s bandwidth estimation?

A

1:300

This ratio indicates that for every uploaded video, there are 300 views.

13
Q
A
14
Q

What is the first step in YouTube’s high-level design workflow?

A

The user uploads a video to the server

This initiates the process of storing metadata and user data.

15
Q

What does the encoder do in YouTube’s design?

A

Compresses the video and transforms it into multiple resolutions

Resolutions include 2160p, 1440p, 1080p, etc.

16
Q

What is the purpose of the CDN in YouTube’s architecture?

A

Acts as a cache to enable low latency video streaming for users

Popular videos are forwarded to the CDN for quick access.

17
Q

What is the primary method used for uploading videos in the API design?

A

POST method

The endpoint is /uploadVideo.

18
Q

What parameters are required for the /uploadVideo API?

A
  • user_id
  • video_file
  • category_id
  • title
  • description
  • tags
  • default_language
  • privacy_settings

Each parameter serves specific purposes related to the video upload.

19
Q

What method is used for the /streamVideo API?

A

GET method

This method retrieves video streaming data.

20
Q

What parameters are included in the /streamVideo API?

A
  • user_id
  • video_id
  • screen_resolution
  • user_bitrate
  • device_chipset

These parameters help optimize video delivery based on user capabilities.

21
Q

What does the /searchVideo API do?

A

Allows users to search for videos based on specific criteria

It uses the GET method.

22
Q

What optional parameters can be used with the /searchVideo API?

A
  • length
  • quality
  • upload_date

These parameters filter search results.

23
Q

How does the like/dislike API function?

A

Uses the POST method to register a like or dislike

It updates the database based on the input.

24
Q

What is the purpose of the commentVideo API?

A

Allows users to add comments to videos

It requires the comment_text parameter.

25
Q

What is the role of load balancers in YouTube's architecture?

A

To divide user requests among web servers

This ensures efficient handling of user traffic.

26
Q

What type of database is used for storing user account data?

A

A separate database from video metadata

This separation optimizes access times.

27
Q

What is Bigtable used for in YouTube's design?

A

Storing thumbnails due to its high throughput and scalability

It is ideal for key-value data storage.

28
Q

What is the temporary storage used for in YouTube's architecture?

A

To store user-uploaded videos before encoding

This helps manage uploads efficiently.

29
Q

What programming languages might be used on the application servers?

A

Different languages for various tasks, e.g., C for encryption

This allows efficient processing of requests.

30
Q

What is the purpose of sharding in YouTube's storage system?

A

To effectively manage storage as the system scales

It helps with frequent writes on the database.

31
Q

What data is included in the JSON file for each uploaded video?

A
  • Title of the video
  • Channel name
  • Description of the video
  • Content extracted from transcripts
  • Video length
  • Categories

This data is used for indexing and searching videos.

32
Q

True or False: The processing engine in YouTube's search feature only considers keywords for relevance.

A

False

It also considers view count, watch time, and user history.

33
Different estimations needed for design
  • Resource estimation
  • Storage estimation
  • Bandwidth estimation
  • Number of servers estimation
  • Building blocks we will use
34
Resource estimation for YouTube
Estimation starts with identifying the important resources that the system will need. Hundreds of minutes of video content get uploaded to YouTube every minute, and a large number of users stream content at the same time.

Required resources:
  • Storage resources to store uploaded and processed content.
  • Web/application servers to handle a large number of requests through concurrent processing.
  • Both upload and download bandwidth to serve millions of users.

Assumptions for resource estimation

User statistics:
  • Total number of YouTube users: 1.5 billion
  • Active daily users (who watch or upload videos): 500 million

Video specifications:
  • Average length of a video: 5 minutes
  • Size of an average (5-minute) video before processing/encoding: 600 MB
  • Size of an average video after encoding (using different algorithms, such as MPEG-4 and VP9, for different resolutions): 30 MB
35
Storage requirement for YouTube
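A rough sketch of the storage arithmetic, following the formula from the earlier card (Total storage = Total upload/min × Storage_min). The encoded size (30 MB) and average length (5 minutes) come from the assumptions in this deck. The upload rate of 500 hours of video per minute is an assumption made here for illustration; the cards only say "hundreds of minutes".

```python
# Assumption (not stated in these cards): 500 hours of video uploaded per minute.
UPLOAD_HOURS_PER_MIN = 500
ENCODED_MB_PER_VIDEO = 30   # from the cards: average size after encoding
VIDEO_LENGTH_MIN = 5        # from the cards: average video length


def storage_per_minute_gb(upload_hours_per_min: float) -> float:
    """Storage added per wall-clock minute, in GB (using 1 GB = 1000 MB)."""
    # Total upload/min: minutes of content arriving every minute.
    uploaded_content_min = upload_hours_per_min * 60
    # Storage_min: MB needed to store one minute of encoded content.
    storage_mb_per_content_min = ENCODED_MB_PER_VIDEO / VIDEO_LENGTH_MIN
    return uploaded_content_min * storage_mb_per_content_min / 1000


print(storage_per_minute_gb(UPLOAD_HOURS_PER_MIN))  # 180.0
```

Under these assumptions the system adds about 180 GB of encoded video per minute. Changing the assumed upload rate scales the result linearly.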
36
Bandwidth Estimation
A lot of data is transferred when streaming and uploading videos to YouTube, so we need to estimate bandwidth as well. Assume the upload:view ratio is 1:300, that is, for each uploaded video, we have 300 video views. We also have to keep in mind that an uploaded video is not in compressed format, while viewed videos can be of different qualities. Let's estimate the bandwidth required for uploading the videos.
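The bandwidth figures from the earlier cards can be reproduced with a short calculation. This is a sketch under one assumption not stated in the deck: an upload rate of 500 hours of video per minute, which happens to yield the 480 Gbps upload figure. The 600 MB raw size, 5-minute length, and 1:300 ratio come from the cards.

```python
UPLOAD_HOURS_PER_MIN = 500   # assumption, not stated in these cards
RAW_MB_PER_VIDEO = 600       # from the cards: size before encoding
VIDEO_LENGTH_MIN = 5         # from the cards: average video length
VIEWS_PER_UPLOAD = 300       # from the cards: upload:view ratio of 1:300


def upload_bandwidth_gbps(upload_hours_per_min: float) -> float:
    """Bandwidth needed to receive unencoded uploads, in Gbps."""
    content_min_per_min = upload_hours_per_min * 60
    raw_mb_per_content_min = RAW_MB_PER_VIDEO / VIDEO_LENGTH_MIN
    mb_per_second = content_min_per_min * raw_mb_per_content_min / 60
    return mb_per_second * 8 / 1000  # MB/s -> Mbps -> Gbps


def streaming_bandwidth_gbps(upload_gbps: float) -> float:
    """Scale upload bandwidth by the view ratio, as the flashcard does."""
    return upload_gbps * VIEWS_PER_UPLOAD


upload = upload_bandwidth_gbps(UPLOAD_HOURS_PER_MIN)
print(upload)                            # 480.0
print(streaming_bandwidth_gbps(upload))  # 144000.0
```

The streaming figure is a plain multiplication by the ratio. It does not account for viewers receiving encoded video at different qualities, so it is best read as an upper-bound style estimate.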
37
Number of servers estimation
38
Building blocks we will use in YouTube
  • Databases are required to store the metadata of videos, thumbnails, comments, and user-related information.
  • Blob storage is important for storing all videos on the platform.
  • A CDN is used to effectively deliver content to end users, reducing delay and the burden on end-servers.
  • Load balancers are a necessity to distribute millions of incoming client requests among the pool of available servers.

Other than our building blocks, we anticipate the use of the following components in our high-level design:
  • Servers are a basic requirement to run application logic and entertain user requests.
  • Encoders and transcoders compress videos and transform them into different formats and qualities to support a variety of devices according to their screen resolution and bandwidth.
39
High Level Design
The workflow for the abstract design is provided below:
  • The user uploads a video to the server.
  • The server stores the metadata and the accompanying user data to the database and, at the same time, hands over the video to the encoder for encoding (see 2.1 and 2.2 in the illustration above).
  • The encoder, along with the transcoder, compresses the video and transforms it into multiple resolutions (like 2160p, 1440p, 1080p, and so on). The videos are stored on blob storage (similar to GFS or S3).
  • Some popular videos may be forwarded to the CDN, which acts as a cache.
  • The CDN, because of its vicinity to the user, lets the user stream the video with low latency. However, the CDN is not the only infrastructure for serving videos to the end user, which we will see in the detailed design.
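The workflow above can be sketched as a small in-memory model. Everything here is hypothetical and for illustration only: the `Platform` class, its method names, and the placeholder "encoding" (which just tags the bytes with a resolution) are not part of any real YouTube component.

```python
from dataclasses import dataclass, field

RESOLUTIONS = ("2160p", "1440p", "1080p")  # examples named in the workflow


@dataclass
class Platform:
    metadata_db: dict = field(default_factory=dict)   # video_id -> metadata
    blob_storage: dict = field(default_factory=dict)  # (video_id, resolution) -> bytes
    cdn_cache: dict = field(default_factory=dict)     # popular subset of blob_storage

    def upload(self, video_id: str, user_id: str, title: str, raw: bytes) -> None:
        # The server stores metadata and user data, then hands the video to the encoder.
        self.metadata_db[video_id] = {"user_id": user_id, "title": title}
        for resolution, encoded in self._encode(raw).items():
            self.blob_storage[(video_id, resolution)] = encoded

    def _encode(self, raw: bytes) -> dict:
        # Placeholder for real compression and transcoding.
        return {r: r.encode() + b":" + raw for r in RESOLUTIONS}

    def promote_to_cdn(self, video_id: str) -> None:
        # Popular videos are forwarded to the CDN, which acts as a cache.
        for key, blob in self.blob_storage.items():
            if key[0] == video_id:
                self.cdn_cache[key] = blob

    def stream(self, video_id: str, resolution: str) -> tuple:
        # Serve from the CDN when cached, otherwise fall back to blob storage.
        key = (video_id, resolution)
        if key in self.cdn_cache:
            return "cdn", self.cdn_cache[key]
        return "origin", self.blob_storage[key]


platform = Platform()
platform.upload("v1", "u1", "My first video", b"raw-bytes")
print(platform.stream("v1", "1080p")[0])  # origin
platform.promote_to_cdn("v1")
print(platform.stream("v1", "1080p")[0])  # cdn
```

The fallback in `stream` reflects the note that the CDN is not the only infrastructure serving videos to the end user.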
40
Why don’t we upload the video directly to the encoder instead of to the server? Doesn’t the current strategy introduce an additional delay?
There are several reasons why it's a good idea to introduce a server in between the encoder and the client:
  • The client could be malicious and could abuse the encoder.
  • If the uploaded video is a duplicate, the server could filter it out.
  • Encoders will be available on a private IP address within YouTube's network and not available for public access.
41
API design
  • The POST method can upload a video to the /uploadVideo API.
  • The GET method is best suited for the /streamVideo API.
  • The /searchVideo API uses the GET method.
  • The /viewThumbnails API is accessed with the GET method.
  • The like and dislike API uses the POST method to register a like or dislike.
  • The comment API also uses the POST method. Much like the like and dislike API, we only have to provide the comment string to the API.

```
uploadVideo(user_id, video_file, category_id, title, description, tags, default_language, privacy_settings)
streamVideo(user_id, video_id, screen_resolution, user_bitrate, device_chipset)
searchVideo(user_id, search_string, length, quality, upload_date)
viewThumbnails(user_id, video_id)
likeDislike(user_id, video_id, like)
commentVideo(user_id, video_id, comment_text)
```
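As a minimal sketch, the method and parameter list of each API can be captured in one table and used to validate a request before it is sent. The `API_SPEC` table and `build_request` helper are hypothetical names introduced here. The `/likeDislike` and `/commentVideo` paths are assumed from the function names, since the cards do not give paths for those two APIs.

```python
# Endpoint -> (HTTP method, accepted parameters), taken from the signatures above.
# The "/likeDislike" and "/commentVideo" paths are assumptions.
API_SPEC = {
    "/uploadVideo": ("POST", ["user_id", "video_file", "category_id", "title",
                              "description", "tags", "default_language",
                              "privacy_settings"]),
    "/streamVideo": ("GET", ["user_id", "video_id", "screen_resolution",
                             "user_bitrate", "device_chipset"]),
    "/searchVideo": ("GET", ["user_id", "search_string", "length", "quality",
                             "upload_date"]),
    "/viewThumbnails": ("GET", ["user_id", "video_id"]),
    "/likeDislike": ("POST", ["user_id", "video_id", "like"]),
    "/commentVideo": ("POST", ["user_id", "video_id", "comment_text"]),
}


def build_request(endpoint: str, **params) -> dict:
    """Return a request description, rejecting parameters the API does not accept."""
    method, accepted = API_SPEC[endpoint]
    unknown = sorted(set(params) - set(accepted))
    if unknown:
        raise ValueError(f"unknown parameters for {endpoint}: {unknown}")
    return {"method": method, "endpoint": endpoint, "params": params}


request = build_request("/commentVideo", user_id="u1", video_id="v1",
                        comment_text="Nice video")
print(request["method"])  # POST
```

Keeping the specification in data rather than in six separate functions makes it easy to see at a glance which operations read (GET) and which write (POST).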
42
Storage schema
43
Detailed Design
Since we highlighted the requirements of smooth streaming, server-level details, and thumbnail features, the following design will meet our expectations. Let's explain the purpose of each added component here:
  • Load balancers: To divide a large number of user requests among the web servers, we require load balancers.
  • Web servers: Web servers take in user requests and respond to them. These can be considered the interface to our API servers that entertain user requests.
  • Application server: The application and business logic resides in application servers. They prepare the data needed by the web servers to handle the end users' queries.
  • User and metadata storage: Since we have a large number of users and videos, the metadata of videos and the content related to users must be kept in different storage clusters. This is because a large amount of not-so-related data should be decoupled for scalability purposes.
  • Bigtable: For each video, we'll require multiple thumbnails. Bigtable is a good choice for storing thumbnails because of its high throughput and scalability for storing key-value data. Bigtable is optimal for storing a large number of data items each below 10 MB. Therefore, it is the ideal choice for YouTube's thumbnails.
  • Upload storage: The upload storage is temporary storage that can store user-uploaded videos.
  • Encoders: Each uploaded video requires compression and transcoding into various formats. The encoders also provide the thumbnail generation service.
  • CDN and colocation sites: CDNs and colocation sites store popular and moderately popular content closer to the user for easy access. Colocation centers are used where it's not possible to invest in a data center facility due to business reasons.
44
Design flow and technology used
Now that we understand the purpose of every component, let's discuss the flow and technology used in different components in the following steps:
  • The user can upload a video by connecting to the web servers. The web server can run Apache or Lighttpd. Lighttpd is preferable because it is fast at serving static pages and videos.
  • Requests from the web servers are passed on to application servers that can contact various data stores to read or write user data, videos, or videos' metadata. There are separate web and application servers because we want to decouple clients' services from the application and business logic. Different programming languages can be used on this layer to perform different tasks efficiently. For example, the C programming language can be used for encryption. Moreover, this gives us an additional layer of caching, where the most requested objects are stored on the application server while the most frequently requested pages are stored on the web servers.
  • Multiple storage units are used. Let's go through each of these:
    • Upload storage is used to store user-uploaded videos temporarily before they are encoded.
    • User account data is stored in one database, and videos' metadata is stored in a separate one. The idea is to separate the more frequently and less frequently accessed storage clusters from each other for optimal access time. We can use MySQL if there are a limited number of concurrent reads and writes. However, as the number of users (and therefore the number of concurrent reads and writes) grows, we can move towards NoSQL types of data management systems.
    • Since Bigtable is based on Google File System (GFS), it is designed to store a large number of small files with low retrieval latency. It is a reasonable choice for storing thumbnails.
  • The encoders generate thumbnails and also store additional metadata related to videos in the metadata database. Popular and moderately popular content is also provided to CDNs and colocation servers, respectively.
  • The user can finally stream videos from any available site.
45
YouTube search
Since YouTube is one of the most visited websites, a large number of users will be using the search feature. Even though we have covered a building block on distributed search, we'll provide a basic overview of how search inside the YouTube system will work.

Each new video uploaded to YouTube will be processed for data extraction. We can use a JSON file to store extracted data, which includes the following:
  • Title of the video.
  • Channel name.
  • Description of the video.
  • The content of the video, possibly extracted from the transcripts.
  • Video length.
  • Categories.

Each of the JSON files can be referred to as a document. Next, keywords will be extracted from the documents and stored in a key-value store. The key in the key-value store will hold all the keywords searched by the users, while the value in the key-value store will contain the occurrence of each key, its frequency, and the location of the occurrence in the different documents. When a user searches for a keyword, the videos with the most relevant keywords will be returned.

[Figure: An abstraction of how YouTube search works]

The approach above is simplistic, and the relevance of keywords is not the only factor affecting search in YouTube. In reality, a number of other factors will matter. The processing engine will improve the search results by filtering and ranking videos. It will make use of other factors like view count, the watch time of videos, and the context, along with the history of the user, to improve search results.
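The keyword store described above is essentially an inverted index. Here is a minimal sketch, assuming whitespace tokenization and ranking purely by keyword frequency. The function names and sample documents are hypothetical, and the extra ranking signals (view count, watch time, user history) are deliberately left out.

```python
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map each keyword to {video_id: [positions]} across all documents.

    `documents` maps a video_id to its extracted fields (title, description, ...).
    """
    index = defaultdict(dict)
    for video_id, doc in documents.items():
        words = " ".join(str(value) for value in doc.values()).lower().split()
        for position, word in enumerate(words):
            index[word].setdefault(video_id, []).append(position)
    return index


def search(index: dict, query: str) -> list:
    """Return video_ids ordered by how often the query keywords occur."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for video_id, positions in index.get(word, {}).items():
            scores[video_id] += len(positions)
    # Highest frequency first; ties broken by video_id for stable output.
    return sorted(scores, key=lambda video_id: (-scores[video_id], video_id))


documents = {
    "v1": {"title": "Python tutorial", "description": "Learn Python basics"},
    "v2": {"title": "Cooking pasta", "description": "Python not included"},
}
index = build_index(documents)
print(search(index, "python"))  # ['v1', 'v2']
```

The stored positions correspond to the "location of the occurrence" mentioned in the text, and the length of each position list is the keyword's frequency in that document.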