Test 2 Key Concepts Flashcards Preview

Flashcards in Test 2 Key Concepts Deck (99)
1

Congestion Control

TCP uses a congestion window on the sender side to perform congestion avoidance. The congestion window indicates the maximum amount of data that can be sent on a connection without being acknowledged. TCP detects congestion when it fails to receive an acknowledgement for a packet within the estimated timeout.

2

Congestion Control Goals

Use network resources efficiently

Preserve fair allocation of resources

Avoid congestion collapse

3

Congestion Control Approaches

End-to-End congestion control

Network assisted congestion control

4

Fairness

Fairness describes how bandwidth is allocated among different flows. Two common definitions of fair are that all flows get equal throughput, or that all flows get throughput proportional to their demand (i.e., how much they want to send).

5

Efficiency

Efficiency is how much of the available bandwidth is used, i.e., efficient congestion control
leaves little or no bandwidth wasted. (Some definitions of efficiency may refer specifically to
bandwidth used to do "productive work", thus excluding overhead traffic.)

6

Additive Increase

Additive increase will increase the throughput until it equals the bandwidth, at which point a
packet loss will occur and trigger multiplicative decrease. At that point, throughput immediately
drops to ½ the bandwidth. Additive increase then resumes, raising throughput linearly until it
reaches the total bandwidth again. Thus the average throughput is the average of ½ bandwidth
and 1x bandwidth = ¾ bandwidth.
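The ¾-bandwidth figure can be checked numerically by averaging the sawtooth ramp; the 100 Mbps link speed below is an arbitrary illustrative assumption:

```python
bandwidth = 100.0  # Mbps; an arbitrary illustrative link speed

steps = 10000
# Sample the sawtooth ramp, which climbs linearly from bandwidth/2
# back up to the full bandwidth after each multiplicative decrease.
samples = [bandwidth / 2 + (bandwidth / 2) * i / steps for i in range(steps + 1)]
average = sum(samples) / len(samples)

print(round(average, 6))  # 75.0, i.e. 3/4 of the bandwidth
```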

7

TCP AIMD

Additive Increase Multiplicative Decrease (AIMD)

A graph of rate over time shows the TCP sawtooth: TCP increases its rate additively until it reaches the saturation point, sees a packet loss, and halves its sending rate.

The number of packets sent per packet loss is the area under one sawtooth triangle, (3/8)*Wm^2, so the loss rate is approximately 8/(3*Wm^2)

Throughput = 3/4*Wm/RTT

Throughput is inversely proportional to RTT and square root of the loss rate.
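A quick sketch of these formulas with assumed values for Wm and RTT (the numbers are arbitrary, chosen only to make the arithmetic visible):

```python
# AIMD sawtooth arithmetic with assumed, illustrative parameters.
Wm = 20    # window size (in packets) at the point of loss (assumed)
rtt = 0.1  # round-trip time in seconds (assumed)

# One cycle: the window grows from Wm/2 to Wm, one packet per RTT,
# so packets per cycle is the area under the sawtooth triangle.
packets_per_loss = (3 / 8) * Wm ** 2
loss_rate = 1 / packets_per_loss   # one loss every (3/8)*Wm^2 packets

# Average throughput in packets per second: (3/4) * Wm / RTT.
throughput = (3 / 4) * Wm / rtt

print(packets_per_loss, loss_rate, throughput)
```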

8

TCP incast

The incast problem occurs when:
1. Collective communication (i.e., many-to-one or many-to-many patterns) occurs on high fan-in switches.

2. Many small packets arrive at the switch at the same time, causing some of the packets to be lost.

3. The last necessary factor is a low-latency network, which means the timeout delay will be much longer than the round-trip time of the network.

Consequently, large delays occur in which the system is simply waiting for the timeouts to occur. This slows the whole application, since hearing from all the
senders in collective communication is usually necessary before the application can proceed.

9

TCP fit for streaming applications

Audio/video can tolerate some loss and delay, but not variability in delay.

So TCP is not a good fit for congestion control in streaming audio or streaming video:

TCP retransmits lost packets, but retransmission is not always useful for streaming

TCP slows down its sending rate after packet loss

Protocol overhead (a 20-byte TCP header and an ACK for every packet aren't needed)

10

Network Assisted Congestion Control

Routers provide explicit feedback about the rates that end systems should be sending.

Set single bit indicating congestion (TCP ECN or explicit congestion notifications)

11

Solution to TCP Incast

Barrier Synchronization

Client/application has many parallel requests and can't progress without responses to all of them. Adding more servers induces overflow of the switch buffer, causing severe packet loss and throughput collapse.

Solution: use fine-grained TCP timeouts (microseconds) to reduce that wait time.

Could also reduce network load by having client only acknowledge every other packet.

12

Media Streaming Challenges

Large volume of data

Data volume varies over time

Low tolerance for delay variation

Low tolerance for delay, period (but some loss acceptable)

13

UDP (User Datagram Protocol)

UDP best for streaming video and streaming audio

No automatic retransmission

No sending rate adaptation
Smaller header

14

Leaky bucket

The leaky bucket takes data and collects it up to a maximum capacity. Data is released from the bucket only at a set rate and packet size. When the bucket runs out of data, the leaking stops. If incoming data would overfill the bucket, the packet is considered non-conformant and is not added to the bucket. Data is added to the bucket as space becomes available for conforming packets.

1. Smooths out traffic by releasing packets at a constant rate. Does not permit burstiness.

2. Discards packets that arrive when the bucket is full (no concept of a queue)

3. Application: traffic shaping or traffic policing.

Traffic arrives in a bucket of size beta and drains from bucket at a rate rho

Rho controls average rate. Data can arrive faster or slower but cannot drain at a rate faster than rho

So max average rate that traffic can be sent is smooth rate rho

The size of the bucket controls the max burst size. Even though the average rate cannot exceed rho, at times the sender can exceed it if the total size of the burst does not overflow the bucket

Leaky bucket allows flows to periodically burst and regulator ensures average rate does not exceed the drain rate of the bucket
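The behavior above can be sketched as a minimal leaky-bucket regulator; the class name, parameters, and packet-count units are illustrative assumptions, not code from the course:

```python
class LeakyBucket:
    """Leaky-bucket policer sketch: fixed-capacity bucket, constant drain rate."""

    def __init__(self, beta, rho):
        self.beta = beta   # bucket capacity, in packets (assumed units)
        self.rho = rho     # drain rate, packets per tick (assumed units)
        self.level = 0     # packets currently held in the bucket

    def arrive(self, n):
        # Packets that would overflow the bucket are non-conformant and dropped.
        accepted = min(n, self.beta - self.level)
        self.level += accepted
        return accepted

    def tick(self):
        # The bucket leaks at most rho packets per tick, smoothing the output.
        released = min(self.rho, self.level)
        self.level -= released
        return released


bucket = LeakyBucket(beta=5, rho=2)
accepted = bucket.arrive(8)   # burst of 8: only 5 fit, 3 are dropped
released = bucket.tick()      # output is smoothed to rho = 2 per tick
print(accepted, released)     # 5 2
```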

15

Token Bucket

The Token Bucket Traffic Shaper shapes bursty traffic patterns while still ensuring the flow does not exceed some average rate

1. Token bucket smooths traffic too but permits burstiness - which is equivalent to the number of tokens accumulated in the bucket.

2. Discards tokens when bucket is full, but never discards packets (infinite queue).

3. Application: Network traffic shaping or rate limiting.

Rho is the rate
of tokens being added to the bucket, so it should match the average bit rate

Beta determines how large and how long a burst is allowed.

Tokens arrive in the bucket at a rate Rho, and Beta is again the capacity of the bucket. Traffic arrives at an average rate Lambda-average and a peak rate Lambda-peak. Traffic can be sent by the regulator as long as there are tokens in the bucket.

Different from the leaky bucket: to send a packet of size b, b tokens are removed from the bucket. If the bucket is empty, the sender must wait until b tokens arrive. If the bucket is partially full, the packet is sent only if at least b tokens are available; otherwise it waits.

Limitation: in any traffic interval of length T, the flow can send Beta + T*Rho worth of data. If the network tries to police flows by measuring traffic over intervals of length T, a flow can cheat by sending this amount in each interval: over 2T it consumes 2*(Beta + T*Rho), which is greater than the Beta + 2T*Rho it is supposed to consume.
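The Beta + T*Rho bound and the policing loophole can be checked in a few lines; the beta and rho values here are arbitrary assumptions:

```python
beta = 10.0   # bucket capacity in tokens (assumed)
rho = 2.0     # token fill rate per second (assumed)

def max_send(T):
    """Most data a conforming flow can send in an interval of length T."""
    return beta + T * rho

T = 5.0
print(max_send(T))                       # 20.0 = beta + T*rho
# The cheating loophole: two back-to-back intervals of length T admit
# 2*(beta + T*rho), more than the beta + 2*T*rho a single 2T interval would.
print(2 * max_send(T), max_send(2 * T))  # 40.0 30.0
```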

16

Difference in Token Bucket and Leaky Bucket

Token Bucket
--permits burstiness, but bounds it:
in any interval T, data sent < Beta (the max tokens that can accumulate in the bucket) + T*Rho (tokens accumulate at rate Rho)
long-term rate is always less than Rho
--no discard or priority policies

Leaky Bucket
--smooths bursty traffic
--supports priority policies

Both are easy to implement, but the token bucket is more flexible, since it has an additional parameter to configure burst size.

17

Power Boost

Power Boost allows a subscriber to send at a higher rate for a brief time

Targets spare capacity in network for use by subscribers who do not put sustained load on network.

Two types:
--Capped: the rate a user can achieve during the burst window is set to not exceed a particular rate. To cap, apply a second token bucket with another value of Rho to limit the peak sending rate for power-boost-eligible packets to Rho_C.
--Uncapped: configuration is simple. The area above the average rate and below the power boost rate is the power boost bucket size. The max sustained traffic rate is Rho.

18

Power boost: How long can sender send at the rate r that exceeds the sustained rate?

Sending rate r > Rsus (the sustained rate)
Power boost bucket size: Beta

Beta = d*(r - Rsus)

d = Beta/(r - Rsus)
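Plugging assumed numbers into d = Beta/(r - Rsus):

```python
beta = 10e6    # power boost bucket size, in bits (assumed)
r = 20e6       # boosted sending rate, bits/sec (assumed)
r_sus = 10e6   # sustained rate, bits/sec (assumed)

# The boost bucket drains at (r - r_sus), so the boost lasts:
d = beta / (r - r_sus)
print(d)   # 1.0 second at the boosted rate
```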

19

Power boost effect on latency

Even though power boost allows users to send at a higher rate, users can still experience high latency and loss over the duration of sending at the higher rate.

Reason: the access link can't support the higher rate, so buffers fill up and introduce delays; because packets are queued rather than dropped, there is no packet loss signal even though the access link cannot sustain the higher rate.

Solution: the sender shapes its rate so it never exceeds the sustained rate.

20

Buffer Bloat

If the buffer can absorb the higher rate, it will fill with packets, but it still drains only at the sustained rate.

Even though the sender can send at a higher rate for a brief period of time, packets are queued in a buffer, so they see higher delays than if they had arrived at the front of the queue and been delivered immediately

delay = amount of data in buffer/rate that buffer can drain

Ruins performance for voice, video

Shows up in home routers, home APs, hosts, switches/routers

Senders will send at increasingly faster rates until they see a loss, but the buffer keeps filling because it drains more slowly, so the loss signal is delayed

Solution:
--smaller buffers, but this is a tall order
--shape traffic such that the rate of traffic coming into the access link never exceeds the uplink rate the ISP has provided; then the buffer will never fill. Shape traffic at the home router to prevent exceeding the uplink rate.
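The delay = data-in-buffer / drain-rate relationship above, with assumed buffer and uplink sizes:

```python
buffer_bytes = 256 * 1024   # 256 KB of queued data (assumed)
drain_rate = 1e6 / 8        # 1 Mbps uplink expressed in bytes/sec (assumed)

# delay = amount of data in the buffer / rate the buffer can drain
delay = buffer_bytes / drain_rate
print(round(delay, 3))      # ~2.097 seconds of added queueing latency
```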

21

Buffer Bloat Solution

--smaller buffers, but this is a tall order
--shape traffic such that the rate of traffic coming into the access link never exceeds the uplink rate the ISP has provided; then the buffer will never fill. Shape traffic at the home router to prevent exceeding the uplink rate.

22

Buffer Bloat Example

Example of upload:

Y axis: RTT (latency) to a nearby server; X axis: time.

Modems experience a huge increase in latency during uploads. The modem itself has a buffer, the ISP sits upstream of that buffer, and the access link drains the buffer at a fixed rate.

TCP senders in the home will send until they see lost packets, but if the buffer is large, senders won't see lost packets until the buffer is full.

Senders continue to send at increasingly faster rates until they see a loss.

As a result, packets arriving in the buffer see increasing delays, and senders continue sending at faster rates because, without seeing a loss, there is no signal to slow down.

23

Hypertext Transfer Protocol (HTTP)

-application-layer protocol to transfer web content
-the protocol browsers use to request webpages
-the protocol used to return objects to the browser
-layered on top of a byte-stream protocol like TCP

24

HTTP Requests

Request line
-indicates the method of the request (GET retrieves a resource, POST sends data to the server, HEAD returns only the headers of a GET request)
-includes URL
-includes version number

Optional headers
-referrer: what caused page to be requested
-user agent: client software

25

HTTP Response

Status line
-HTTP version
-response code indicating the outcome (100s informational, 200s success, 300s redirect, 400s client errors, 500s server errors)

Optional response headers
-Server
-Location
-Allow
-Content-Encoding
-Content-Length
-Expires
-Last-Modified

26

HTTP request HEAD method

The HEAD method in HTTP requests a document just like the GET method except that the server will respond to a HEAD request with only the HTTP response header; the response body
(which would normally contain the document data) is not included.

This saves the delay of transmitting the actual document, e.g., if it is a large file, but allows the browser to check the
Last-Modified field in the response header to find out if it's been changed since the time when
the cached version was retrieved.
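As a sketch, here are the raw bytes of a HEAD request; the host and path are hypothetical, chosen only for illustration:

```python
host = "example.com"    # hypothetical host
path = "/big-file.pdf"  # hypothetical path

# A HEAD request looks exactly like a GET except for the method name;
# the server replies with headers only (e.g. Last-Modified), no body.
request = (
    f"HEAD {path} HTTP/1.1\r\n"
    f"Host: {host}\r\n"
    "Connection: close\r\n"
    "\r\n"
)
print(request.splitlines()[0])   # HEAD /big-file.pdf HTTP/1.1
```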

27

HTTP Response code 200s

Success

e.g. status code…
200 OK:

The operation in the request message succeeded. What that operation is exactly
depends on the request method. For example, if the request method was GET then 200
OK means that the document was retrieved and its content should be in the body of the
200 OK response. (200 OK responses to other methods do not necessarily contain a
body, though. This also depends on what the method was.)

28

HTTP Response code 100s

Informational

29

HTTP Response code 300s

Redirect

e.g. 302 Moved Temporarily (also sometimes called 302 Found)

The requested file is not at this location (i.e., the path part of the GET method line), but
the browser should instead use the URL provided in the Location field of the response to
retrieve the file.

However, the file may be found at this location in the future (unlike a
Moved Permanently response), so the URL in the Location field should be used this
once, but not necessarily again in the future.

30

HTTP Response code 400s

Errors

e.g. 404 Not Found
The requested file does not exist on the server. That is, the file indicated by the path part of the GET method line cannot be found at that path.

31

HTTP Response code 500s

Server errors

32

Early HTTP

v0.9/1.0

One request/response per TCP Connection

Advantage: simple to implement

Disadvantage: a new TCP connection for every request adds overhead and slows transfer
-TCP three-way handshake for every request
-TCP must begin in slow start every time a connection opens
-servers have many connections forced to remain in TIME-WAIT state until timers expire, so they reserve additional resources even after connections complete

Solution to increase efficiency: persistent connections

33

Persistent Connections

Multiple HTTP requests/responses are multiplexed on a single TCP connection
-delimiters indicate the end of a request
-Content-Length allows the receiver to identify the length of a response
--so the server must know the size of the transfer in advance

34

Web Content Distribution Networks (CDN) defined

Overlay network of web caches designed to deliver content to a client from optimal location
-Optimal can be geographically closest or something else.

CDNs often have geographically disparate servers

The purpose is to place caches as close to users as possible

Owned by content providers (Google), networks (AT&T), or operated independently (Akamai)

Non-network CDNs typically place servers in other autonomous systems or ISPs

The number of cache nodes varies.

35

CDNs challenges: Server Selection

Which server criteria
-least loaded server
-lowest latency (most typical)
-any alive server

36

CDNs challenges: Content Routing

How to direct clients
1-Routing systems (e.g. anycast): give all replicas the same IP address, then let routing carry clients to the closest replica
--simple, but servers have very little control since they are at the whims of Internet routing (simple but coarse)

2-Application-based (e.g. HTTP redirect): requires client to first go to origin server to get redirect, increasing latency (simple but delays)

3-Naming system (e.g. DNS): client looks up domain and response contains IP address of nearby cache. Significant flexibility in directing different clients to different server replicas (fine-grained control and fast)

37

CDNs and ISPs

CDNs and ISPs have symbiotic relationship when it comes to peering relationships

CDNs like to peer with ISP because peering directly with ISP where customer located provides
-better throughput since no intermediate AS hops and network latency lower
-redundancy: more vectors to deliver content increases reliability
-burstiness: during large request events, having connectivity to multiple networks where content is hosted allows the ISP to spread traffic across multiple transit links, thereby reducing the 95th-percentile (billing) rate and lowering transit costs

ISPs like peering with CDNs (or host caches locally) because:
-providing content closer to ISP customers allows ISP to provide good performance for customers
-lower transit costs because traffic isn't traversing costly links

38

BitTorrent

Peer-to-Peer CDN used for file sharing and distribution of large files

Fetch content from peers rather than the origin
-reduces congestion
-prevents overload at the network where the content is hosted

Break the original file into many pieces and replicate different pieces on different peers as soon as possible, so each peer assembles the file by picking up different pieces and getting the remaining pieces from other peers, eventually assembling the entire file.

39

BitTorrent Publishing

1. Peer creates torrent
-tracker metadata
-pieces of file

2. Seeders create initial copy (has complete copy)
-tracker has metadata about file including list of seeders that contain initial copy of file

3. client starts to download pieces of file from seeder.
-hopefully different from other peers

4. Clients swap chunks of the file and after enough swapping, they should all have complete files.

Leechers: clients with incomplete copies of the file.

Trackers: allows peers to find each other and returns random list of peers leechers can use to swap parts of the file

Freeriding (freeloading): a client leaves the network as soon as it finishes downloading the file. Solved by BitTorrent's choking mechanism.

40

BitTorrent: Solution to freeriding

BitTorrent solved freeriding (clients leaving network as soon as finished downloading file) with choking

Choking (tit-for-tat) is a temporary refusal to upload chunks to another peer
-if a peer can't download from another peer, it doesn't upload to it
(this eliminates the freerider problem)

41

BitTorrent getting chunks to swap

If all clients receive the same chunks, no one has a complete copy and clients won't swap

Rarest piece first: the client determines which pieces are rarest and downloads those first
-so the most common pieces are left until the end, and a larger variety of pieces is downloaded from the seeder
-at the start, pieces are chosen at random, because insisting on a rare piece is not a good idea initially
-redundant requests are cancelled when a piece is received, so a single peer with a slow transfer rate doesn't slow down the network

42

How does BitTorrent implement the tit-for-tat algorithm? Be sure to explain in detail,
including the roles of both choking and unchoking.

A BitTorrent client sends data only to the top N peers who are sending to it, plus one peer who
is optimistically unchoked. Let's say for example purposes that N=4. Your BitTorrent client will
choose the 4 peers who are sending to it at the fastest rate and it will send data to them in
return.

It will not send to other peers, and they are said to be choked.

Thus it provides
tit-for-tat by sending to those who send the most to it, and choking those that are not sending
to it, or are sending slowly.


However, this creates a problem where two peers who might be able to send to each other are
mutually choked. Neither will begin sending to the other because the other is not sending to it.
Therefore, each client will optimistically unchoke one peer at any given time for a brief period.

If the client sends fast enough to the optimistically unchoked client to get on its top-4 then the
peer will send data back in return. If the client receives enough data from the peer for it to be in
the top-4 then that peer becomes one of the new top-4 and the slowest of the previous top-4
will be choked.

Thus they both end up in each other's top-4. (The peer is no longer "optimistically" unchoked, and is merely unchoked. A new peer is selected to be optimistically unchoked.)

On the other hand, if the client does not get into its peer's top-4, or if it does but the peer does
not send fast enough in return to get in the client's top-4, then they will not end up in each
other’s top-4. After some time, the client will stop optimistically unchoking that peer and stop
sending to it. It will choose a new peer to optimistically unchoke.


This process repeats forever (until the client has the entire file, that is) in order to keep
exploring different peers for better matches than the client's current top-N.

The game theoretic
result is that clients will end up sending to peers that are able to send back about the same
amount – fast peers will get paired up, while slow peers are matched with each other. This happens because a fast peer will readily drop a slow peer from its top-N in favor of another fast
peer, matching fast peers together. Slow peers will not get matched with fast peers because the
fast peers will soon learn to choke them, but they will pair up with other slow peers because
neither peer can find a better match who is willing to unchoke them.

43

Of the various methods to redirect web requests to CDN servers, DNS redirection is the
most common. Why would this method be more popular than the alternatives?

DNS-based redirection is much faster than HTTP redirection, as the latter requires a couple
extra round trips to servers. (It's actually more than just one extra round trip because you need
to establish a TCP connection to a second different server.) It also gives the CDN provider more
control over who will be redirected where than a technique like IP anycast would. Finally, it is
not too difficult to implement (even if slightly more complex than the other two) and it uses tools that are widely supported (i.e., DNS) and do not need any modifications to support this technique (i.e., DNS works out of the box).

44

DHT

Enable a form of content overlay called a structured overlay

Chord protocol implements DHT
Scalable, distributed lookup service (maps keys to values)

Scalable
Provable correctness
Reasonably good performance

Main motivation is scalable location of data in a large distributed system

The key problem is lookup: the hash table isn't located in one place; it is distributed across the network (hence DHT, distributed hash table)

45

Consistent Hashing

(note: I've added the wiki definition as another flashcard)

Keys and nodes map to same ID space

Create a metric space (like a ring) of IDs. Each node has an ID, and each key maps into the same ID space. A consistent hash function assigns IDs to both keys and nodes: a node's ID is the hash of its IP address, and a key's ID is the hash of the key.

In chord, key stored at successor, which is node with next highest ID.

Consistent hashing provides:
-load balance because all nodes receive roughly same number of keys
-flexibility because when node leaves or joins network, only a small fraction of keys needs to be moved
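A minimal consistent-hashing sketch of the ring described above; the node names, the 16-bit ID space, and the SHA-1 choice are illustrative assumptions:

```python
import hashlib

M = 2 ** 16   # small assumed ID space, 0 .. 2^16 - 1

def h(name):
    """Hash a name (node address or key) into the shared ID space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

# Hypothetical nodes; in Chord a node's ID would be the hash of its IP address.
nodes = sorted(h(f"node-{i}") for i in range(4))

def successor(key_id):
    """A key is stored at its successor: the node with the next-highest ID."""
    for n in nodes:
        if n >= key_id:
            return n
    return nodes[0]   # wrap around the ring

key = h("some-key")
print(successor(key) in nodes)   # True: every key maps to some node
```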

46

3 Methods for Implementing consistent hashing

Every node knows the location of every other node
-lookups are fast, O(1)
-routing table must be large, O(N), because every node must know the location of every other node (N = number of nodes in the network)

Each node knows only the location of its immediate successor
-small table, O(1)
-requires O(N) lookups

Finger tables

47

Finger tables

Every node knows the location of O(log N) other nodes, and the distance of the nodes it knows increases exponentially
-finger i points to the successor of n + 2^i
-to look up an ID, find its predecessor and ask for that node's successor; finger tables hold a predecessor mapping too. The query moves around the ring looking for the node whose successor's ID is bigger than the ID of the data
-requires O(log n) hops
-O(log n) messages per lookup
-finger table size is O(log n) state per node
-when a node joins the network: initialize the new node's fingers, update the fingers of existing nodes, and transfer keys from the successor to the new node

-when a node leaves: each node keeps track of its successor's fingers, so the predecessor can reach the nodes corresponding to the failed/departed node's fingers

48

Finger tables (wiki definition)

(from wiki)
To avoid the linear search above, Chord implements a faster search method by requiring each node to keep a finger table containing up to m entries (recall that m is the number of bits in the hash key).

The i-th entry of node n will contain:
successor((n + 2^{i-1}) mod 2^m)

The first entry of the finger table is actually the node's immediate successor (and therefore an extra successor field is not needed). Every time a node wants to look up a key k, it will pass the query to the closest successor or predecessor (depending on the finger table) of k in its finger table (the "largest" one on the circle whose ID is smaller than k), until a node finds out the key is stored in its immediate successor.

With such a finger table, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)

49

In a distributed hash table, how many steps (hops) are required to lookup an item if the finger table is a constant size (i.e., its size does not depend on the number of DHT nodes in the system)? Explain why that is the right answer.

A lookup will require O(N) hops in this case.

Suppose a constant size of 1, as an example. Each
node only knows how to find the next one, so it basically forms a ring topology. In the worst
case, the requested item is on the last node in the ring before getting back to the node that originated the request.

So the request has to go all the way around the ring, taking N-1 hops.

Based on similar reasoning, if a larger constant number of nodes is in the finger table, proportionally fewer hops may be required.

However, for any given constant size
finger table, as the number of nodes in the system grows, the number of hops required will still
be on the order of O(N).

50

Finger table node join

Whenever a new node joins, three invariants should be maintained (the first two ensure correctness and the last one keeps querying fast):

1 Each node's successor points to its immediate successor correctly.
2 Each key is stored in successor(k).
3 Each node's finger table should be correct.

To satisfy these invariants, a predecessor field is maintained for each node. As the successor is the first entry of the finger table, we do not need to maintain this field separately any more. The following tasks should be done for a newly joined node n:

1 Initialize node n (the predecessor and the finger table).
2 Notify other nodes to update their predecessors and finger tables.
3 The new node takes over its responsible keys from its successor

51

For a more typical DHT setup where the finger table has O(log N) entries, for N DHT nodes in the system, explain why the number of steps to access an item is O(log N).

O(log N) entries in the finger table means that each node knows about the node halfway around the ring back to it, about the node halfway to that one, the one halfway to that one, and so on until the last entry in the finger table that is just the next node.

This means that for
any given item that could be on any node, each node knows the address of at least one node that is at least half way around the ring from itself to the item.

Since each hop cuts the distance to the item in half, the number of hops required to get to the item from any starting point in
the DHT is O(log N). (This should be understood by analogy to binary search,
divide-and-conquer, etc.)
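The halving argument can be simulated by greedily jumping the largest power-of-two distance available, as a finger table permits; the ring size below is an assumed power of two:

```python
import math

def hops(ring_size, start, target):
    """Count greedy finger-table hops: each hop jumps the largest
    power-of-two distance that does not overshoot the target."""
    distance = (target - start) % ring_size
    count = 0
    while distance > 0:
        jump = 1 << (distance.bit_length() - 1)
        distance -= jump
        count += 1
    return count

N = 1024   # assumed ring size (a power of two)
worst = max(hops(N, 0, t) for t in range(N))
print(worst, int(math.log2(N)))   # 10 10: worst case is log2(N) hops
```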

52

BIC-TCP

BIC-TCP is a rather complex algorithm that approximates a cubic function. Its growth function has both linear and logarithmic elements, and many different phases (additive increase, binary search, max probing). Additionally, on short-RTT and low-speed networks, BIC-TCP's growth function can be too aggressive (recall it was designed to achieve high utilization on large-bandwidth, long-RTT networks), making it fairly unfriendly to other TCP flows competing for bandwidth.

53

Describe the operation of BIC-TCP’s binary search algorithm for setting the congestion
window.

At a high level, when BIC-TCP experiences a packet loss event, the congestion window value is
set to the midpoint between last window value that did not suffer from loss (WMAX) and the
previous window size that was loss free for at least one RTT (WMIN). This is often referred to as
a binary search, as it follows intuitively that the maximum possible stable window value is
somewhere between a value that was known to be stable and the value achieved just prior to
the loss event. This algorithm "searches" for this maximum stable window value by effectively reducing the range of possible values by half per packet loss event.

54

Once stable, how does BIC-TCP react to changes in available bandwidth, i.e. what
happens when there is a sudden increase or decrease in available bandwidth?

Once this maximum stable window size has been achieved, if there is a sudden increase in
available bandwidth, then max probing phase of BIC-TCP will rapidly increase the window beyond the value of WMAX until another loss event occurs, which resets the value of WMAX. If a sudden decrease in available bandwidth occurs, and this loss is below the value of WMAX,
then the window size is reduced by a multiplicative value (β), enabling a safe reaction to a
lower saturation point.

55

BIC-TCP binary search

At a high level, when BIC-TCP experiences a packet loss event, the congestion window value is
set to the midpoint between last window value that did not suffer from loss (WMAX) and the
previous window size that was loss free for at least one RTT (WMIN)

the maximum possible stable window value is
somewhere between a value that was known to be stable and the value achieved just prior to
the loss event

56

How does the replacement of this congestion control algorithm with a cubic growth function in CUBIC-TCP improve on BIC-TCP? Discuss.

CUBIC retains the strengths of BIC-TCP, but makes many improvements. First, BIC-TCP is a
rather complex algorithm that approximates a cubic function. Its growth function has both
linear and logarithmic elements, and many different phases (additive increase, binary search,
max probing). Additionally, on short RTT and low speed networks, BIC-TCP’s growth function
can be too aggressive (recall it was designed to achieve high utilization on large bandwidth,
long RTT networks), making it fairly unfriendly to other TCP flows competing for bandwidth.

CUBIC replaces the growth function in BIC-TCP with a cubic growth function, based on the
elapsed time between congestion events. This function maintains the multiplicative decrease
utilized by many TCP variants, but records the window size at a congestion event as WMAX.
Using this value of WMAX, the cubic growth function can be restarted, with the plateau
occurring at WMAX. This eliminates the need for multiple growth phases and maintaining values like SMAX/MIN. The plateau of the cubic growth function retains BIC-TCP’s stability and utilization strengths.

57

CUBIC Window Growth Function

CUBIC replaces the growth function in BIC-TCP with a cubic growth function, based on the elapsed time between congestion events. This function maintains the multiplicative decrease
utilized by many TCP variants, but records the window size at a congestion event as WMAX.

Using this value of WMAX, the cubic growth function can be restarted, with the plateau occurring at WMAX. This eliminates the need for multiple growth phases and maintaining values like SMAX/MIN. The plateau of the cubic growth function retains BIC-TCP's stability and utilization strengths.
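A sketch of the cubic curve itself, W(t) = C*(t - K)^3 + Wmax; the constants C = 0.4 and the 0.7 restart factor are taken from RFC 8312 as assumptions, and Wmax is an arbitrary illustrative value:

```python
Wmax = 100.0   # window at the last congestion event (assumed)
C = 0.4        # scaling constant from RFC 8312
beta = 0.7     # window restarts at beta * Wmax after a loss (RFC 8312)

# K is the time for the cubic curve to climb back up to Wmax.
K = ((Wmax * (1 - beta)) / C) ** (1 / 3)

def W(t):
    """Congestion window t seconds after the last congestion event."""
    return C * (t - K) ** 3 + Wmax

print(round(W(0), 6))   # 70.0: the window just after the loss
print(round(W(K), 6))   # 100.0: the plateau at Wmax
```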

58

CUBIC TCP-friendly region

The plateau is also known as the TCP friendly region. In this region of the growth curve, the congestion window is nearly constant as it approaches and potentially exceeds WMAX. This achieves stability, as WMAX represents the point where network utilization
is at its highest under steady state conditions.

59

CUBIC Concave Region

The concave region of CUBIC’s growth function rapidly increases the congestion window
to the previous value where a congestion event occurred, allowing for a quick recovery
and high utilization of available bandwidth following a congestion event.

60

CUBIC Convex Region

The convex region of CUBIC’s growth function exists to rapidly converge on a new value
of WMAX following a change in available bandwidth. When the congestion window
exceeds WMAX, and continues to increase throughout the end of the plateau, it likely
indicates some competing flows have terminated and more bandwidth is available. This
is considered a max probing phase, as the congestion window will grow exponentially in
this region until another congestion event occurs and WMAX is reset.

61

CUBIC Fast Convergence

When new flows start competing for bandwidth, other flows must release some bandwidth to
maintain fairness. CUBIC employs the fast convergence mechanism to accomplish this. When
two successive congestion events indicate a reduction in available bandwidth (i.e. a reduced
value of WMAX), the new value of WMAX is further reduced (based on the multiplicative
decrease factor used for resetting the congestion window) to free up additional bandwidth and
reduce the number of congestion events required for all flows to converge on a fair distribution
of bandwidth.
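The bookkeeping above can be sketched as follows. This is a hedged sketch of the fast-convergence rule as commonly described for CUBIC (reduce WMAX by the factor (1 + Beta)/2 when the window failed to reach its previous peak); the function name and the exact constants are assumptions of this sketch.

```python
# Hedged sketch of CUBIC fast convergence: beta is the multiplicative-
# decrease factor, w_last_max the window recorded at the previous
# congestion event.
def on_congestion_event(cwnd, w_last_max, beta=0.7):
    w_max = cwnd
    if w_max < w_last_max:
        # The window stopped short of its old peak, so available
        # bandwidth likely shrank: release extra room so competing
        # flows converge on a fair share in fewer congestion events.
        w_max = w_max * (1 + beta) / 2
    return cwnd * beta, w_max  # (new cwnd, new plateau target)
```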

62

CUBIC Multiplicative decrease

When packet loss occurs, CUBIC reduces its window size by a multiplicative factor Beta. A Beta below 0.5 gives slower convergence; a higher Beta converges faster, but is also less stable.

63

TCP Fast Open

Initial handshake to establish a connection is 1 RTT of delay, which is a significant portion of the web flows' network latency

TCP Fast open allows for exchanging data during the TCP initial handshake.

Core goal: safely exchange data. Security cookie is used by server to authenticate a client that is initiating a TFO connection.

Assumptions:
-servers cannot maintain permanent or semi-permanent per-client state (requires too much memory). Stateless server minimizes state-management complexity
-servers cannot perform any operations to support TFO that are not reasonable to implement on the kernel's critical path
-clients are willing to install new software to support TFO, and small changes are acceptable
-acceptable to leverage other security mechanisms within a server's domain in concert with TFO to provide required security guarantees

65

TCP Fast Open Steps to Request TFO Cookie

Steps:
1. client sends SYN packet to server with Fast Open Cookie Request TCP option

2. Server generates cookie by encrypting the client's IP address under a secret key. The server responds to the client with a SYN-ACK that includes the generated Fast Open Cookie in the TCP option field.

3. The client caches the cookie for future TFO connections to the same server IP.

66

TCP Fast Open steps to use fast open cookie

1. Client sends a SYN with the cached Fast open cookie (as a TCP option) along with application data.

2. Server validates the cookie by decrypting the cookie and comparing the IP address or by re-encrypting the IP address and comparing against the received cookie.
(a) If the cookie is valid, the server sends a SYN-ACK that acknowledges the SYN and the data. The data is delivered to the server application.
(b) Otherwise, the server drops the data and responds with a SYN-ACK that only acknowledges the SYN sequence number. The connection proceeds through a regular 3WHS.

3. If the data in the SYN packet was accepted, the server may send additional response data segments to the client before receiving the first ACK from the client.

4. The client sends an ACK acknowledging the server SYN. If the client's data was not acknowledged, it is retransmitted with the ACK.

5. The connection then proceeds like a normal TCP connection.
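The cookie mechanics in these steps can be sketched as follows. Note the assumption: TFO specifies encrypting the client IP under a server secret, while this sketch substitutes an HMAC tag over the IP, which plays the same authenticate-by-recompute role; the function names and the 8-byte cookie length are invented here.

```python
import hashlib
import hmac

# Sketch only: real TFO encrypts the client IP under a server secret;
# an HMAC tag over the IP (used here) serves the same purpose, since
# the server validates by recomputing and comparing.
def make_tfo_cookie(server_key: bytes, client_ip: str) -> bytes:
    return hmac.new(server_key, client_ip.encode(), hashlib.sha256).digest()[:8]

def accept_syn_data(server_key: bytes, client_ip: str, cookie: bytes) -> bool:
    # Step 2: recompute ("re-encrypt") and compare in constant time;
    # on mismatch the server drops the data and falls back to a 3WHS.
    return hmac.compare_digest(make_tfo_cookie(server_key, client_ip), cookie)
```

Because the cookie is derived only from the client IP and the server key, the server stays stateless between connections, matching the assumptions listed above.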

67

What threat do network middleboxes pose to negotiating MPTCP connections? How
does the design of MPTCP mitigate this?

Network middleboxes may strip out unrecognized TCP options (flags) used during the 3-way
handshake used to negotiate a MPTCP connection. This means that while the sender and
receiver may both be MPTCP capable with multiple viable interfaces, a middlebox along the
route may ultimately prevent a MPTCP connection.
MPTCP is designed to resort to a single path TCP when both ends of the connection cannot
support MPTCP. In this case, when the sender’s MPTCP capable flag is stripped out by a
middlebox enroute to the receiver, the receiver thinks that the sender is not MPTCP capable
and proceeds with a single path TCP connection.

69

Why are receive buffer sizes required to be larger for MPTCP enabled connections?

The receive buffer allows out of order data to continue flowing in the event a packet is dropped
and must be resent. For a standard TCP connection, the required buffer size is determined by
the bandwidth delay product of the connection.
With multiple subflows across a single connection present in MPTCP, the worst case scenario is
that a packet drop occurs early and must be re-sent across the slowest link (like a 3G mobile
connection). This would require other subflows (like high bandwidth WiFi connections) to have
larger buffers than would be required if it were the only connection, because it can send data
much faster than the slower link that is retransmitting the lost packet.
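The worst case above leads to a back-of-envelope buffer calculation: every subflow must be able to keep streaming while the slowest path retransmits. The 2x factor follows the MPTCP design literature; treat the exact constant, the function name, and the example link speeds as assumptions of this sketch.

```python
# Hedged sketch: worst-case MPTCP receive buffer, sized so all
# subflows can keep sending while the slowest subflow retransmits
# a lost packet (hence the sum of bandwidths times the worst RTT).
def mptcp_rcv_buffer_bytes(subflow_bw_bps, rtt_max_s):
    return 2 * sum(subflow_bw_bps) * rtt_max_s / 8

# e.g. a 50 Mb/s WiFi subflow plus a 2 Mb/s 3G subflow with 300 ms RTT
```

With those example numbers the connection needs roughly 3.9 MB, far more than the fast subflow would need on its own.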

70

What controls does MPTCP put in place to maximize memory efficiency?

MPTCP has several built in functions that allow a connection to make the most of the memory it
has available. The first is opportunistic retransmission, where an idle subflow (waiting on
receive window space) may retransmit unacknowledged data sent on another slower subflow.
Additionally, to prevent subflows from becoming a receive window bottleneck in the future,
subflows that induce opportunistic retransmission can be penalized by reducing their
congestion windows. This reduces the amount of traffic sent along this subflow allowing the
faster link to send more data.
Additionally, the buffer itself can be autotuned and capped by MPTCP mechanisms. Since the
buffering requirements for MPTCP are so large, MPTCP only allocates a portion of the
maximum allowable buffer size at the start of the connection, and increases this allocation as
needed throughout the lifetime of the MPTCP flow. If the flow does not require worst case
buffering, the system overall conserves memory resources. Combined with capping the congestion
windows of subflows that excessively fill buffers, this reduces the overall demand MPTCP flows
place on system resources.

72

MPTCP Goals

Goals:
1. Move the single-path Internet to one where the robustness, performance, and load-balancing benefits of multipath transport are available to all applications, the majority of which use TCP for transport.

Allow an unmodified application to start a TCP connection with the regular API; if both endpoints support MPTCP and multiple paths exist, MPTCP can set up additional subflows and stripe the connection's data across these subflows, sending the most data on the least congested path


2. Negotiating MPTCP could otherwise cause connections to fail where regular TCP would have succeeded. MPTCP needs to work in all scenarios where TCP currently works, so if a subflow fails, the connection must continue as long as another subflow has connectivity

3. MPTCP must be able to utilize the network at least as well as regular TCP, but without starving TCP.

4. MPTCP must be implementable in operating systems without using excessive memory or processing.

73

MPTCP Design

Simplest: take segments coming out of regular stack and stripe them across the available paths. Sender needs to know which paths perform well (measure per path RTT, remember which segments it sent on each path, use TCP selective acknowledgements to learn which segments arrived)
-flaw: on each path, MPTCP appears as discontinuous TCP bytestream, which would upset many middleboxes

Instead, MPTCP design is:
-Connection Setup: MPTCP uses new TCP options in SYN packets, and the endpoints exchange connection identifiers.

-Adding subflows: Connection identifiers are used to add new paths (subflows) to an existing connection.
New subflows must be associated with existing MPTCP flow
MPTCP must be robust to an attacker that attempts to add its own subflow to an existing MPTCP connection (uses 64-bit random keys)

-Reliable multipath delivery: Subflows resemble TCP flows on the wire, but they share a single send and receive buffer at the endpoints.
MPTCP uses per subflow sequence numbers to detect losses and drive retransmissions and connection level sequence numbers to allow reordering at the receiver
Connection-level acknowledgements are used to implement proper flow control

-Connection and subflow teardown: a FIN indicates no more data on a single subflow, while a DATA FIN indicates no more data on the whole connection; the sender waits for an ACK of the DATA FIN before sending a FIN on each subflow

74

Five main mechanism of TCP

Connection setup handshake and state machine

Reliable transmission & acknowledgment of data

Congestion control

Flow control

Connection teardown handshake and state machine (using FIN for normal shutdown and RST for errors such as when one end no longer has state)

75

Middlebox

A middlebox is defined as "any intermediary device performing functions other than the normal, standard functions of an IP router on the datagram path between a source host and destination host." — B. Carpenter. RFC 3234. Middleboxes: Taxonomy and Issues.

Also called a “network appliance” or a “network function.”

Wiki: A middlebox or network appliance is a computer networking device that transforms, inspects, filters, or otherwise manipulates traffic for purposes other than packet forwarding.

76

Buffer sizing original rule of thumb

Rule of thumb: router buffers are sized based on a 1994 paper: a router needs an amount of buffering equal to the average RTT of a flow that passes through the router, multiplied by the capacity of the router's network interfaces, i.e., the B = RTT x C rule.
--Good for small number of long-lived TCP flows
--Ensures buffer at bottleneck link never underflows, so router doesn't lose throughput


Key to sizing buffer is to make sure that while sender pauses, router buffer doesn't go empty and force bottleneck link to go idle (so determine rate at which buffer drains to ensure size of reservoir needed to prevent it from going empty)
--This is equal to distance in bytes between peak and trough of the TCP sawtooth
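The rule of thumb reduces to a one-line calculation. The example numbers below (10 Gb/s link, 250 ms average RTT) are illustrative, not from the card.

```python
# B = RTT x C: buffer bytes needed so the bottleneck link never
# goes idle while the sender pauses after a loss.
def rule_of_thumb_buffer_bytes(rtt_s, capacity_bps):
    return rtt_s * capacity_bps / 8
```

At 10 Gb/s with a 250 ms average RTT this comes to 312.5 MB of buffering, which illustrates why the rule forces large, DRAM-based buffers on fast links.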

77

Reasons overbuffering is bad

Overbuffering is a bad idea because:
1. complicates the design of high-speed routers, leading to higher power consumption, more board space, and lower density
2. increases end-to-end delay in the presence of congestion. Large buffers conflict with the low-latency needs of real-time applications (video games, device controls)

78

Under what conditions was the "rule-of-thumb" for buffer size (B = average RTT x C) originally
conceived? How does this fundamentally differ from current, real world conditions?

The "rule-of-thumb" is derived from an analysis of a single long lived TCP flow. The rate is
designed to maintain buffer occupancy during TCP congestion avoidance, preventing the
bottleneck link from going idle.
These conditions are not realistic compared to actual flows in backbone routers. For example a
2.5 Gb/s link typically carries 10,000 flows at a time, of which the life of the flow varies. Some
flows are only a few packets, and never leave TCP slow start, and hence never establish an
average sending rate.
Of the flows that are long lived, they have various RTTs and their congestion windows are not
synchronized, which contrasts directly with a single long lived flow with a stable RTT and single
congestion window.

79

Statistical modeling of desynchronized long lived flows indicates that smaller buffer sizes
are sufficient to maintain link utilization as the number of these long lived flows increases.
However, not all flows can be expected to be long lived. Discuss why short lived flows (less
than 100 packets) do not significantly detract from these findings.

Even when the vast majority of flows across a link are short lived, the flow length distribution
remains dominated by the long lived flows on the link. This means that the majority of the
packets on the link at any given time belong to long lived flows.

Required buffer size in the case of short lived flows depends on actual load on the links and the
length of the flows, not the number of flows or propagation delays. This means that roughly the
same amount of buffering required for desynchronized long lived flows will also be sufficient
for short lived flows as well.

80

Explain how standing queues develop in network buffers at bottleneck links.

Queues develop at bottleneck links as a result of the bottleneck’s reduced forwarding speed. As
some of the packets in the queue are forwarded, the TCP sender will begin to receive ACKs and
send more packets, which arrive at the bottleneck link buffer, refilling the queue. The
difference in the bottleneck link speed and the link RTT (driving the congestion window of the
TCP flow) will result in a certain number of packets consistently occupying the buffer, until the
flow completes, which is referred to as the standing queue.

81

Why is a standing queue NOT correctly identified as congestion?

A standing queue is NOT congestion because it results from a mismatch between the congestion
window and the bottleneck link's capacity. A standing queue can develop in single-flow
environments, and under usage limits that would eliminate actual congestion.

82

Discuss the drawbacks to over-buffering routers. If memory is widely available at low
cost, why is it a bad idea to use massive buffers to ensure high link utilization?

Using massive buffers in internet routers increases the size, power consumption, and design
complexity of routers. Large buffers are typically implemented in off-chip DRAM, whereas small
buffers can be implemented on chip.
Additionally, large off chip DRAM is slower to retrieve data than on chip SRAM. This means that
retrieving buffered packets takes longer, which means the latency on the link will grow. During
periods of congestion with a large amount of buffered packets, latency sensitive applications
like live streaming and networked video games will suffer.

Further, TCP congestion control algorithms can also suffer under these conditions. Using large
amounts of cheap memory may eliminate the need to worry about proper buffer sizing, but it
induces hardware efficiency issues and presents problems for low latency applications.

83

What effect does dropping the packet have on the TCP
sender?

Dropping a flow’s packet triggers a congestion window reduction by the TCP sender, which
helps to eliminate buffer bloat.

84

Buffer size and router design

Router line card provides interfaces to the network

Router linecards use multiple DRAM chips in parallel to obtain the aggregate data-rate needed
-but wide buses consume large amounts of board space and fast data pins on modern DRAMs consume a lot of power
-so current state-of-the-art packet buffers run at an aggregate rate of around 40 Gb/s

85

Buffer size for a single long-lived TCP flow

Initially, sender increases its window-size and fills the buffer until the buffer has to drop the first packet

Just under one RTT later, the sender times out because it is waiting for the ACK of the dropped packet
-Halves window size from Wmax to Wmax/2
-Sender pauses while waiting for ACKs for those Wmax/2 packets. ACKs arrive at the sender at rate C (the bottleneck rate), so it pauses for (Wmax/2)/C seconds

Then we want the buffer occupancy to almost hit zero once per packet loss, but never stay empty. That leads to the rule of thumb B = RTT x C

Underbuffered: buffer size is less than RTT x C, so when the window is halved and the sender pauses waiting for ACKs, there is insufficient reserve in the buffer to keep the bottleneck link busy. The buffer goes empty, the bottleneck link goes idle, and throughput is lost.

Overbuffered: when the window is halved, the buffer never goes nearly/completely empty, and the queuing delay of the flows is increased by a constant

86

Synchronized Long Flows Buffer Size

If TCP flows share the same bottleneck link, they become synchronized because they experience packet drops at roughly the same time, so sawtooths become synchronized and in-phase

So buffer needs to be the same as single flow

87

Desynchronized long flows buffer size

The more flows added that are not synchronized, the more they will smooth each other out, the less they look like a sawtooth, and the distance from the peak to the trough of the aggregate window size will get smaller

So buffer size requirements smaller as we increase the number of flows
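The statistical smoothing described above is usually quantified as dividing the rule of thumb by the square root of the number of flows (the small-buffers result from the buffer-sizing literature); treat that exact form, and the example numbers in the test, as assumptions of this sketch.

```python
import math

# Hedged sketch: with n desynchronized long-lived flows, the
# required buffer shrinks to roughly (RTT x C) / sqrt(n).
def desync_buffer_bytes(rtt_s, capacity_bps, n_flows):
    return rtt_s * capacity_bps / 8 / math.sqrt(n_flows)
```

For a 10 Gb/s link with 250 ms average RTT and 10,000 flows, the 312.5 MB rule-of-thumb buffer drops to about 3 MB.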

88

Sizing the router buffer for short flows

Short flows (TCP and non-TCP) have much smaller effect than long-lived TCP flows, particularly in a router with a large number of flows

Average queue length is independent of:
-number of flows
-bandwidth of the link

Average queue length only depends on:
-load of the link
-length of the flows

Average queue length peaks when the probability of large bursts is highest, not necessarily when the average burst size is highest.

***Key observation: for short flows, the size of the buffer does not depend on:
- the line-rate
- the propagation delay of the flows
- the number of flows

***For short flows, the size of the buffer only depends on:
- the load of the link
- the length of the flows

Therefore, a backbone router serving highly aggregated traffic needs the same amount of buffering to absorb short-lived flows as a router serving only a few clients

Short-lived flows only require small buffers

When there is a mix of short-lived and long-lived flows, short-lived flows contribute very little to buffering requirements

Buffer size usually determined by the number of long-lived flows

89

Buffer size: Definition of short-lived flow

TCP flow that never leaves slow-start (e.g. any flow with fewer than 90 packets, assuming a typical maximum window size of 65kB)

90

Backbone definition

A backbone is a part of a computer network that interconnects various pieces of network, providing a path for the exchange of information between different LANs or subnetworks. A backbone can tie together diverse networks in the same building, in different buildings in a campus environment, or over wide areas.

A backbone router is a type of router that links separate systems in different meshes of a network with each other. As its name suggests, a backbone router plays the role of a backbone in any network connection and, as such, is part of the backbone network.

91

Consider the CoDel active queue management algorithm. How does the algorithm decide
whether or not to drop a flow's packets?

CoDel assumes that a standing queue of the target size is acceptable, and that at least one
maximum transmission unit (MTU) worth of data must be in the buffer before preventing
packets from entering the queue (by dropping them). CoDel monitors the minimum queue
delay experienced by allowed packets as they traverse the queue (by adding a timestamp upon
arrival).
If this metric exceeds the target value for at least one set interval, then packets are dropped
according to a control law until the queue delay is reduced below the target, or the data in the
buffer drops below one MTU.
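The decision rule above can be sketched in a few lines. This is a simplified illustration, not CoDel itself: the commonly cited defaults (5 ms target, 100 ms interval) are assumed, the real algorithm tracks the minimum sojourn time over the interval and scales drops with a control law, and the class and method names are invented here.

```python
TARGET = 0.005    # acceptable standing-queue delay, seconds (assumed default)
INTERVAL = 0.100  # how long delay must stay high before dropping (assumed)

class CoDelSketch:
    def __init__(self):
        self.above_since = None  # when sojourn time first exceeded TARGET

    def should_drop(self, now, sojourn_time, queue_bytes, mtu=1500):
        # Never drop when delay is acceptable or less than one MTU is queued.
        if sojourn_time < TARGET or queue_bytes <= mtu:
            self.above_since = None
            return False
        if self.above_since is None:
            self.above_since = now
        # Drop only once the delay has stayed above target a full interval.
        return now - self.above_since >= INTERVAL
```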

92

CoDel

In network routing, CoDel (pronounced "coddle") for controlled delay is a scheduling algorithm for the network scheduler developed by Van Jacobson and Kathleen Nichols. It is designed to overcome bufferbloat in network links (such as routers) by setting limits on the delay network packets suffer due to passing through the buffer being managed by CoDel.

CoDel is parameterless. One of the weaknesses in the RED algorithm (according to Jacobson) is that it is too difficult to configure (and too difficult to configure correctly, especially in an environment with dynamic link rates). CoDel has no parameters to set at all.
CoDel treats good queue and bad queue differently. A good queue has low delays by nature, so the management algorithm can ignore it, while a bad queue is susceptible to management intervention in the form of dropping packets.
CoDel works off of a parameter that is determined completely locally, so it is independent of round-trip delays, link rates, traffic loads and other factors that cannot be controlled or predicted by the local buffer.
The local minimum delay can only be determined when a packet leaves the buffer, so no extra delay is needed to run the queue to collect statistics to manage the queue.
CoDel adapts to dynamically changing link rates with no negative impact on utilization.
CoDel can be implemented relatively simply and therefore can span the spectrum from low-end home routers to high-end routing solutions.

93

Active Queue Management (AQM) techniques: Random Early Detection (RED) and CoDel. Although they vary in specifics, these two algorithms share a common basic approach to solving the buffer bloat problem. Explain what that approach is and why it works.

Their approach is to drop packets even when their buffers are not full. RED decides whether
to drop a packet statistically, based on how close to full the buffer is, whereas CoDel calculates
the queuing delay of packets that it forwards and drops packets if the queuing delay is too long.
By dropping packets early, senders are made to reduce their sending rates at the first signs of
congestion problems, rather than waiting for buffers to fill.

94

Random early detection (RED) Algorithm for queue sizing


RED monitors the average queue size and drops (or marks when used in conjunction with ECN) packets based on statistical probabilities. If the buffer is almost empty, then all incoming packets are accepted. As the queue grows, the probability for dropping an incoming packet grows too. When the buffer is full, the probability has reached 1 and all incoming packets are dropped.

RED is more fair than tail drop, in the sense that it does not possess a bias against bursty traffic that uses only a small portion of the bandwidth. The more a host transmits, the more likely it is that its packets are dropped as the probability of a host's packet being dropped is proportional to the amount of data it has in a queue. Early detection helps avoid TCP global synchronization.

Pure RED does not accommodate quality of service (QoS) differentiation. Weighted RED (WRED) and RED with In and Out (RIO) provide early detection with QoS considerations.
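The statistical drop decision can be sketched as below. Simplifications to note: real RED operates on an exponentially weighted average of the queue size, and the thresholds and max_p value here are illustrative assumptions, as is the function name.

```python
import random

# Sketch of RED's drop decision: probability rises linearly from 0
# at min_th to max_p at max_th, and every packet is dropped beyond
# max_th.  (Real RED feeds in an EWMA of the queue size.)
def red_drop(avg_queue, min_th, max_th, max_p=0.1):
    if avg_queue < min_th:
        return False          # buffer nearly empty: accept everything
    if avg_queue >= max_th:
        return True           # buffer full: drop everything
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p
```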

95

What are the two Active Queue Management (AQM)​ techniques

Random Early Detection (RED) and CoDel. Although they vary in specifics, these two algorithms share a common basic approach to solving the buffer bloat problem.

96

Explain what the RED approach to AQM is and why it works.

RED decides whether to drop a packet statistically, based on how close to full the buffer is.
By dropping packets early, senders are made to reduce their sending rates at the first signs of
congestion problems, rather than waiting for buffers to fill.

97

Explain what the CoDel approach to AQM is and why it works.

CoDel calculates
the queuing delay of packets that it forwards and drops packets if the queuing delay is too long.

By dropping packets early, senders are made to reduce their sending rates at the first signs of
congestion problems, rather than waiting for buffers to fill.

98

CoDel: major innovations

1. Not based on queue size, queue-size averages, queue-size thresholds, rate measurements, link utilization, drop rate, or queue occupancy time.
-The local minimum queue delay is a more accurate and robust measurement of the standing queue

2. It is sufficient to keep a single-state variable of how long the minimum has been above/below the target value for standing queue delay rather than keeping a window of values to compute the minimum

3. Rather than measuring the queue size in bytes or packets, use the packet-sojourn time through the queue.

99

Consistent Hashing (from wiki)

Consistent hashing is based on mapping each object to a point on the edge of a circle (or equivalently, mapping each object to a real angle). The system maps each available machine (or other storage bucket) to many pseudo-randomly distributed points on the edge of the same circle.

To find where an object should be placed, the system finds the location of that object's key on the edge of the circle; then walks around the circle until falling into the first bucket it encounters (or equivalently, the first available bucket with a higher angle). The result is that each bucket contains all the resources located between each one of its points and the previous points that belong to other buckets.

If a bucket becomes unavailable (for example because the computer it resides on is not reachable), then the points it maps to will be removed. Requests for resources that would have mapped to each of those points now map to the next highest points. Since each bucket is associated with many pseudo-randomly distributed points, the resources that were held by that bucket will now map to many different buckets. The items that mapped to the lost bucket must be redistributed among the remaining ones, but values mapping to other buckets will still do so and do not need to be moved.

A similar process occurs when a bucket is added. By adding new bucket points, we make any resources between those and the points corresponding to the next smaller angles map to the new bucket. These resources will no longer be associated with the previous buckets, and any value previously stored there will not be found by the selection method described above.

The portion of the keys associated with each bucket can be altered by altering the number of angles that bucket maps to.
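The ring walk described above can be sketched with a sorted list and binary search. The hash function, class name, and number of points per bucket are assumptions of this sketch, not part of the original description.

```python
import bisect
import hashlib

def _point(key: str) -> int:
    # Map a string to a pseudo-random point on the circle.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, buckets, points_per_bucket=100):
        # Each bucket is placed at many pseudo-random points ("virtual
        # nodes") so removing it redistributes keys across all others.
        self._ring = sorted(
            (_point(f"{b}#{i}"), b)
            for b in buckets for i in range(points_per_bucket)
        )
        self._keys = [p for p, _ in self._ring]

    def bucket_for(self, obj: str) -> str:
        # Walk clockwise to the first bucket point at or after the object,
        # wrapping around the circle if necessary.
        i = bisect.bisect(self._keys, _point(obj)) % len(self._ring)
        return self._ring[i][1]
```

Removing a bucket only remaps the objects that landed on that bucket's points; everything else keeps its old assignment, which is the defining property of consistent hashing.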