Test 2 L8 Flashcards Preview

cs6250 > Test 2 L8 > Flashcards

Flashcards in Test 2 L8 Deck (25)
Loading flashcards...

The Web and Caching

Protocol: Hypertext Transfer Protocol (HTTP)
-application layer protocol to transfer web content
-protocol browser uses to request webpages
-protocol to return objects to browser
-layered on top of byte stream protocol like TP

Server is stateless


HTTP Requests

Request line
-indicates method of request (GET, POST sends data to server, HEAD returns headers of get request)
-includes URL
-includes version number

Optional headers
-referrer - what caused page to be requested
-user agent -client software


HTTP Response

Status line
-HTTP version
-response code indicating outcomes (100s information, 200s success, 300s redirect, 400s errors, 500s server errors)
-Content encoding
-Content length


Early HTTP


One request/response per TCP Connection

Advantage: simple to implement

Disadvantage: TCP connection for every request -overhead, slowing transfer,
-TCP three way handshake for every request
-TCP must start in slow start every time connection opens
-servers have many connections that are forced to keep TCP connections in time-wait states until timers expire, so reserve additional resources even after connections completed

Solution to increase efficiency: persistent connections


Persistent Connections

Multiple HTTP requests/responses are multiplexed on a single TCP connection
--Delimiters indicate end of requests
-Content length allows receive to identify length of response
----So server must know size of transfer in advance



Client sends the next request as soon as it encounters a referenced object

So as little as one RTT for all referenced objects before they begin to be fetched

Persistent connections with pipelining was default in HTTP 1.1



Improves performance
-Origin web server hosts content but is far away
-TCP throughput is inversely proportional to RTT
-So further away that web content is, the slower the web page will load because latency is bigger and throughput is lower
-Solution: fetch content from nearby location that has been cached
-Also benefit when multiple clients requesting same content because local clients see less latency/more throughput and ISP doesn't have to pay to keep transferring content over expensive links

Can occur in multiple places
-Browser locally
-Network (ISP, content distribution networks)

Caches expire to ensure client see the most recent version of a page
- based on the expire setter
-cache can also check with origin server to check whether original content modified (304 not modified response)

Ways to redirect clients to caches:
-browser configuration to explicitly configure browser to point to local cache so all HTTP request first directed to local cache before request is forwarded to origin
-Origin server or service hosting the content direct browser to cache
--Done by special reply to DNS request to ping potential cache servers


TCP Throughput is inversely proportional to...



Locations of caches

Can occur in multiple places
-Browser locally
-Network (ISP, content distribution networks)


Ways to redirect clients to caches

Ways to redirect clients to caches:
-browser configuration to explicitly configure browser to point to local cache so all HTTP request first directed to local cache before request is forwarded to origin
-Origin server or service hosting the content direct browser to cache
--Done by special reply to DNS request to ping potential cache servers


Benefits of caching

Reduced transit costs for ISP

Improved performance for clients


Web Content Distribution Networks (CDN) defined

Overlay network of web caches designed to deliver content to a client from optimal location
-Optimal can be geographically closest or something else.

CDNs often have geographically disparate servers

Purpose to place caches as close to users as possible

Owned by content providers (google), networks (At&t) or independently (Akami)

Nonnetwork CDNs typically place servers in other autonomous systems or ISPs

The number of cache nodes vary.


How server selection works in CDN

Which server criteria
-least loaded server
-lowest latency (most typical)
-any alive server


Content Routing: How clients get redirected to the right server in CDN

How to direct clients
1-Routing systems (e.g. anycast): number all replicas with same IP address then allow routing to take them to closes replica
--simple but servers given very little control since at whims of Internet routing (simple but coarse)

2-Application-based (e.g. HTTP redirect): requires client to first go to origin server to get redirect, increasing latency (simple but delays)

3-Naming system (e.g. DNS): client looks up domain and response contains IP address of nearby cache. Significant flexibility in directing different clients to different server replicas (fine-grained control and fast)


CDNs ++ ISPs

CDNs and ISPs have symbiotic relationship when it comes to peering relationships

CDNs like to peer with ISP because peering directing with ISP where customer located because
-better throughput since no intermediate AS hops and network latency lower
-redundancy: more vectors to deliver content increases reliability
-burstiness: during large request events, having connectivity to multiple networks where content is hosted allows ISP to spread traffic across multiple transit links thereby reducing 95th percentile and lowering transit costs

ISPs like peering with CDNs (or host caches locally) because:
-providing content closer to ISP customers allows ISP to provide good performance for customers
-lower transit costs because traffic isn't traversing costly links


Bit Torrent

Peer-to-Peer CDN used for file sharing and distribution of large files

Fetch content from peers rather than origin
-reduce congestion and -
-prevent overload at network where content hosted

Break original file into many pieces, replicate different pieces on different peers as soon as possible so each peer assembles file by picking up different pieces and get remaining pieces from other peers. Eventually assemble entire files


Bit Torrent Publishing

1. Peer creates torrent
-tracker metadata
-pieces of file

2. Seeders create initial copy (has complete copy)
-tracker has metadata about file including list of seeders that contain initial copy of file

3. client starts to download pieces of file from seeder.
-hopefully different from other peers

4. Clients swap chunks of the file and after enough swapping, they should all have complete files.

Leechers: clients with incomplete copies of the file.

Trackers: allows peers to find each other and returns random list of peers leechers can use to swap parts of the file

Freeloading: client leaves network as soon as finished downloading file. Solved by bit torrent


Solution to freeriding

Bit torrent solved freeriding (clients leaving network as soon as finished downloading file) with choking

Choking (tit-for-tat) temporary refusal to upload chunks to another peer
-if peer can't download from peer, don't upload to it
(eliminated freerider problem)


Bit torrent getting chunks to swap

If all client receive same chunks, no one has complet copy and clients won't swap

Rarest piece first: client determines which pieces are rarest and download those first
-so most common pieces left to the end and larger variety of pieces downloaded from seeder
-to begin, randomly download because rare piece not a good idea initially
-redundant requests deleted when piece received so single peer with slow transfer rate doesn't slow down the network


Distributed Hash Tables

Enable a form of content overlay called a structured overlay



Distributed hash table

Scalable, distributed lookup service (maps keys to values)

Provable correctness
Reasonably good performance

Main motivation is scalable location of data in a large distributed system

Key problem is lookup: hash table isn't located in one place, distributed across network (DHT distributed hash table)


Consistent Hashing

keys and nodes map to same ID space

Create metric space (like a ring) with nodes. Nodes have ID and key maps to ID space. Consistent hash function will assign the keys and nodes and ID in this space. Hash function used to assign identifiers. Nodes hash of IP address, keys hash of keys.

In chord, key stored at successor, which is node with next highest ID.

Consistent hashing provides:
-load balance because all nodes receive roughly same number of keys
-flexibility because when node leaves or joins network, only a small fraction of keys needs to be moved


Implementing consistent hashing

Every node knows location of every other nodes
-lookups are fast on the order of 1
-routing table must be large because every node must know location of every other node, order N (number of nodes in network)

Each node only knows location of immediate sucessor
-small table, size order 1
-requires order N lookups

Finger tables: every node knows location of N other nodes, distance of nodes that it knows increases exponentially
-finger i points to the successor of n+2i
-find predecessor for a particular ID and ask what is the successor of that ID
finger tables have mapping of predecessor too. Then ask that predecessor for its successor, moves around ring looking for node whose successor's id is bigger than id of data
-require order of long(n) hopes
-order logn messages per lookup
-size of finger table is order of log n state per node
-when nodes joins network, initialize fingers of node and update fingers of existing nodes, transfer keys from successor to new node

-when a node leaves, any particular node keeps track of fingers of successor so predecessor can reach nodes corresponding to failed/left node's fingerlings


Finger tables

(from wiki)
To avoid the linear search above, Chord implements a faster search method by requiring each node to keep a finger table containing up to m entries, recall that m is the number of bits in the hash key.

The i^{th} entry of node n will contain:
successor((n+2^{i-1]) mod 2^m)

The first entry of finger table is actually the node's immediate successor (and therefore an extra successor field is not needed). Every time a node wants to look up a key k, it will pass the query to the closest successor or predecessor (depending on the finger table) of k in its finger table (the "largest" one on the circle whose ID is smaller than k), until a node finds out the key is stored in its immediate successor.

With such a finger table, the number of nodes that must be contacted to find a successor in an N-node network is O(\log N)


Finger table node join

Whenever a new node joins, three invariants should be maintained (the first two ensure correctness and the last one keeps querying fast):

1 Each node's successor points to its immediate successor correctly.
2 Each key is stored in successor(k).
3 Each node's finger table should be correct.

To satisfy these invariants, a predecessor field is maintained for each node. As the successor is the first entry of the finger table, we do not need to maintain this field separately any more. The following tasks should be done for a newly joined node n:

1 Initialize node n (the predecessor and the finger table).
2 Notify other nodes to update their predecessors and finger tables.
3 The new node takes over its responsible keys from its successor