Chapter 7: Web Caching & Consistent Hashing Flashcards

Question 1

Q

Congested networks & over‐loaded servers NN

Answer

A

Web‐based processing is prone to unpredictable delays and (frequent) failures
Two main reasons: Congested networks and over‐ loaded servers
Date moves slowly through congested networks
Over‐loaded servers either refuse to serve requests or serve them slowly
Problems can occur unexpectedly and without prior

notice (e.g., Slashdot effect, flash crowd, DDOS, etc.) inncdiahly popularity

In this case, planning ahead is of limited use
Better, adapt to changing circumstances

Question 2

Q

(Web) Caching to the rescue

Answer

A

Caching employed to improve efficiency & reliability of data delivery (Web, Internet, disk, CPU, etc.)
Cache serves a (cached) page quickly even if the originating server is swamped or path to it is congested
“Cache hierarchy“, with caches
- at client (Web browser)
- at client’s site (institutional proxy)
- at server’s site (reverse proxy)
- at ISPs (CDN endpoints of major players, proxy)
- part of service delivery, 3rd party service (content delivery network)
Wide use of caches engenders a general good: If requests are served by nearby caches, then fewer go to source server, reducing load and network traffic for the benefit of all users

Question 3

Q

“Web cache hierarchy”

Question 4

Q

Web caching - what if cache crashes?

Answer

A

flood of requests

Question 5

Q

Single, shared caching machine per group of users

Answer

A

Drawbacks of single cache machine
- All users are impacted in case of cache failure –> increased latency
- Also, potential for severe impact on source server – Cache a potential bottleneck, if heavily used
Two important limits regarding hit rate of a single cache
- “False misses” for repeated requests of objects which were evicted for lack of space
- User limit of cache works against aggregating requests from as many users as possible for caching:
*The more user requests are aggregated, the better the hit rate becomes due to locality, the higher impact of failure**

Question 6

Q

Web proxy, proxy server

Question 7

Q

Reverse proxy, Web accelerator

Answer

A

closer to backend
generic logic can be pre-executed (e.g. landing page)

Question 8

Q

Content‐delivery network (CDN)

Answer

A

Place content as close as possible to user to reduce latency. Challenge is how to devise content

Question 9

Q

Web caching trade‐offs

Answer

A

Client‐server architecture is inherently non‐scalable
Proxies serve as a level of indirection
Proxies reduce client response time
- Direct and indirect effect
- Less load on the server: Server does not have to be over‐provisioned
Proxies reduce network bandwidth usage
- Wide area vs. local area use
- These two objectives are often in conflict
  - May do exhaustive local search to avoid using wide area bandwidth (e.g., in cooperative caching)
  - Prefetching uses extra bandwidth to reduce client response time

Question 10

Q

Problem: Mapping objects to caches

Answer

A

Given a number of caches (nodes)
Each node should carry an equal share of objects
Clients need to know what node to query for a given object
Nodes should be able to come and go without disrupting the whole operation (i.e., all nodes)

Question 11

Q

Attempted solution: Use hashing (mapping objects to cache)

Answer

A

Map page referenced by URL, u, into one of the nodes
Use a hash function that maps u to node h(u)
- For example, h(x) = (ax + b) mod p, where p is range of h(x)
- Interpret u as a number based on bit pattern of string underlying input URL
Hash functions tend to distribute their input uniformly “random” across the range of the hash function
- URLs are equally balanced across nodes
- No one cache responsible for an uneven share of URLs
- A disproportionately loaded node would be a bottleneck

Question 12

Q

h(u) = (7u + 4) mod 5 (Problem when failing node?)

Answer

A

Removing a node with normal hashing changes location of every node

Question 13

Q

h(u) = (7u + 4) mod 5 (Problem when adding node?)

Answer

A

Adding a node changes the location of every node

Question 14

Q

Consistent hashing

Answer

A

Goals
- Uniform distribution of objects across nodes
- Easily find objects
- Let any client perform a local computation mapping a URL to the node that contains referenced object
- Allow for nodes to be added/removed without much disruption
D. Karger et al., MIT, 1997
Basis for Akamai
- CDN company (content distribution network)
- Web cache as a service

Question 15

Q

Consistent hashing (how does it work?)

Answer

A

Select a base hash function that maps strings to the number range [0, …, M]
Divide by M, re‐mapping strings to [0, 1]
Interpret this interval as the unit circle: Here, circle with circumference 1 (normally radius 1)
Each URL is mapped to a point on the unit circle
Also, map each cache to a point on the unit circle
Assign each URL to the closest cache point in clockwise direction on the circle

Question 16

Q

Consistent hashing: Removing a cache

Question 17

Q

Consistent hashing: Add a cache

Question 18

Q

Adding an object : Consistent hashing

Question 19

Q

Implementation at each cache

Answer

A

Store cache points in a binary tree
Find clockwise successor of a URL point by single search in tree (takes O(log n) time)
For a constant time technique, cf. Karger et al., 1997

Question 20

Q

Cassandra global read‐path

Question 21

Q

A “good” base hash function is MD5

Answer

A

Message Digest 5 (MD5), R. Rivest, 1992 (MD1, …, MD6)
Cryptographic hash function that produces a 128‐bit (16‐ byte) hash value
Maps variable‐length message into a fixed‐length output
Also used to check data integrity
MD5 hash is typically expressed as a hex number (32 digits)
It’s been shown that MD5 is not collision resistant
US‐CERT about MD5 “should be considered cryptographically broken and unsuitable for further use“ (for security, not for caching)
SHA‐2 is a more appropriate cryptographic hash function

Brainscape's Knowledge GenomeTM

Chapter 7: Web Caching & Consistent Hashing Flashcards

Brainscape's Knowledge Genome^TM