Chapter 7: Web Caching & Consistent Hashing Flashcards

1
Q

Congested networks & over‐loaded servers NN

A
  • Web‐based processing is prone to unpredictable delays and (frequent) failures
  • Two main reasons: Congested networks and over‐ loaded servers
  • Date moves slowly through congested networks
  • Over‐loaded servers either refuse to serve requests or serve them slowly
  • Problems can occur unexpectedly and without prior

notice (e.g., Slashdot effect, flash crowd, DDOS, etc.) inncdiahly popularity

  • In this case, planning ahead is of limited use
  • Better, adapt to changing circumstances
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

(Web) Caching to the rescue

A
  • Caching employed to improve efficiency & reliability of data delivery (Web, Internet, disk, CPU, etc.)
  • Cache serves a (cached) page quickly even if the originating server is swamped or path to it is congested
  • “Cache hierarchy“, with caches
    • at client (Web browser)
    • at client’s site (institutional proxy)
    • at server’s site (reverse proxy)
    • at ISPs (CDN endpoints of major players, proxy)
    • part of service delivery, 3rd party service (content delivery network)
  • Wide use of caches engenders a general good: If requests are served by nearby caches, then fewer go to source server, reducing load and network traffic for the benefit of all users
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

“Web cache hierarchy”

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Web caching - what if cache crashes?

A

flood of requests

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Single, shared caching machine per group of users

A
  • Drawbacks of single cache machine
    • All users are impacted in case of cache failure –> increased latency
    • Also, potential for severe impact on source server – Cache a potential bottleneck, if heavily used
  • Two important limits regarding hit rate of a single cache
    • “False misses” for repeated requests of objects which were evicted for lack of space
    • User limit of cache works against aggregating requests from as many users as possible for caching:
  • *The more user requests are aggregated, the better the hit rate becomes due to locality, the higher impact of failure**
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Web proxy, proxy server

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Reverse proxy, Web accelerator

A
  • closer to backend
  • generic logic can be pre-executed (e.g. landing page)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Content‐delivery network (CDN)

A

Place content as close as possible to user to reduce latency. Challenge is how to devise content

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Web caching trade‐offs

A
  • Client‐server architecture is inherently non‐scalable
  • Proxies serve as a level of indirection
  • Proxies reduce client response time
    • Direct and indirect effect
    • Less load on the server: Server does not have to be over‐provisioned
  • Proxies reduce network bandwidth usage
    • Wide area vs. local area use
    • These two objectives are often in conflict
      • May do exhaustive local search to avoid using wide area bandwidth (e.g., in cooperative caching)
      • Prefetching uses extra bandwidth to reduce client response time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Problem: Mapping objects to caches

A
  • Given a number of caches (nodes)
  • Each node should carry an equal share of objects
  • Clients need to know what node to query for a given object
  • Nodes should be able to come and go without disrupting the whole operation (i.e., all nodes)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Attempted solution: Use hashing (mapping objects to cache)

A
  • Map page referenced by URL, u, into one of the nodes
  • Use a hash function that maps u to node h(u)
    • For example, h(x) = (ax + b) mod p, where p is range of h(x)
    • Interpret u as a number based on bit pattern of string underlying input URL
  • Hash functions tend to distribute their input uniformly “random” across the range of the hash function
    • URLs are equally balanced across nodes
    • No one cache responsible for an uneven share of URLs
    • A disproportionately loaded node would be a bottleneck
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

h(u) = (7u + 4) mod 5 (Problem when failing node?)

A

Removing a node with normal hashing changes location of every node

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

h(u) = (7u + 4) mod 5 (Problem when adding node?)

A

Adding a node changes the location of every node

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Consistent hashing

A
  • Goals
    • Uniform distribution of objects across nodes
    • Easily find objects
    • Let any client perform a local computation mapping a URL to the node that contains referenced object
    • Allow for nodes to be added/removed without much disruption
  • D. Karger et al., MIT, 1997
  • Basis for Akamai
    • CDN company (content distribution network)
    • Web cache as a service
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Consistent hashing (how does it work?)

A
  • Select a base hash function that maps strings to the number range [0, …, M]
  • Divide by M, re‐mapping strings to [0, 1]
  • Interpret this interval as the unit circle: Here, circle with circumference 1 (normally radius 1)
  • Each URL is mapped to a point on the unit circle
  • Also, map each cache to a point on the unit circle
  • Assign each URL to the closest cache point in clockwise direction on the circle
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Consistent hashing: Removing a cache

A
17
Q

Consistent hashing: Add a cache

A
18
Q

Adding an object : Consistent hashing

A
19
Q

Implementation at each cache

A
  • Store cache points in a binary tree
  • Find clockwise successor of a URL point by single search in tree (takes O(log n) time)
  • For a constant time technique, cf. Karger et al., 1997
20
Q

Cassandra global read‐path

A
21
Q

A “good” base hash function is MD5

A
  • Message Digest 5 (MD5), R. Rivest, 1992 (MD1, …, MD6)
  • Cryptographic hash function that produces a 128‐bit (16‐ byte) hash value
  • Maps variable‐length message into a fixed‐length output
  • Also used to check data integrity
  • MD5 hash is typically expressed as a hex number (32 digits)
  • It’s been shown that MD5 is not collision resistant
  • US‐CERT about MD5 “should be considered cryptographically broken and unsuitable for further use“ (for security, not for caching)
  • SHA‐2 is a more appropriate cryptographic hash function